[ https://issues.apache.org/jira/browse/OAK-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312132#comment-15312132 ]

Michael Dürig commented on OAK-4279:
------------------------------------

My main point is that the segment writer already de-duplicates segment blobs, 
so there is no need to add de-duplication logic on top of it again. 
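
A minimal sketch of what I mean, with made-up names rather than the actual 
{{SegmentWriter}} API: records are keyed by a content hash, so writing the 
same binary twice simply returns the existing record id.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: content-addressed de-duplication of the kind the
// segment writer performs. Names and types are made up for the sketch.
class DeduplicatingWriter {

    // Maps a content hash to the id of the record already written.
    private final Map<String, String> recordsByHash = new HashMap<>();

    String writeBlob(byte[] content) {
        // Reuse the existing record id instead of writing the bytes again.
        return recordsByHash.computeIfAbsent(hash(content), h -> persist(content));
    }

    private String persist(byte[] content) {
        // ... actually write the bytes and return the new record id
        return "record-" + recordsByHash.size();
    }

    private static String hash(byte[] content) {
        // A real implementation would use a cryptographic digest here.
        return Integer.toHexString(java.util.Arrays.hashCode(content));
    }
}
{code}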

Regarding IO, my point was about comparing large blobs ({{new SegmentBlob(blobStore, 
duplicateId).equals(sb)}}). This is really expensive for large blobs unless they 
are not equal: a mismatch lets the comparison bail out early, whereas equal 
blobs have to be read in full. 
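
To see why, consider a byte-wise equality check over two streams. It can only 
short-circuit at the first differing byte, so when the blobs are in fact equal 
it pays the full IO cost of both. A simplified sketch (not the actual 
{{SegmentBlob}} implementation, which would also compare lengths first and use 
buffered reads):

{code:java}
import java.io.IOException;
import java.io.InputStream;

final class StreamEquality {

    // Byte-wise comparison: it can only short-circuit on a mismatch.
    static boolean streamsEqual(InputStream a, InputStream b)
            throws IOException {
        int x, y;
        do {
            x = a.read();
            y = b.read();
            if (x != y) {
                return false; // first differing byte: cheap exit
            }
        } while (x != -1);
        return true; // equal: both blobs were read to the end
    }
}
{code}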


I think we should compare the performance, efficiency and heap usage of offline 
compaction with and without caching of binaries to get a clear picture.
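
Even a crude harness like the following would give rough numbers; here 
{{compact.run()}} stands in for whichever offline compaction entry point is 
being measured, once with and once without binary caching:

{code:java}
final class CompactionBenchmark {

    static void measure(Runnable compact) {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // best effort: reduce noise from earlier allocations
        long heapBefore = rt.totalMemory() - rt.freeMemory();
        long start = System.nanoTime();

        compact.run();

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long heapAfter = rt.totalMemory() - rt.freeMemory();
        System.out.printf("time: %d ms, heap delta: %d bytes%n",
                elapsedMs, heapAfter - heapBefore);
    }
}
{code}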

> Rework offline compaction
> -------------------------
>
>                 Key: OAK-4279
>                 URL: https://issues.apache.org/jira/browse/OAK-4279
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Alex Parvulescu
>            Priority: Blocker
>              Labels: compaction, gc
>             Fix For: 1.6
>
>         Attachments: OAK-4279-checkpoints.patch, OAK-4279-v0.patch, 
> OAK-4279-v1.patch, OAK-4279-v2.patch, OAK-4279-v3.patch, OAK-4279-v4.patch
>
>
> The fix for OAK-3348 broke some of the previous functionality of offline 
> compaction:
> * No more progress logging
> * Compaction is not interruptible any more (in the sense of OAK-3290)
> * Offline compaction could remove the ids of the segment node states to 
> squeeze out some extra space. Those ids are only needed by later generations 
> created via online compaction. 
> We should probably implement offline compaction again through a dedicated 
> {{Compactor}} class, as was done in {{oak-segment}}, instead of relying on 
> the de-duplication cache (aka online compaction). 
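
For context, the dedicated {{Compactor}} the description alludes to boils down 
to rewriting the node tree into a fresh store. A much simplified sketch of that 
shape, using Oak's {{NodeState}}/{{NodeBuilder}} SPI (the real class also 
handled progress logging, interruption and a de-duplication map):

{code:java}
import org.apache.jackrabbit.oak.api.PropertyState;
import org.apache.jackrabbit.oak.spi.state.ChildNodeEntry;
import org.apache.jackrabbit.oak.spi.state.NodeBuilder;
import org.apache.jackrabbit.oak.spi.state.NodeState;

final class OfflineCompactorSketch {

    // Recursively rewrite 'state' into 'builder', which belongs to the
    // new (compacted) store.
    void compact(NodeState state, NodeBuilder builder) {
        for (PropertyState property : state.getProperties()) {
            builder.setProperty(property);
        }
        for (ChildNodeEntry child : state.getChildNodeEntries()) {
            compact(child.getNodeState(), builder.child(child.getName()));
        }
    }
}
{code}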


