[ 
https://issues.apache.org/jira/browse/OAK-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312446#comment-15312446
 ] 

Alex Parvulescu commented on OAK-4279:
--------------------------------------

bq. offlineCompactionBin2: my guesstimate would be that the extra cache in the 
compactor doesn't add too much benefit but consumes extra memory.
After removing the cache, on large binaries I see quite a few IO calls to the 
binary, but more importantly, I see the entire list of recordids being 
persisted again:
{code}
return writeValueRecord(segmentStream.getLength(), writeList(blockIds));
{code}
So while it is true that some recordid extraction from the segmentstream is 
happening, the store will persist the entire list again and produce another 
recordid for the binary. Depending on the blob size, that is a heavy chunk of 
IO.
Also, for the fun of it: the store size with the binary record cache enabled is 
{{5 278 208}} and without it {{5 282 304}}, a difference of {{4 096}}. This 
illustrates the above issue with persisting the recordid list a second time.
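
To make the effect concrete, here is a minimal toy sketch of the idea, not the actual Oak code. All names in it ({{ToyStore}}, {{ToyCompactor}}, {{compactBinary}}, {{recordsWritten}}) are hypothetical, and it assumes the same binary is encountered more than once during compaction. Without a cache every encounter persists the block id list again; with a cache the previously produced record id is reused.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these types exist in Oak. */
public class BinaryRecordCacheSketch {

    /** Append-only "store" that lets us count how many list records were written. */
    static final class ToyStore {
        private final List<String> records = new ArrayList<>();

        /** Persists a list of block ids and returns a fresh record id for it. */
        int writeList(List<String> blockIds) {
            records.add(String.join(",", blockIds));
            return records.size() - 1;
        }

        int recordCount() {
            return records.size();
        }
    }

    /** Hypothetical compactor that can remember the record id written per binary. */
    static final class ToyCompactor {
        private final ToyStore store;
        private final Map<String, Integer> binaryCache; // null means the cache is disabled

        ToyCompactor(ToyStore store, boolean useCache) {
            this.store = store;
            this.binaryCache = useCache ? new HashMap<>() : null;
        }

        int compactBinary(String binaryKey, List<String> blockIds) {
            if (binaryCache != null) {
                Integer cached = binaryCache.get(binaryKey);
                if (cached != null) {
                    return cached; // reuse the record id, nothing is written
                }
            }
            int recordId = store.writeList(blockIds); // the block id list is persisted here
            if (binaryCache != null) {
                binaryCache.put(binaryKey, recordId);
            }
            return recordId;
        }
    }

    /** Compacts the same binary the given number of times and returns the records written. */
    static int recordsWritten(boolean useCache, int times) {
        ToyStore store = new ToyStore();
        ToyCompactor compactor = new ToyCompactor(store, useCache);
        List<String> blockIds = Arrays.asList("b1", "b2", "b3");
        for (int i = 0; i < times; i++) {
            compactor.compactBinary("blob-1", blockIds);
        }
        return store.recordCount();
    }

    public static void main(String[] args) {
        System.out.println("with cache:    " + recordsWritten(true, 3));
        System.out.println("without cache: " + recordsWritten(false, 3));
    }
}
```

In this toy model the cache trades a map entry per binary for the avoided rewrites, which is the memory versus IO trade-off discussed above.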

> Rework offline compaction
> -------------------------
>
>                 Key: OAK-4279
>                 URL: https://issues.apache.org/jira/browse/OAK-4279
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Alex Parvulescu
>            Priority: Blocker
>              Labels: compaction, gc
>             Fix For: 1.6
>
>         Attachments: OAK-4279-binaries.patch, OAK-4279-checkpoints.patch, 
> OAK-4279-v0.patch, OAK-4279-v1.patch, OAK-4279-v2.patch, OAK-4279-v3.patch, 
> OAK-4279-v4.patch
>
>
> The fix for OAK-3348 broke some of the previous functionality of offline 
> compaction:
> * No more progress logging
> * Compaction is not interruptible any more (in the sense of OAK-3290)
> * Offline compaction could remove the ids of the segment node states to 
> squeeze out some extra space. Those are only needed for later generations 
> generated via online compaction. 
> We should probably implement offline compaction again through a dedicated 
> {{Compactor}} class as it was done in {{oak-segment}} instead of relying on 
> the de-duplication cache (aka online compaction). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
