[ 
https://issues.apache.org/jira/browse/OAK-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171057#comment-14171057
 ] 

Michael Dürig commented on OAK-2140:
------------------------------------

I did some tests with a quick fix to the {{#clone}} method so that it actually 
does a deep copy of all binaries. The behaviour seemed right, as I could see the 
binary segments being copied instead of just new generations for them being 
written. However, in the face of OAK-2045 I also observed much bigger repository 
growth, as compaction now also creates a copy of all binaries while the 
originals are kept from being removed by long-running sessions. 
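
To make the difference concrete, here is a minimal toy sketch of the two 
behaviours. It is not the Oak code: Store, Blob, shallowClone and deepCopy are 
hypothetical names invented for illustration, and a plain map stands in for the 
segment store. It only shows why returning the original record ids keeps the 
old chunks referenced, while a deep copy moves them but doubles the data for as 
long as the old store is retained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CloneSketch {

    /** Hypothetical stand-in for a segment store: maps record ids to chunks. */
    static final class Store {
        final Map<String, byte[]> records = new HashMap<>();
        private final String name;
        private int next = 0;

        Store(String name) { this.name = name; }

        /** Persists a chunk and returns its newly assigned record id. */
        String write(byte[] chunk) {
            String id = name + "-" + next++;
            records.put(id, chunk.clone());
            return id;
        }
    }

    /** Hypothetical blob: a list of record ids pointing at the chunks. */
    static final class Blob {
        final List<String> chunkIds;
        Blob(List<String> chunkIds) { this.chunkIds = chunkIds; }
    }

    /** Copies only the id list, so the chunks stay where they were written. */
    static Blob shallowClone(Blob blob) {
        return new Blob(new ArrayList<>(blob.chunkIds));
    }

    /** Rewrites every chunk into the target store and references the new ids. */
    static Blob deepCopy(Blob blob, Store source, Store target) {
        List<String> ids = new ArrayList<>();
        for (String id : blob.chunkIds) {
            ids.add(target.write(source.records.get(id)));
        }
        return new Blob(ids);
    }

    public static void main(String[] args) {
        Store old = new Store("old");
        Store compacted = new Store("new");
        Blob original = new Blob(Arrays.asList(
                old.write(new byte[4096]), old.write(new byte[4096])));

        Blob shallow = shallowClone(original);
        Blob deep = deepCopy(original, old, compacted);

        System.out.println("shallow ids: " + shallow.chunkIds);
        System.out.println("deep ids:    " + deep.chunkIds);
        // While the old store cannot be cleaned up, both copies take up space.
        System.out.println("chunks held: "
                + (old.records.size() + compacted.records.size()));
    }
}
```

In this toy model the shallow clone still points into the old store, so the old 
store can never be dropped, whereas the deep copy only helps once nothing else 
keeps the originals alive.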

> Segment Compactor will not compact binaries > 16k
> -------------------------------------------------
>
>                 Key: OAK-2140
>                 URL: https://issues.apache.org/jira/browse/OAK-2140
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: core, segmentmk
>            Reporter: Alex Parvulescu
>
> The compaction code relies on the SegmentBlob#clone method when a binary 
> is being processed, but it looks like the #clone contract is not fully 
> enforced for streams that qualify as 'long values' (>16k if I read the 
> code correctly). 
> What happens is that the stream is initially persisted as chunks in a 
> ListRecord. When compaction calls #clone it gets back the original list of 
> record ids, which then gets referenced from the compacted node state [0]. 
> This makes compaction of large binaries ineffective, as the bulk segments 
> will never move from the location where they were originally created unless 
> the referencing node gets deleted.
> I think the original design was set up to prevent large binaries from being 
> copied over, but looking at the size problem we have now it might be a good 
> time to reconsider this approach.
> [0] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/SegmentBlob.java#L75



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
