[
https://issues.apache.org/jira/browse/OAK-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194421#comment-14194421
]
Michael Dürig commented on OAK-2140:
------------------------------------
+1. I think this is the most sensible thing to do for the time being, as the
behaviour of compaction will also depend on the type of the repository content
and its usage pattern. Furthermore, this option will also have an effect on disk
reads/writes and disk space usage. Maybe once certain typical usage patterns
start to emerge we can follow up on this issue and further refine it. For the
time being I'd go with the configuration option.
> Segment Compactor will not compact binaries > 16k
> -------------------------------------------------
>
> Key: OAK-2140
> URL: https://issues.apache.org/jira/browse/OAK-2140
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: core, segmentmk
> Reporter: Alex Parvulescu
> Assignee: Alex Parvulescu
> Fix For: 1.1.3
>
> Attachments: OAK-2140.patch
>
>
> The compaction bit relies on the SegmentBlob#clone method in the case a binary
> is being processed but it looks like the #clone contract is not fully
> enforced for streams that are qualified as 'long values' (>16k if I read the
> code correctly).
> What happens is the stream is initially persisted as chunks in a ListRecord.
> When compaction calls #clone it will get back the original list of record
> ids, which will get referenced from the compacted node state [0], making
> compaction on large binaries ineffective as the bulk segments will never move
> from the original location where they were created, unless the referencing node
> gets deleted.
> I think the original design was set up to prevent large binaries from being
> copied over but looking at the size problem we have now it might be a good
> time to reconsider this approach.
> [0]
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/SegmentBlob.java#L75
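
The behaviour reported in the issue description above, together with the
configuration option discussed in the comment, can be sketched roughly as
follows. This is a simplified toy model and not the actual Oak code: the
Store and Blob classes, the 4k chunk size, the cloneBlob method and the
cloneBinaries flag are all assumptions made for illustration. Only the 16k
"long value" threshold is taken from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompactionSketch {
    // Streams above this size count as "long values" (the >16k from the issue).
    public static final int LONG_VALUE_THRESHOLD = 16 * 1024;
    // Arbitrary chunk size, chosen for illustration only.
    public static final int CHUNK_SIZE = 4 * 1024;

    /** Toy stand-in for a set of segments: maps record id to chunk bytes. */
    public static final class Store {
        public final String name;
        public final Map<String, byte[]> records = new HashMap<>();
        private int next = 0;

        public Store(String name) { this.name = name; }

        public String write(byte[] chunk) {
            String id = name + ":" + next++;
            records.put(id, chunk);
            return id;
        }
    }

    /** Toy stand-in for a binary persisted as a list of chunk record ids. */
    public static final class Blob {
        public final List<String> recordIds;
        public final long length;

        public Blob(List<String> recordIds, long length) {
            this.recordIds = recordIds;
            this.length = length;
        }
    }

    /** Persists the data as chunks and returns a blob listing their ids. */
    public static Blob write(Store store, byte[] data) {
        List<String> ids = new ArrayList<>();
        for (int off = 0; off < data.length; off += CHUNK_SIZE) {
            int end = Math.min(data.length, off + CHUNK_SIZE);
            ids.add(store.write(Arrays.copyOfRange(data, off, end)));
        }
        return new Blob(ids, data.length);
    }

    /**
     * Hypothetical clone. With cloneBinaries == false a long value keeps its
     * original record ids, so the old chunks stay referenced after compaction.
     * With cloneBinaries == true the chunks are copied into the target store.
     */
    public static Blob cloneBlob(Blob blob, Store source, Store target,
                                 boolean cloneBinaries) {
        boolean isLongValue = blob.length > LONG_VALUE_THRESHOLD;
        if (isLongValue && !cloneBinaries) {
            return new Blob(blob.recordIds, blob.length);
        }
        List<String> ids = new ArrayList<>();
        for (String id : blob.recordIds) {
            ids.add(target.write(source.records.get(id)));
        }
        return new Blob(ids, blob.length);
    }

    public static void main(String[] args) {
        Store before = new Store("old");
        Store after = new Store("new");
        Blob large = write(before, new byte[32 * 1024]);

        Blob shallow = cloneBlob(large, before, after, false);
        System.out.println("shallow clone ids: " + shallow.recordIds);

        Blob deep = cloneBlob(large, before, after, true);
        System.out.println("deep clone ids:    " + deep.recordIds);
    }
}
```

In this model the trade-off mentioned in the comment is visible directly:
leaving the flag off avoids rewriting large binaries but keeps the old chunks
alive, while turning it on frees the old location at the cost of extra disk
reads/writes during compaction.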