[
https://issues.apache.org/jira/browse/OAK-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Parvulescu updated OAK-4493:
---------------------------------
Fix Version/s: (was: 1.5.4)
               1.5.5
> Offline compaction persisted mode
> ---------------------------------
>
> Key: OAK-4493
> URL: https://issues.apache.org/jira/browse/OAK-4493
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: segment-tar, segmentmk
> Reporter: Alex Parvulescu
> Assignee: Alex Parvulescu
> Labels: candidate_oak_1_0, candidate_oak_1_2, candidate_oak_1_4, compaction, gc
> Fix For: 1.5.5
>
>
> I'm investigating a case where offline compaction is unable to finish and
> crashes with OOMEs because of the content structure, namely large child node
> lists. The biggest problem is the UUID index, which has 55M nodes.
> In the current implementation, the compactor uses an in-memory node state to
> collect all the data and only persists it at the very end, once compaction is
> done [0].
> This is prone to OOMEs once the size of the data (no binaries involved) grows
> beyond a certain point (in this case the repository is 350GB, but there's a
> fair amount of garbage because compaction hasn't been running).
> My proposal is to add a special flag, {{oak.compaction.eagerFlush=true}}, to
> be enabled only when the size of the repository does not allow running
> offline compaction with the available heap. This would turn the in-memory
> compaction transaction into one based on a persisted SegmentNodeState,
> meaning we trade disk space (and IO) for memory.
> [0]
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment/src/main/java/org/apache/jackrabbit/oak/plugins/segment/Compactor.java#L248
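>
> A rough sketch of how such a flag could be wired in (an illustration only;
> the helper name and the SegmentWriter.writeNode call are assumptions on my
> part, not the actual patch):
> {code:java}
> import java.io.IOException;
>
> import org.apache.jackrabbit.oak.plugins.segment.SegmentWriter;
> import org.apache.jackrabbit.oak.spi.state.NodeBuilder;
>
> import static org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.EMPTY_NODE;
>
> class EagerFlushSketch {
>
>     // Picks the base builder for the compaction transaction.
>     static NodeBuilder newCompactionBuilder(SegmentWriter writer) throws IOException {
>         if (Boolean.getBoolean("oak.compaction.eagerFlush")) {
>             // Eager flush: seed the transaction from a persisted SegmentNodeState,
>             // so intermediate state ends up in the segment store instead of
>             // accumulating on the heap (trading disk space and IO for memory).
>             return writer.writeNode(EMPTY_NODE).builder();
>         }
>         // Current behaviour: a purely in-memory builder, persisted only once
>         // compaction has finished.
>         return EMPTY_NODE.builder();
>     }
> }
> {code}
> Since the sketch reads the flag with {{Boolean.getBoolean}}, it could be
> passed as {{-Doak.compaction.eagerFlush=true}} when running offline compaction
> with oak-run.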
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)