Alex Parvulescu created OAK-4493:
------------------------------------
Summary: Offline compaction persisted mode
Key: OAK-4493
URL: https://issues.apache.org/jira/browse/OAK-4493
Project: Jackrabbit Oak
Issue Type: Bug
Components: segment-tar, segmentmk
Reporter: Alex Parvulescu
Assignee: Alex Parvulescu
I'm investigating a case where offline compaction is unable to finish and
crashes with OOMEs because of the content structure, namely very large child
node lists. The biggest offender is the UUID index, which has 55M nodes.
In the current implementation, the compactor collects all the data in an
in-memory node state and only persists it at the very end, once compaction is
done [0]. This is prone to OOMEs once the size of the data segments (no
binaries involved) grows beyond a certain point (in this case the repository
is 350 GB, though a fair amount of that is garbage because compaction has not
been running).
My proposal is to add a special flag {{oak.compaction.eagerFlush=true}} that
should be enabled only when the size of the repository does not allow running
offline compaction with the available heap. It turns the in-memory compaction
transaction into one based on a persisted SegmentNodeState, meaning we trade
disk space (and IO) for memory.
[0]
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment/src/main/java/org/apache/jackrabbit/oak/plugins/segment/Compactor.java#L248
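To illustrate the proposed trade-off, here is a minimal, self-contained sketch of the eager-flush pattern: instead of accumulating the entire compacted state in memory and persisting once at the end, the buffer is flushed to persisted storage whenever it crosses a threshold, bounding peak heap usage. The class and field names ({{EagerFlushCompactor}}, {{flushThreshold}}) are hypothetical and do not reflect actual Oak APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of eager flushing during compaction: the in-memory
// buffer never grows past flushThreshold entries, because it is drained to
// persisted storage (here simulated by a list) as soon as it fills up.
public class EagerFlushCompactor {
    private final int flushThreshold;
    private final List<String> buffer = new ArrayList<>();    // in-memory updates
    private final List<String> persisted = new ArrayList<>(); // stands in for the segment store
    private int flushCount = 0;

    EagerFlushCompactor(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void addNode(String path) {
        buffer.add(path);
        if (buffer.size() >= flushThreshold) {
            flush(); // trade IO for bounded heap usage
        }
    }

    void flush() {
        persisted.addAll(buffer);
        buffer.clear();
        flushCount++;
    }

    public static void main(String[] args) {
        EagerFlushCompactor c = new EagerFlushCompactor(1000);
        for (int i = 0; i < 5500; i++) {
            c.addNode("/oak:index/uuid/node-" + i);
        }
        c.flush(); // final flush, mirroring the persist-at-the-end step
        System.out.println("persisted=" + c.persisted.size()
                + " flushes=" + c.flushCount
                + " peakBuffer<=" + c.flushThreshold);
    }
}
```

With a threshold of 1000 entries, 5500 adds trigger five intermediate flushes plus the final one, so heap usage stays bounded regardless of how many child nodes the index has.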
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)