[ 
https://issues.apache.org/jira/browse/OAK-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Parvulescu updated OAK-4493:
---------------------------------
    Labels: compaction gc  (was: )

> Offline compaction persisted mode
> ---------------------------------
>
>                 Key: OAK-4493
>                 URL: https://issues.apache.org/jira/browse/OAK-4493
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar, segmentmk
>            Reporter: Alex Parvulescu
>            Assignee: Alex Parvulescu
>              Labels: compaction, gc
>
> I'm investigating a case where offline compaction is unable to finish and 
> crashes with OOMEs because of the content structure, namely large child node 
> lists. The biggest offender is the UUID index, which has 55M nodes.
> In the current implementation, the compactor uses an in-memory NodeState to 
> collect all the data and persists it at the very end, once compaction is done 
> [0]. 
> This is prone to OOMEs once the size of the data segments (no binaries 
> involved) grows beyond a certain point (in this case 350 GB, though there is 
> a fair amount of garbage because compaction has not been running).
> My proposal is to add a special flag {{oak.compaction.eagerFlush=true}} that 
> should be enabled only when the size of the repository does not allow running 
> offline compaction with the available heap. This turns the in-memory 
> compaction transaction into one based on a persisted SegmentNodeState, 
> meaning we trade disk space (and IO) for memory.
> [0] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment/src/main/java/org/apache/jackrabbit/oak/plugins/segment/Compactor.java#L248
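The trade-off behind the proposed flag can be sketched roughly as follows. This is not Oak API; the class, the threshold, and the flush() stand-in are all hypothetical, and only illustrate the shape of the change: instead of accumulating the entire compacted tree in memory and persisting once at the end, the partially built state is flushed to the segment store whenever it crosses a size threshold, releasing heap in exchange for disk IO.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the eager-flush idea. None of these names are
// real Oak API; flush() stands in for persisting the partial
// SegmentNodeState to the segment store.
public class EagerFlushSketch {

    static final int FLUSH_THRESHOLD = 10_000; // nodes buffered before a flush

    final List<String> buffer = new ArrayList<>();
    int flushes = 0;
    long persistedNodes = 0;

    void addNode(String path) {
        buffer.add(path);
        if (buffer.size() >= FLUSH_THRESHOLD) {
            flush(); // trade disk IO for heap: persist and drop the buffer
        }
    }

    // Stand-in for writing the partial state to disk; the memory held by
    // the buffer is released once the data is persisted.
    void flush() {
        persistedNodes += buffer.size();
        buffer.clear();
        flushes++;
    }

    void finish() {
        flush(); // persist whatever remains at the end of compaction
    }

    public static void main(String[] args) {
        EagerFlushSketch compactor = new EagerFlushSketch();
        // Simulate compacting a large child node list, e.g. a UUID index.
        for (int i = 0; i < 55_000; i++) {
            compactor.addNode("/oak:index/uuid/node-" + i);
        }
        compactor.finish();
        System.out.println("flushes=" + compactor.flushes
                + " persisted=" + compactor.persistedNodes
                + " buffered=" + compactor.buffer.size());
    }
}
```

With a 10,000-node threshold the buffer never holds more than one batch, so peak heap is bounded by the threshold rather than by repository size, which is the point of the proposal.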



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)