[ https://issues.apache.org/jira/browse/OAK-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343976#comment-15343976 ]

Alex Parvulescu commented on OAK-4493:
--------------------------------------

Pasting in a cleaned sample from a failed compaction run. This is a single 
node state with only 8M nodes (I think compaction was actually processing 
the content node with 55M child nodes):
{noformat}
java.util.HashMap$Entry[8388608] 1.9Gb
    table java.util.HashMap 1.9Gb
        nodes org.apache.jackrabbit.oak.plugins.memory.MutableNodeState 1.9Gb
            state org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder$RootHead 1.9Gb
                head, rootHead org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder 1.9Gb
{noformat}
I have also seen this fail at 33M nodes (with a bigger max heap value).
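The heap dump above works out to roughly 240 bytes of retained heap per child-node entry, which explains why scaling from 8M to 55M children is hopeless within a typical heap. A small sketch of that arithmetic (the exact per-entry cost varies with JVM, key/value sizes, and node-state internals; these are back-of-the-envelope numbers, not measurements):

```java
// Illustrative arithmetic only: derives the approximate per-entry retained
// cost from the figures in the heap dump, then projects it to the 55M-child
// UUID index mentioned in the issue.
public class EntryCostEstimate {

    // Retained bytes divided by entry count gives an average per-entry cost.
    static long bytesPerEntry(long retainedBytes, long entries) {
        return retainedBytes / entries;
    }

    public static void main(String[] args) {
        long retained = 2_040_109_465L; // ~1.9 GB retained by the HashMap
        long entries = 8_388_608L;      // HashMap$Entry[8388608] from the dump
        long perEntry = bytesPerEntry(retained, entries);
        System.out.println(perEntry + " bytes per child-node entry"); // ~243

        // Projecting to the 55M-child node at the same rate:
        long projected = perEntry * 55_000_000L;
        System.out.println(projected / (1024L * 1024 * 1024) + " GB projected"); // ~12 GB
    }
}
```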


> Offline compaction persisted mode
> ---------------------------------
>
>                 Key: OAK-4493
>                 URL: https://issues.apache.org/jira/browse/OAK-4493
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar, segmentmk
>            Reporter: Alex Parvulescu
>            Assignee: Alex Parvulescu
>              Labels: compaction, gc
>
> I'm investigating a case where offline compaction is unable to finish and 
> crashes with OOMEs because of the content structure, namely large child node 
> lists. The biggest issue is the UUID index, which has 55M nodes.
> In the current implementation, the compactor uses an in-memory node state 
> to collect all the data, and persists it at the very end, once compaction is 
> done [0]. 
> This is prone to OOME once the size of the data parts (no binaries involved) 
> grows beyond a certain size (in this case I have 350Gb, but there's a fair 
> amount of garbage due to compaction not running).
> My proposal is to add a special flag {{oak.compaction.eagerFlush=true}} that 
> should be enabled only when the size of the repo does not allow running 
> offline compaction within the available heap. This would turn the in-memory 
> compaction transaction into one based on a persisted SegmentNodeState, 
> meaning we're trading disk space (and IO) for memory.
> [0] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment/src/main/java/org/apache/jackrabbit/oak/plugins/segment/Compactor.java#L248
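The eager-flush idea quoted above can be sketched in miniature. The types below (`Store`, `persist`) are hypothetical stand-ins, not real Oak APIs; the sketch only illustrates the trade-off of periodically persisting partial state instead of accumulating the whole compacted tree in memory:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "eager flush" pattern: instead of building the
// whole compacted result in memory and persisting once at the end, flush the
// in-memory buffer to the store every N updates so retained heap stays
// bounded. None of these types are real Oak APIs.
public class EagerFlushSketch {

    // Stand-in for persisted storage; returns an id for the persisted batch.
    interface Store {
        String persist(List<String> batch);
    }

    static List<String> compact(Iterable<String> children, Store store, int flushEvery) {
        List<String> persistedIds = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        for (String child : children) {
            buffer.add(child);                           // "compacted" child node
            if (buffer.size() >= flushEvery) {
                persistedIds.add(store.persist(buffer)); // trade disk/IO for heap
                buffer.clear();                          // drop the in-memory state
            }
        }
        if (!buffer.isEmpty()) {
            persistedIds.add(store.persist(buffer));     // final partial batch
        }
        return persistedIds;
    }
}
```

With a flush interval of 4, compacting 10 children produces three persisted batches instead of one 10-entry in-memory structure; heap usage is bounded by the flush interval rather than the total child count.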



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
