Hi,

I have opened OAK-2989 to track improving support for arbitrarily large commits in Oak core.
Regards,

Timothee

2015-06-09 10:22 GMT+01:00 Timothée Maret <timothee.ma...@gmail.com>:
> Hi Michael,
>
> 2015-06-09 9:49 GMT+01:00 Michael Dürig <mdue...@apache.org>:
>>
>> Changing the hash map fill ratio sounds like a workaround to me. It will
>> only push the problem a little further out.
>
> I agree that this is pushing the limit a little further.
> However, I believe it makes sense to enable this improvement anyway, as it
> is really cheap to implement and still reduces the memory footprint.
> I have opened OAK-2968 and OAK-2969.
>
>> Oak should be able to handle arbitrarily large commits. So we should find
>> and fix the root cause for this.
>>
>> With which backend does this occur?
>
> Mongo / DocumentStore
>
>> At which phase of the commit?
>
> AFAIK, while building the commit. How can I make sure?
>
>> Did you collect thread dumps from the time when these hash maps start
>> piling up?
>
> No. Though I have a heap dump taken at the time an OOME occurred on an
> instance (the JVM ran with -XX:+HeapDumpOnOutOfMemoryError).
> I cannot share that heap dump because it contains customer data and is
> fairly big (32 GB uncompressed).
>
> However, I have attached a high-level view of the dominator tree, which
> identifies the main heap users (ModifiedNodeState & UpdateOp).
> Both classes contain HashMap/HashSet fields whose sizes are initialised
> with the default capacity.
>
> This instance seems to have a rather deep tree, but not a wide one.
>
>> ModifiedNodeState instances are used to collect transient changes in
>> memory up to a certain point. Afterwards, changes should be written ahead
>> to the backend.
>
> OK, that would make sense and would indeed allow handling arbitrarily
> large commits.
> In the current state, there seems to be no other way than adding a lot of
> RAM or letting the OS swap memory to disk for ages.
>
> Regards,
>
> Timothee
>
>> Michael
>>
>> On 8.6.15 7:35, Timothée Maret wrote:
>>> Hi,
>>>
>>> At Adobe, we use large commits to update our content repository
>>> atomically.
>>> Those large commits require a large amount of heap memory, or the JVM
>>> throws OOMEs and the commit fails.
>>>
>>> In one setup, we configure the JVM with a max heap size of 32 GB, yet
>>> we still hit OOMEs.
>>> I looked at the heap dump taken at the occurrence of the OOME, ran it
>>> through Eclipse Memory Analyzer, and noticed that:
>>>
>>> 1. HashMap objects consume the most heap (~10 GB);
>>> 2. 54% of the HashMap instances contain fewer than 12 elements;
>>> 3. ~40% of the HashMap instances contain exactly 1 element; and
>>> 4. the ModifiedNodeState instances hold ~10 GB of HashMaps.
>>>
>>> Since HashMaps account for the vast majority of the memory consumed,
>>> memory consumption could be reduced by using HashMaps with a higher
>>> fill ratio.
>>> Looking at the code in [0], it seems HashMaps are sometimes created
>>> with the default capacity.
>>> Specifying the initial capacity for every new HashMap instance in [0],
>>> as either the required capacity or 1 (if there is no better guess),
>>> would improve the HashMap fill ratio and thus decrease the memory
>>> footprint of commits.
>>>
>>> wdyt?
>>>
>>> Regards,
>>>
>>> Timothee
>>>
>>> [0]
>>> org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState#ModifiedNodeState
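[Editor's note: for illustration, a minimal Java sketch of the sizing point discussed above; this is not Oak code, and the class and method names are invented. A java.util.HashMap built with the default constructor ends up with a 16-slot bucket array as soon as the first entry is put, so millions of one-entry maps carry a lot of unused slack; passing an explicit initial capacity keeps tiny maps tiny.]

    import java.util.HashMap;
    import java.util.Map;

    public class SizedMaps {

        // Default construction: the bucket array is sized to 16 on the
        // first put, even if the map only ever holds one entry.
        static Map<String, Object> defaultCapacity(String key, Object value) {
            Map<String, Object> map = new HashMap<>();
            map.put(key, value);
            return map;
        }

        // Sized construction: when the entry count is known (or 1 is a
        // reasonable guess), the bucket array starts far smaller. The map
        // may still grow once if the fill exceeds the 0.75 load factor,
        // but it never jumps straight to 16 slots.
        static Map<String, Object> sizedCapacity(Map<String, Object> source) {
            Map<String, Object> map = new HashMap<>(Math.max(1, source.size()));
            map.putAll(source);
            return map;
        }
    }

Since Oak already depends on Guava, Maps.newHashMapWithExpectedSize(n) would be an alternative; it also accounts for the load factor, so a map sized for n entries is not resized while being filled.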
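[Editor's note: a second, hypothetical sketch of the write-ahead behaviour Michael describes: transient changes are buffered in memory only up to a threshold and then persisted to the backend, so heap usage stays bounded no matter how large the commit grows. The Backend interface and UPDATE_LIMIT constant are invented for illustration; the real mechanism would live in the NodeStore implementation.]

    import java.util.HashMap;
    import java.util.Map;

    public class WriteAheadBuffer {

        // Invented stand-in for the persistence layer, e.g. a branch
        // commit against the DocumentStore.
        interface Backend {
            void persist(Map<String, Object> changes);
        }

        // Illustrative threshold; the actual limit would be tuned.
        private static final int UPDATE_LIMIT = 10_000;

        private final Backend backend;
        private final Map<String, Object> pending = new HashMap<>();

        WriteAheadBuffer(Backend backend) {
            this.backend = backend;
        }

        void set(String path, Object value) {
            pending.put(path, value);
            // Once the in-memory buffer grows past the limit, write the
            // changes ahead to the backend and drop them from the heap.
            // The final commit then only merges the persisted changes.
            if (pending.size() >= UPDATE_LIMIT) {
                backend.persist(new HashMap<>(pending));
                pending.clear();
            }
        }
    }

With such a limit in place, the per-commit heap cost is capped at roughly UPDATE_LIMIT buffered entries instead of growing with the commit size.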