RAM usage for merging is tricky. First off, merging must hold open a SegmentReader for each segment being merged. It's not necessarily a full segment reader, though; for example, merging doesn't need the terms index or norms. But it will load deleted docs.
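As a rough back-of-envelope for the per-merge cost of the int[maxDoc()] docID maps Mike describes below (my own arithmetic and made-up segment sizes, purely illustrative):

```java
// Illustrative estimate: merging a segment that has deletions allocates an
// int[maxDoc()] to map old docIDs to new ones -- 4 bytes per document, per
// segment with deletions, per concurrently running merge.
class MergeRamEstimate {
    static long docMapBytes(int[] maxDocPerSegment) {
        long bytes = 0;
        for (int maxDoc : maxDocPerSegment) {
            bytes += 4L * maxDoc; // one int per doc in each segment with deletions
        }
        return bytes;
    }

    public static void main(String[] args) {
        // e.g. a 20-segment merge where each segment holds ~5M docs:
        int[] segs = new int[20];
        java.util.Arrays.fill(segs, 5_000_000);
        long perMerge = docMapBytes(segs);
        int concurrentMerges = 3; // hypothetical; see ConcurrentMergeScheduler.setMaxMergeCount
        System.out.println((perMerge * concurrentMerges) / (1024 * 1024) + " MB");
        // prints: 1144 MB
    }
}
```

That's just the docID maps; the open SegmentReaders and their loaded deleted docs (and terms indexes, when deletions force them in) come on top of this.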
But if you are doing deletions (or updateDocument, which is just a delete + add under the hood), then this will force the terms index of the segment readers to be loaded, thus consuming more RAM. Furthermore, if the deletions you do (by Term/Query) in fact result in deleted documents (i.e. they were not "false" deletions), then merging allocates an int[maxDoc()] for each SegmentReader that has deletions. Finally, if you have multiple merges running at once (see ConcurrentMergeScheduler.setMaxMergeCount), the RAM for each currently running merge is tied up at the same time.

So I think the gist is: the RAM usage will be in proportion to the net size of the merge (mergeFactor and how big each merged segment is), how many merges you allow to run concurrently, and whether you do false or true deletions. If you are doing false deletions (calling .updateDocument when in fact the Term you are replacing cannot exist), it would be best, if possible, to change the app to not call .updateDocument when you know the Term doesn't exist.

Mike

On Wed, Dec 15, 2010 at 6:52 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
> Hello all,
>
> Are there any general guidelines for determining the main factors in memory
> use during merges?
>
> We recently changed our indexing configuration to speed up indexing, but in
> the process of doing a very large merge we are running out of memory.
> Below is a list of the changes and part of the indexWriter log. The changes
> increased the indexing throughput by almost an order of magnitude
> (from about 600 documents per hour to about 6000 documents per hour; our
> documents are about 800K).
>
> We are trying to determine which of the changes to tweak to avoid the OOM
> while still keeping the benefit of the increased indexing throughput.
>
> Is it likely that the change to ramBufferSizeMB is the culprit, or could it
> be the mergeFactor change from 10 to 20?
>
> Is there any obvious relationship between ramBufferSizeMB and the memory
> consumed by Solr?
> Are there rules of thumb for the memory needed in terms of the number or
> size of segments?
>
> Our largest segments prior to the failed merge attempt were between 5GB and
> 30GB. The memory allocated to the Solr/tomcat JVM is 10GB.
>
> Tom Burton-West
> -----------------------------------------------------------------
>
> Changes to indexing configuration:
>
> mergeScheduler
>   before: serialMergeScheduler
>   after:  concurrentMergeScheduler
> mergeFactor
>   before: 10
>   after:  20
> ramBufferSizeMB
>   before: 32
>   after:  320
>
> Excerpt from indexWriter.log:
>
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: findMerges: 40 segments
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: level 7.23609 to 7.98609: 20 segments
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: 0 to 20: add this merge
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: level 5.44878 to 6.19878: 20 segments
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: 20 to 40: add this merge
>
> ...
>
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: applyDeletes
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: DW: apply 1320 buffered deleted terms and 0 deleted docIDs and 0 deleted queries on 40 segments.
> Dec 14, 2010 5:48:17 PM IW 0 [Tue Dec 14 17:48:17 EST 2010; http-8091-Processor70]: hit exception flushing deletes
> Dec 14, 2010 5:48:17 PM IW 0 [Tue Dec 14 17:48:17 EST 2010; http-8091-Processor70]: hit OutOfMemoryError inside updateDocument
>
> tom
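To make the false-deletion advice concrete, here is a minimal sketch of the check-before-update pattern. A plain Set<String> of known IDs stands in for whatever the app uses to know an ID is already indexed (for example IndexReader.docFreq on the id Term, or an app-side record of IDs added so far); this is illustrative logic, not the Lucene API itself:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: only pay for a real delete when the id is known to exist.
// The knownIds set is a stand-in for a real existence check against the index.
class UpdateOrAdd {
    final Set<String> knownIds = new HashSet<>();
    int updates = 0, adds = 0;

    void index(String id) {
        if (knownIds.contains(id)) {
            updates++;   // id exists: writer.updateDocument(new Term("id", id), doc)
        } else {
            adds++;      // id is new: writer.addDocument(doc) -- buffers no delete term
            knownIds.add(id);
        }
    }

    public static void main(String[] args) {
        UpdateOrAdd u = new UpdateOrAdd();
        u.index("a");
        u.index("b");
        u.index("a"); // only this one is a true update
        System.out.println("adds=" + u.adds + " updates=" + u.updates);
        // prints: adds=2 updates=1
    }
}
```

Every call routed to addDocument instead of updateDocument is one less buffered delete term for applyDeletes to resolve against every segment, which is exactly the step in Tom's log (1320 buffered deleted terms on 40 segments) where the OOM was hit.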