Hello, we occasionally bump into the OOM issue during merging after propagation,
too. From the discussion below, I gather we are doing thousands of 'false
deletions' by unique id to make sure certain documents are *not* in the index.
Could anyone explain why that is bad?  I didn't really understand the
conclusion below.

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com] 
Sent: Thursday, December 16, 2010 2:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Memory use during merges (OOM)

RAM usage for merging is tricky.

First off, merging must hold open a SegmentReader for each segment
being merged.  However, it's not necessarily a full segment reader;
for example, merging doesn't need the terms index or norms.  But it
will load deleted docs.

But, if you are doing deletions (or updateDocument, which is just a
delete + add under the hood), then this will force the terms index of
the segment readers to be loaded, thus consuming more RAM.
Furthermore, if the deletions you do (by Term/Query) in fact result in
deleted documents (i.e. they were not "false" deletions), then the
merge allocates an int[maxDoc()] for each SegmentReader that has
deletions.
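For a rough sense of scale, that int[maxDoc()] works out to 4 bytes per
document for every merged segment that has deletions. A back-of-the-envelope
sketch (the segment sizes here are hypothetical, not taken from Tom's index):

```java
// Rough estimate of the per-segment doc-map RAM described above:
// merging allocates an int[maxDoc()] (4 bytes per doc) for every
// segment being merged that carries deletions.
public class MergeRamEstimate {

    // 4 bytes per document in the segment's doc-id remap array
    static long docMapBytes(long maxDoc) {
        return 4L * maxDoc;
    }

    public static void main(String[] args) {
        // Hypothetical maxDoc counts for three large segments
        long[] segmentMaxDocs = {10_000_000L, 25_000_000L, 50_000_000L};
        long total = 0;
        for (long maxDoc : segmentMaxDocs) {
            total += docMapBytes(maxDoc);
        }
        System.out.println("doc-map RAM for this merge: "
                + total / (1024 * 1024) + " MB");
    }
}
```

With 20-segment merges of multi-GB segments, this alone can reach hundreds
of megabytes per running merge.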

Finally, if you have multiple merges running at once (see
CSM.setMaxMergeCount) that means RAM for each currently running merge
is tied up.

So I think the gist is... the RAM usage will be in proportion to the
net size of the merge (mergeFactor + how big each merged segment is),
how many merges you allow concurrently, and whether you do false or
true deletions.

If you are doing false deletions (calling .updateDocument when in fact
the Term you are replacing cannot exist), it would be best, if
possible, to change the app so it doesn't call .updateDocument when
you know the Term doesn't exist.
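One way to act on this advice is sketched below, with the actual Lucene
writer calls left as comments; the indexedIds set is a hypothetical app-side
record of which unique ids have already been indexed, not anything Lucene
provides:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the app-side change suggested above: only go through
// updateDocument (delete + add) when the unique id may actually be in
// the index; otherwise use a plain addDocument, which issues no
// delete-by-Term and so doesn't force the terms index to load.
public class UpdateOrAdd {

    // Hypothetical app-side record of ids already written to the index
    private final Set<String> indexedIds = new HashSet<String>();

    // Returns which operation was chosen, for illustration
    public String indexDocument(String id) {
        if (indexedIds.contains(id)) {
            // writer.updateDocument(new Term("id", id), doc);
            return "update";
        }
        // writer.addDocument(doc);
        indexedIds.add(id);
        return "add";
    }
}
```

Whether tracking ids app-side is cheaper than the extra merge RAM depends on
how many unique ids you have; at scale, something like a Bloom filter could
stand in for the HashSet.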

Mike

On Wed, Dec 15, 2010 at 6:52 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
> Hello all,
>
> Are there any general guidelines for determining the main factors in memory 
> use during merges?
>
> We recently changed our indexing configuration to speed up indexing, but in
> the process of doing a very large merge we are running out of memory.
> Below is a list of the changes and part of the indexWriter log.  The changes
> increased the indexing throughput by almost an order of magnitude
> (from about 600 documents per hour to about 6,000 documents per hour; our
> documents are about 800K).
>
> We are trying to determine which of the changes to tweak to avoid the OOM
> but still keep the benefit of the increased indexing throughput.
>
> Is it likely that the changes to ramBufferSizeMB are the culprit, or could it
> be the mergeFactor change from 10 to 20?
>
> Is there any obvious relationship between ramBufferSizeMB and the memory
> consumed by Solr?
> Are there rules of thumb for the memory needed in terms of the number or
> size of segments?
>
> Our largest segments prior to the failed merge attempt were between 5GB and 
> 30GB.  The memory allocated to the Solr/tomcat JVM is 10GB.
>
> Tom Burton-West
> -----------------------------------------------------------------
>
> Changes to indexing configuration:
> mergeScheduler
>        before: serialMergeScheduler
>        after:  concurrentMergeScheduler
> mergeFactor
>        before: 10
>        after:  20
> ramBufferSizeMB
>        before: 32
>        after:  320
>
> excerpt from indexWriter.log
>
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; 
> http-8091-Processor70]: LMP: findMerges: 40 segments
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; 
> http-8091-Processor70]: LMP:   level 7.23609 to 7.98609: 20 segments
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; 
> http-8091-Processor70]: LMP:     0 to 20: add this merge
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; 
> http-8091-Processor70]: LMP:   level 5.44878 to 6.19878: 20 segments
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; 
> http-8091-Processor70]: LMP:     20 to 40: add this merge
>
> ...
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; 
> http-8091-Processor70]: applyDeletes
> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; 
> http-8091-Processor70]: DW: apply 1320 buffered deleted terms and 0 deleted 
> docIDs and 0 deleted queries on 40 segments.
> Dec 14, 2010 5:48:17 PM IW 0 [Tue Dec 14 17:48:17 EST 2010; 
> http-8091-Processor70]: hit exception flushing deletes
> Dec 14, 2010 5:48:17 PM IW 0 [Tue Dec 14 17:48:17 EST 2010; 
> http-8091-Processor70]: hit OutOfMemoryError inside updateDocument
> tom
>
>
