It's not that it's "bad"; it's just that Lucene must do extra work to
check whether each of these deletes actually matches a document, and that
extra work requires loading the terms index, which consumes additional RAM.

For most apps, though, the terms index is relatively small, so this
isn't really an issue.  But if your terms index is large, it can
explain the added RAM usage.

One workaround for a large terms index is to set the terms index divisor
that IndexWriter should use whenever it loads a terms index (this is
IndexWriter.setReaderTermsIndexDivisor).
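For example, something like this (a rough sketch against the 3.x API; the
index path and divisor value are just for illustration):

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class TermsIndexDivisorExample {
      public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
            FSDirectory.open(new File("/path/to/index")),  // illustrative path
            new StandardAnalyzer(Version.LUCENE_30),
            IndexWriter.MaxFieldLength.UNLIMITED);

        // A divisor of N makes readers load only every Nth indexed term,
        // cutting the terms index RAM roughly N-fold, at the cost of
        // slower term lookups (e.g. when applying deletes for a merge).
        writer.setReaderTermsIndexDivisor(4);

        // ... add/update documents as usual ...
        writer.close();
      }
    }

The higher the divisor, the less RAM the terms index takes, but the slower
each delete-by-Term lookup becomes.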

Mike

On Thu, Dec 16, 2010 at 12:17 PM, Robert Petersen <rober...@buy.com> wrote:
> Hello, we occasionally bump into the OOM issue during merging after
> propagation too, and from the discussion below I guess we are doing thousands
> of 'false deletions' by unique id to make sure certain documents are *not* in
> the index.  Could anyone explain why that is bad?  I didn't really understand
> the conclusion below.
>
> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Thursday, December 16, 2010 2:51 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Memory use during merges (OOM)
>
> RAM usage for merging is tricky.
>
> First off, merging must hold open a SegmentReader for each segment
> being merged.  However, it's not necessarily a full segment reader;
> for example, merging doesn't need the terms index or norms.  But it
> will load deleted docs.
>
> But, if you are doing deletions (or updateDocument, which is just a
> delete + add under the hood), then this will force the terms index of
> the segment readers to be loaded, thus consuming more RAM.
> Furthermore, if the deletions you do (by Term/Query) in fact result in
> deleted documents (ie they were not "false" deletions), then the
> merging allocates an int[maxDoc()] for each SegmentReader that has
> deletions; at 4 bytes per document, a segment with 100 million docs
> would need ~400 MB for that array alone.
>
> Finally, if you have multiple merges running at once (see
> ConcurrentMergeScheduler.setMaxMergeCount), that means RAM for each
> currently running merge is tied up.
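> For instance, a rough sketch of capping that concurrency (the exact
> counts here are just for illustration):
>
>     import java.io.IOException;
>     import org.apache.lucene.index.ConcurrentMergeScheduler;
>     import org.apache.lucene.index.IndexWriter;
>
>     // Fewer simultaneous merges = fewer SegmentReaders holding RAM at once.
>     static void limitMerges(IndexWriter writer) throws IOException {
>       ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
>       cms.setMaxThreadCount(1);  // run merges one at a time
>       cms.setMaxMergeCount(2);   // allow at most 2 merges pending+running
>       writer.setMergeScheduler(cms);
>     }
>
> Lowering these bounds trades indexing throughput for a smaller peak
> RAM footprint during merging.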
>
> So I think the gist is... the RAM usage will be in proportion to the
> net size of the merge (mergeFactor + how big each merged segment is),
> how many merges you allow concurrently, and whether you do false or
> true deletions.
>
> If you are doing false deletions (calling .updateDocument when in fact
> the Term you are replacing cannot exist), it'd be best, if possible, to
> change the app to call .addDocument instead whenever you know the Term
> doesn't exist.
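> For example, a hypothetical helper (assuming your unique key field is
> named "id", and that the app can tell whether a given id may already be
> indexed):
>
>     import java.io.IOException;
>     import org.apache.lucene.document.Document;
>     import org.apache.lucene.index.IndexWriter;
>     import org.apache.lucene.index.Term;
>
>     static void indexDoc(IndexWriter writer, String id, Document doc,
>                          boolean mayExist) throws IOException {
>       if (mayExist) {
>         writer.updateDocument(new Term("id", id), doc);  // delete + add
>       } else {
>         // No buffered delete-by-Term, so merging never has to load the
>         // terms index just to resolve it.
>         writer.addDocument(doc);
>       }
>     }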
>
> Mike
>
> On Wed, Dec 15, 2010 at 6:52 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
>> Hello all,
>>
>> Are there any general guidelines for determining the main factors in memory 
>> use during merges?
>>
>> We recently changed our indexing configuration to speed up indexing, but in
>> the process of doing a very large merge we are running out of memory.
>> Below is a list of the changes and part of the indexWriter log.  The changes
>> increased the indexing throughput by almost an order of magnitude
>> (from about 600 documents per hour to about 6,000 documents per hour; our
>> documents are about 800K).
>>
>> We are trying to determine which of the changes to tweak to avoid the OOM
>> while still keeping the benefit of the increased indexing throughput.
>>
>> Is it likely that the change to ramBufferSizeMB is the culprit, or could it
>> be the mergeFactor change from 10 to 20?
>>
>>  Is there any obvious relationship between ramBufferSizeMB and the memory 
>> consumed by Solr?
>>  Are there rules of thumb for the memory needed in terms of the number or 
>> size of segments?
>>
>> Our largest segments prior to the failed merge attempt were between 5GB and 
>> 30GB.  The memory allocated to the Solr/tomcat JVM is 10GB.
>>
>> Tom Burton-West
>> -----------------------------------------------------------------
>>
>> Changes to indexing configuration:
>> mergeScheduler
>>        before: serialMergeScheduler
>>        after:  concurrentMergeScheduler
>> mergeFactor
>>        before: 10
>>        after:  20
>> ramBufferSizeMB
>>        before: 32
>>        after:  320
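>>
>> (For reference, my understanding is these roughly correspond to the
>> following IndexWriter calls; just a sketch, since we actually set them in
>> solrconfig.xml:)
>>
>>     import java.io.IOException;
>>     import org.apache.lucene.index.ConcurrentMergeScheduler;
>>     import org.apache.lucene.index.IndexWriter;
>>
>>     static void applyNewSettings(IndexWriter writer) throws IOException {
>>       writer.setMergeScheduler(new ConcurrentMergeScheduler());  // was SerialMergeScheduler
>>       writer.setMergeFactor(20);       // was 10: merge 20 segments per level
>>       writer.setRAMBufferSizeMB(320);  // was 32: buffer more docs in RAM before flushing
>>     }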
>>
>> excerpt from indexWriter.log
>>
>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: findMerges: 40 segments
>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP:   level 7.23609 to 7.98609: 20 segments
>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP:     0 to 20: add this merge
>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP:   level 5.44878 to 6.19878: 20 segments
>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP:     20 to 40: add this merge
>>
>> ...
>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: applyDeletes
>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: DW: apply 1320 buffered deleted terms and 0 deleted docIDs and 0 deleted queries on 40 segments.
>> Dec 14, 2010 5:48:17 PM IW 0 [Tue Dec 14 17:48:17 EST 2010; http-8091-Processor70]: hit exception flushing deletes
>> Dec 14, 2010 5:48:17 PM IW 0 [Tue Dec 14 17:48:17 EST 2010; http-8091-Processor70]: hit OutOfMemoryError inside updateDocument
>> tom
>>
>>
>
