Actually the terms index is something different. If you don't use CFS, go
and look at the size of the *.tii files in your index directory -- those
are the terms index. The terms index loads a subset of the terms (by
default every 128th term) into RAM, plus some metadata, in order to make
seeking to a specific term faster.
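A rough sketch of that idea (plain Python, just to illustrate the mechanics; Lucene's actual on-disk format and data structures differ): keep every 128th term of the sorted dictionary in RAM, then a seek binary-searches the in-RAM sample and scans at most 128 terms from there.

```python
import bisect

INTERVAL = 128  # Lucene's default termInfosIndexInterval

def build_terms_index(sorted_terms):
    """Sample every INTERVAL-th term to hold in RAM."""
    return sorted_terms[::INTERVAL]

def seek(sorted_terms, index, target):
    """Binary-search the RAM index, then scan at most INTERVAL terms on 'disk'."""
    # Find the last indexed term <= target.
    pos = max(bisect.bisect_right(index, target) - 1, 0)
    start = pos * INTERVAL
    for i in range(start, min(start + INTERVAL, len(sorted_terms))):
        if sorted_terms[i] >= target:
            return i
    return -1

terms = ["t%06d" % n for n in range(10000)]
idx = build_terms_index(terms)
print(len(idx))                     # 79 entries in RAM instead of 10000
print(seek(terms, idx, "t000500"))  # 500
```

The RAM cost is proportional to (number of unique terms / interval), which is why a very large terms dictionary shows up as heap usage the moment the terms index is loaded.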
Unfortunately they are held in a RAM-intensive way, but in the upcoming
4.0 release we've greatly reduced that.

Mike

On Thu, Dec 16, 2010 at 2:27 PM, Robert Petersen <rober...@buy.com> wrote:
> Thanks Mike!  When you say 'term index of the segment readers', are you
> referring to the term vectors?
>
> In our case our index of 8 million docs holds pretty 'skinny' docs
> containing searchable product titles and keywords, with the rest of the
> doc only holding IDs for faceting upon.  Docs typically only have unique
> terms per doc, with a lot of overlap of the terms across categories of
> docs (all similar products).  I'm thinking that our unique term count is
> low vs the size of our index.  The way we spin out deletes and adds
> should keep the terms loaded all the time.  It seems like once every
> couple of weeks a propagation happens which kills the slave farm with
> OOMs.  We are bumping the heap up a couple gigs every time this happens
> and hoping it goes away at this point.  That is why I jumped into this
> discussion; sorry for butting in like that.  You guys are discussing
> very interesting settings I had not considered before.
>
> Rob
>
> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Thursday, December 16, 2010 10:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Memory use during merges (OOM)
>
> It's not that it's "bad"; it's just that Lucene must do extra work to
> check whether these deletes are real or not, and that extra work
> requires loading the terms index, which consumes additional RAM.
>
> For most apps, though, the terms index is relatively small and so this
> isn't really an issue.  But if your terms index is large, this can
> explain the added RAM usage.
>
> One workaround for a large terms index is to set the terms index
> divisor that IndexWriter should use whenever it loads a terms index
> (this is IndexWriter.setReaderTermsIndexDivisor).
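A back-of-the-envelope sketch of what the divisor buys you (Python, with made-up numbers -- the per-entry overhead and term count here are assumptions, not measurements from any real index): a divisor of N keeps only every Nth indexed term in RAM, cutting the footprint by roughly N at the cost of scanning up to N * 128 terms per seek.

```python
INTERVAL = 128  # default termInfosIndexInterval

def index_footprint(num_terms, bytes_per_entry, divisor):
    """RAM held by the terms index: one entry per (INTERVAL * divisor) terms."""
    entries = num_terms // (INTERVAL * divisor)
    return entries * bytes_per_entry

num_terms = 100_000_000   # hypothetical large terms dictionary
bytes_per_entry = 64      # rough guess at per-entry RAM overhead

for d in (1, 2, 4):
    mb = index_footprint(num_terms, bytes_per_entry, d) / (1024 * 1024)
    print("divisor=%d -> ~%.0f MB RAM, worst-case scan %d terms"
          % (d, mb, INTERVAL * d))
```

So doubling the divisor halves the terms-index heap while only lengthening the linear scan at the end of each seek, which is usually a good trade during merging where few seeks are needed.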
>
> Mike
>
> On Thu, Dec 16, 2010 at 12:17 PM, Robert Petersen <rober...@buy.com> wrote:
>> Hello, we occasionally bump into the OOM issue during merging after
>> propagation too, and from the discussion below I guess we are doing
>> thousands of 'false deletions' by unique id to make sure certain
>> documents are *not* in the index.  Could anyone explain why that is
>> bad?  I didn't really understand the conclusion below.
>>
>> -----Original Message-----
>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>> Sent: Thursday, December 16, 2010 2:51 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Memory use during merges (OOM)
>>
>> RAM usage for merging is tricky.
>>
>> First off, merging must hold open a SegmentReader for each segment
>> being merged.  However, it's not necessarily a full segment reader;
>> for example, merging doesn't need the terms index nor norms.  But it
>> will load deleted docs.
>>
>> But if you are doing deletions (or updateDocument, which is just a
>> delete + add under the hood), then this will force the terms index of
>> the segment readers to be loaded, thus consuming more RAM.
>> Furthermore, if the deletions you do (by Term/Query) in fact result in
>> deleted documents (i.e. they were not "false" deletions), then merging
>> allocates an int[maxDoc()] for each SegmentReader that has deletions.
>>
>> Finally, if you have multiple merges running at once (see
>> ConcurrentMergeScheduler.setMaxMergeCount), that means RAM for each
>> currently running merge is tied up.
>>
>> So I think the gist is... the RAM usage will be in proportion to the
>> net size of the merge (mergeFactor + how big each merged segment is),
>> how many merges you allow concurrently, and whether you do false or
>> true deletions.
>>
>> If you are doing false deletions (calling .updateDocument when in fact
>> the Term you are replacing cannot exist), it'd be best, if possible,
>> to change the app not to call .updateDocument when you know the Term
>> doesn't exist.
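The "don't call updateDocument for terms that can't exist" advice can be sketched like this (a toy Python model, not the Lucene API; `ToyIndex` and `known_ids` are invented for illustration): updateDocument is delete-by-term plus add, so calling it for an id that was never indexed still pays for the delete lookup, while a plain add skips it.

```python
# Conceptual model: every update pays a delete lookup (which in Lucene
# means the terms index must be loaded); a plain add does not.
class ToyIndex:
    def __init__(self):
        self.docs = {}
        self.delete_lookups = 0   # stands in for terms-index seeks

    def add_document(self, doc_id, doc):
        self.docs[doc_id] = doc

    def update_document(self, doc_id, doc):
        self.delete_lookups += 1  # delete-by-term check happens regardless
        self.docs.pop(doc_id, None)
        self.docs[doc_id] = doc

idx = ToyIndex()
known_ids = set()                 # app-side knowledge of what was indexed

for doc_id in ["a", "b", "a", "c"]:
    if doc_id in known_ids:       # true replacement: update
        idx.update_document(doc_id, {"id": doc_id})
    else:                         # first sight: plain add, skip the delete
        idx.add_document(doc_id, {"id": doc_id})
        known_ids.add(doc_id)

print(idx.delete_lookups)  # 1 (only the real replacement of "a")
```

The design choice is simply to move the "does this id already exist?" question to app-side bookkeeping, where it's a cheap set lookup, instead of answering it inside Lucene via the terms index.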
>>
>> Mike
>>
>> On Wed, Dec 15, 2010 at 6:52 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
>>> Hello all,
>>>
>>> Are there any general guidelines for determining the main factors in
>>> memory use during merges?
>>>
>>> We recently changed our indexing configuration to speed up indexing,
>>> but in the process of doing a very large merge we are running out of
>>> memory.  Below is a list of the changes and part of the indexWriter
>>> log.  The changes increased the indexing throughput by almost an
>>> order of magnitude (about 600 documents per hour to about 6,000
>>> documents per hour; our documents are about 800K).
>>>
>>> We are trying to determine which of the changes to tweak to avoid the
>>> OOM but still keep the benefit of the increased indexing throughput.
>>>
>>> Is it likely that the change to ramBufferSizeMB is the culprit, or
>>> could it be the mergeFactor change from 10 to 20?
>>>
>>> Is there any obvious relationship between ramBufferSizeMB and the
>>> memory consumed by Solr?  Are there rules of thumb for the memory
>>> needed in terms of the number or size of segments?
>>>
>>> Our largest segments prior to the failed merge attempt were between
>>> 5GB and 30GB.  The memory allocated to the Solr/Tomcat JVM is 10GB.
>>>
>>> Tom Burton-West
>>> -----------------------------------------------------------------
>>>
>>> Changes to indexing configuration:
>>>
>>> mergeScheduler
>>>   before: serialMergeScheduler
>>>   after:  concurrentMergeScheduler
>>> mergeFactor
>>>   before: 10
>>>   after:  20
>>> ramBufferSizeMB
>>>   before: 32
>>>   after:  320
>>>
>>> Excerpt from indexWriter.log:
>>>
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: findMerges: 40 segments
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: level 7.23609 to 7.98609: 20 segments
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: 0 to 20: add this merge
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: level 5.44878 to 6.19878: 20 segments
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: 20 to 40: add this merge
>>> ...
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: applyDeletes
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: DW: apply 1320 buffered deleted terms and 0 deleted docIDs and 0 deleted queries on 40 segments.
>>> Dec 14, 2010 5:48:17 PM IW 0 [Tue Dec 14 17:48:17 EST 2010; http-8091-Processor70]: hit exception flushing deletes
>>> Dec 14, 2010 5:48:17 PM IW 0 [Tue Dec 14 17:48:17 EST 2010; http-8091-Processor70]: hit OutOfMemoryError inside updateDocument
>>>
>>> Tom
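One piece of the RAM picture described earlier in the thread can be put to numbers: merging allocates an int[maxDoc()] per SegmentReader that carries deletions. A back-of-the-envelope estimate (Python; the segment sizes are assumed round numbers, not taken from the log above):

```python
# Estimate of the int[maxDoc()] arrays allocated during a merge for
# segments that have deletions, per the explanation earlier in the thread.
BYTES_PER_INT = 4

def docid_map_bytes(max_docs_per_segment):
    """Total bytes for one int[maxDoc()] per segment with deletions."""
    return sum(BYTES_PER_INT * m for m in max_docs_per_segment)

# Hypothetical merge of 20 segments of 1M docs each, all with deletions:
segments = [1_000_000] * 20
mb = docid_map_bytes(segments) / (1024 ** 2)
print("~%.0f MB just for the docid maps" % mb)
```

These arrays alone are modest; the point of the estimate is that the dominant costs in a setup like the one above are more likely the terms indexes of many large open segments plus the 320MB RAM buffer, multiplied by concurrent merges.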