Actually the terms index is something different. If you don't use CFS, go
and look at the size of the *.tii files in your index directory -- those
are the terms index. The terms index loads a subset of the terms (by
default every 128th term) into RAM, plus some metadata, in order to make
seeking to a specific term faster.
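A rough sketch of that idea (plain Python, just to illustrate the mechanics; Lucene's actual on-disk format and data structures differ): keep every 128th term of the sorted dictionary in RAM, then a seek binary-searches the in-RAM sample and scans at most 128 terms from there.

```python
import bisect

INTERVAL = 128  # Lucene's default termInfosIndexInterval

def build_terms_index(sorted_terms):
    """Sample every INTERVAL-th term to hold in RAM."""
    return sorted_terms[::INTERVAL]

def seek(sorted_terms, index, target):
    """Binary-search the RAM index, then scan at most INTERVAL terms on 'disk'."""
    # Find the last indexed term <= target.
    pos = max(bisect.bisect_right(index, target) - 1, 0)
    start = pos * INTERVAL
    for i in range(start, min(start + INTERVAL, len(sorted_terms))):
        if sorted_terms[i] >= target:
            return i
    return -1

terms = ["t%06d" % n for n in range(10000)]
idx = build_terms_index(terms)
print(len(idx))                     # 79 entries in RAM instead of 10000
print(seek(terms, idx, "t000500"))  # 500
```

The RAM cost is proportional to (number of unique terms / interval), which is why a very large terms dictionary shows up as heap usage the moment the terms index is loaded.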
Unfortunately they are held in a RAM-intensive way, but in the upcoming
4.0 release we've greatly reduced that.

Mike

On Thu, Dec 16, 2010 at 2:27 PM, Robert Petersen <rober...@buy.com> wrote:
> Thanks Mike!  When you say 'term index of the segment readers', are you
> referring to the term vectors?
>
> In our case our index of 8 million docs holds pretty 'skinny' docs
> containing searchable product titles and keywords, with the rest of the
> doc only holding IDs for faceting upon.  Docs typically only have unique
> terms per doc, with a lot of overlap of the terms across categories of
> docs (all similar products).  I'm thinking that our unique term count is
> low vs the size of our index.  The way we spin out deletes and adds
> should keep the terms loaded all the time.  It seems like once every
> couple of weeks a propagation happens which kills the slave farm with
> OOMs.  We are bumping the heap up a couple gigs every time this happens
> and hoping it goes away at this point.  That is why I jumped into this
> discussion; sorry for butting in like that.  You guys are discussing
> very interesting settings I had not considered before.
>
> Rob
>
> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Thursday, December 16, 2010 10:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Memory use during merges (OOM)
>
> It's not that it's "bad"; it's just that Lucene must do extra work to
> check whether these deletes are real or not, and that extra work
> requires loading the terms index, which consumes additional RAM.
>
> For most apps, though, the terms index is relatively small and so this
> isn't really an issue.  But if your terms index is large, this can
> explain the added RAM usage.
>
> One workaround for a large terms index is to set the terms index
> divisor that IndexWriter should use whenever it loads a terms index
> (this is IndexWriter.setReaderTermsIndexDivisor).
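A back-of-the-envelope sketch of what the divisor buys you (Python, with made-up numbers -- the per-entry overhead and term count here are assumptions, not measurements from any real index): a divisor of N keeps only every Nth indexed term in RAM, cutting the footprint by roughly N at the cost of scanning up to N * 128 terms per seek.

```python
INTERVAL = 128  # default termInfosIndexInterval

def index_footprint(num_terms, bytes_per_entry, divisor):
    """RAM held by the terms index: one entry per (INTERVAL * divisor) terms."""
    entries = num_terms // (INTERVAL * divisor)
    return entries * bytes_per_entry

num_terms = 100_000_000   # hypothetical large terms dictionary
bytes_per_entry = 64      # rough guess at per-entry RAM overhead

for d in (1, 2, 4):
    mb = index_footprint(num_terms, bytes_per_entry, d) / (1024 * 1024)
    print("divisor=%d -> ~%.0f MB RAM, worst-case scan %d terms"
          % (d, mb, INTERVAL * d))
```

So doubling the divisor halves the terms-index heap while only lengthening the linear scan at the end of each seek, which is usually a good trade during merging where few seeks are needed.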
>
> Mike
>
> On Thu, Dec 16, 2010 at 12:17 PM, Robert Petersen <rober...@buy.com> wrote:
>> Hello, we occasionally bump into the OOM issue during merging after
>> propagation too, and from the discussion below I guess we are doing
>> thousands of 'false deletions' by unique id to make sure certain
>> documents are *not* in the index.  Could anyone explain why that is
>> bad?  I didn't really understand the conclusion below.
>>
>> -----Original Message-----
>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>> Sent: Thursday, December 16, 2010 2:51 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Memory use during merges (OOM)
>>
>> RAM usage for merging is tricky.
>>
>> First off, merging must hold open a SegmentReader for each segment
>> being merged.  However, it's not necessarily a full segment reader;
>> for example, merging doesn't need the terms index nor norms.  But it
>> will load deleted docs.
>>
>> But if you are doing deletions (or updateDocument, which is just a
>> delete + add under the hood), then this will force the terms index of
>> the segment readers to be loaded, thus consuming more RAM.
>> Furthermore, if the deletions you do (by Term/Query) in fact result in
>> deleted documents (i.e. they were not "false" deletions), then merging
>> allocates an int[maxDoc()] for each SegmentReader that has deletions.
>>
>> Finally, if you have multiple merges running at once (see
>> ConcurrentMergeScheduler.setMaxMergeCount), that means RAM for each
>> currently running merge is tied up.
>>
>> So I think the gist is... the RAM usage will be in proportion to the
>> net size of the merge (mergeFactor + how big each merged segment is),
>> how many merges you allow concurrently, and whether you do false or
>> true deletions.
>>
>> If you are doing false deletions (calling .updateDocument when in fact
>> the Term you are replacing cannot exist), it'd be best, if possible,
>> to change the app not to call .updateDocument when you know the Term
>> doesn't exist.
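The "don't call updateDocument for terms that can't exist" advice can be sketched like this (a toy Python model, not the Lucene API; `ToyIndex` and `known_ids` are invented for illustration): updateDocument is delete-by-term plus add, so calling it for an id that was never indexed still pays for the delete lookup, while a plain add skips it.

```python
# Conceptual model: every update pays a delete lookup (which in Lucene
# means the terms index must be loaded); a plain add does not.
class ToyIndex:
    def __init__(self):
        self.docs = {}
        self.delete_lookups = 0   # stands in for terms-index seeks

    def add_document(self, doc_id, doc):
        self.docs[doc_id] = doc

    def update_document(self, doc_id, doc):
        self.delete_lookups += 1  # delete-by-term check happens regardless
        self.docs.pop(doc_id, None)
        self.docs[doc_id] = doc

idx = ToyIndex()
known_ids = set()                 # app-side knowledge of what was indexed

for doc_id in ["a", "b", "a", "c"]:
    if doc_id in known_ids:       # true replacement: update
        idx.update_document(doc_id, {"id": doc_id})
    else:                         # first sight: plain add, skip the delete
        idx.add_document(doc_id, {"id": doc_id})
        known_ids.add(doc_id)

print(idx.delete_lookups)  # 1 (only the real replacement of "a")
```

The design choice is simply to move the "does this id already exist?" question to app-side bookkeeping, where it's a cheap set lookup, instead of answering it inside Lucene via the terms index.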
>>
>> Mike
>>
>> On Wed, Dec 15, 2010 at 6:52 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
>>> Hello all,
>>>
>>> Are there any general guidelines for determining the main factors in
>>> memory use during merges?
>>>
>>> We recently changed our indexing configuration to speed up indexing,
>>> but in the process of doing a very large merge we are running out of
>>> memory.  Below is a list of the changes and part of the indexWriter
>>> log.  The changes increased the indexing throughput by almost an
>>> order of magnitude (about 600 documents per hour to about 6,000
>>> documents per hour; our documents are about 800K).
>>>
>>> We are trying to determine which of the changes to tweak to avoid the
>>> OOM but still keep the benefit of the increased indexing throughput.
>>>
>>> Is it likely that the change to ramBufferSizeMB is the culprit, or
>>> could it be the mergeFactor change from 10 to 20?
>>>
>>> Is there any obvious relationship between ramBufferSizeMB and the
>>> memory consumed by Solr?  Are there rules of thumb for the memory
>>> needed in terms of the number or size of segments?
>>>
>>> Our largest segments prior to the failed merge attempt were between
>>> 5GB and 30GB.  The memory allocated to the Solr/Tomcat JVM is 10GB.
>>>
>>> Tom Burton-West
>>> -----------------------------------------------------------------
>>>
>>> Changes to indexing configuration:
>>>
>>> mergeScheduler
>>>   before: serialMergeScheduler
>>>   after:  concurrentMergeScheduler
>>> mergeFactor
>>>   before: 10
>>>   after:  20
>>> ramBufferSizeMB
>>>   before: 32
>>>   after:  320
>>>
>>> Excerpt from indexWriter.log:
>>>
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: findMerges: 40 segments
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: level 7.23609 to 7.98609: 20 segments
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: 0 to 20: add this merge
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: level 5.44878 to 6.19878: 20 segments
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: LMP: 20 to 40: add this merge
>>> ...
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: applyDeletes
>>> Dec 14, 2010 5:34:10 PM IW 0 [Tue Dec 14 17:34:10 EST 2010; http-8091-Processor70]: DW: apply 1320 buffered deleted terms and 0 deleted docIDs and 0 deleted queries on 40 segments.
>>> Dec 14, 2010 5:48:17 PM IW 0 [Tue Dec 14 17:48:17 EST 2010; http-8091-Processor70]: hit exception flushing deletes
>>> Dec 14, 2010 5:48:17 PM IW 0 [Tue Dec 14 17:48:17 EST 2010; http-8091-Processor70]: hit OutOfMemoryError inside updateDocument
>>>
>>> Tom
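One piece of the RAM picture described earlier in the thread can be put to numbers: merging allocates an int[maxDoc()] per SegmentReader that carries deletions. A back-of-the-envelope estimate (Python; the segment sizes are assumed round numbers, not taken from the log above):

```python
# Estimate of the int[maxDoc()] arrays allocated during a merge for
# segments that have deletions, per the explanation earlier in the thread.
BYTES_PER_INT = 4

def docid_map_bytes(max_docs_per_segment):
    """Total bytes for one int[maxDoc()] per segment with deletions."""
    return sum(BYTES_PER_INT * m for m in max_docs_per_segment)

# Hypothetical merge of 20 segments of 1M docs each, all with deletions:
segments = [1_000_000] * 20
mb = docid_map_bytes(segments) / (1024 ** 2)
print("~%.0f MB just for the docid maps" % mb)
```

These arrays alone are modest; the point of the estimate is that the dominant costs in a setup like the one above are more likely the terms indexes of many large open segments plus the 320MB RAM buffer, multiplied by concurrent merges.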