Thanks Mike,

>>But, if you are doing deletions (or updateDocument, which is just a
>>delete + add under-the-hood), then this will force the terms index of
>>the segment readers to be loaded, thus consuming more RAM.

Out of 700,000 docs, by the time we get to doc 600,000, there is a good chance a few documents have been updated, which would cause a delete + add.

>>One workaround for large terms index is to set the terms index divisor
>>that IndexWriter should use whenever it loads a terms index (this is
>>IndexWriter.setReaderTermsIndexDivisor).

I always get confused about the two different divisors and their names in the solrconfig.xml file. We are setting termInfosIndexDivisor, which I think translates to the Lucene IndexWriter.setReaderTermsIndexDivisor:

<indexReaderFactory name="IndexReaderFactory" class="org.apache.solr.core.StandardIndexReaderFactory">
  <int name="termInfosIndexDivisor">8</int>
</indexReaderFactory>

The other one is termIndexInterval, which is set on the writer and determines what gets written to the tii file. I don't remember how to set this in Solr.

Are we setting the right one to reduce RAM usage during merging?

> So I think the gist is... the RAM usage will be in proportion to the
> net size of the merge (mergeFactor + how big each merged segment is),
> how many merges you allow concurrently, and whether you do false or
> true deletions

Does an optimize do something differently?

Tom
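P.S. For what it's worth, here is how I understand the two settings to combine. This is only a back-of-the-envelope sketch under stated assumptions, not anything taken from Lucene or Solr: the function name and the term count are made up for illustration, the interval of 128 is what I believe to be the Lucene default, and the model simply assumes that the writer puts roughly every termIndexInterval-th term into the tii file and that a reader then loads every termInfosIndexDivisor-th of those entries into RAM.

```python
def loaded_index_entries(total_terms, term_index_interval=128, divisor=1):
    """Rough estimate of how many terms-index entries a reader keeps in RAM.

    Hypothetical helper for illustration only; it is not a Lucene or Solr API.
    """
    # Entries written to the tii file: about one per term_index_interval terms
    # (ceiling division, so a partial last block still counts as one entry).
    written = -(-total_terms // term_index_interval)
    # Entries actually loaded: about one per `divisor` written entries.
    return -(-written // divisor)


total_terms = 100_000_000  # made-up number of unique terms

for divisor in (1, 8):
    entries = loaded_index_entries(total_terms, divisor=divisor)
    print(f"divisor={divisor}: about {entries} index entries in RAM")
```

If that model is right, a divisor of 8 on top of an interval of 128 acts like an effective interval of 1024 at read time, without changing what is written to disk.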