Thanks Mike,

>>But, if you are doing deletions (or updateDocument, which is just a
>>delete + add under-the-hood), then this will force the terms index of
>>the segment readers to be loaded, thus consuming more RAM.

Out of 700,000 docs, by the time we get to doc 600,000, there is a good chance 
that a few documents have been updated, which would cause a delete + add.


>>One workaround for large terms index is to set the terms index divisor
>>that IndexWriter should use whenever it loads a terms index (this is
>>IndexWriter.setReaderTermsIndexDivisor).

I always get confused about the two different divisors and their names in the 
solrconfig.xml file.

We are setting termInfosIndexDivisor, which I think translates to the Lucene 
IndexWriter.setReaderTermsIndexDivisor:

<indexReaderFactory name="IndexReaderFactory" 
                    class="org.apache.solr.core.StandardIndexReaderFactory">
  <int name="termInfosIndexDivisor">8</int>
</indexReaderFactory>

The other one is termIndexInterval which is set on the writer and determines 
what gets written to the tii file.  I don't remember how to set this in Solr.
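
To check my own understanding of how the two settings interact, here is a rough 
sketch in plain Java. It makes no Lucene or Solr calls; the class and method 
names are made up, and the term count is an invented number for illustration 
only. I am assuming the Lucene default termIndexInterval of 128, and that a 
divisor of N means the reader keeps every Nth entry from the tii file in RAM:

```java
// Hypothetical helper, not part of Lucene or Solr: plain arithmetic on the two settings.
public class TermsIndexEstimate {

    // Approximate number of index terms written to the tii file:
    // every termIndexInterval-th term of the segment.
    public static long termsWrittenToTii(long totalTerms, int termIndexInterval) {
        return totalTerms / termIndexInterval;
    }

    // Approximate number of those index terms a reader keeps in RAM:
    // every divisor-th entry of the tii file.
    public static long termsLoadedInRam(long totalTerms, int termIndexInterval, int divisor) {
        return termsWrittenToTii(totalTerms, termIndexInterval) / divisor;
    }

    public static void main(String[] args) {
        long totalTerms = 2_000_000_000L; // invented unique-term count, for illustration only
        int termIndexInterval = 128;      // Lucene's default interval (writer side)
        int divisor = 8;                  // termInfosIndexDivisor from our solrconfig.xml (reader side)

        System.out.println("written to tii: "
                + termsWrittenToTii(totalTerms, termIndexInterval));
        System.out.println("loaded in RAM:  "
                + termsLoadedInRam(totalTerms, termIndexInterval, divisor));
    }
}
```

If that reading is right, the divisor only reduces what a reader loads, while 
termIndexInterval changes what is written to disk in the first place.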

Are we setting the right one to reduce RAM usage during merging?


> So I think the gist is... the RAM usage will be in proportion to the
> net size of the merge (mergeFactor + how big each merged segment is),
> how many merges you allow concurrently, and whether you do false or
> true deletions

Does an optimize do anything differently?

Tom
