Thanks Mike,
>>But, if you are doing deletions (or updateDocument, which is just a
>>delete + add under-the-hood), then this will force the terms index of
>>the segment readers to be loaded, thus consuming more RAM.
Out of 700,000 docs, by the time we get to doc 600,000, there is a good chance
that a few documents have been updated, which would cause a delete + add.
>>One workaround for large terms index is to set the terms index divisor
>>that IndexWriter should use whenever it loads a terms index (this is
>>IndexWriter.setReaderTermsIndexDivisor).
I always get confused about the two different settings and their names in the
solrconfig.xml file.
We are setting termInfosIndexDivisor, which I think translates to the Lucene
IndexWriter.setReaderTermsIndexDivisor:
<indexReaderFactory name="IndexReaderFactory"
class="org.apache.solr.core.StandardIndexReaderFactory">
<int name="termInfosIndexDivisor">8</int>
</indexReaderFactory>
The other one is termIndexInterval which is set on the writer and determines
what gets written to the tii file. I don't remember how to set this in Solr.
Are we setting the right one to reduce RAM usage during merging?
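To make the difference between the two settings concrete, here is a rough
sketch of how I understand them to combine. It is only arithmetic, not Lucene
code: the function name and the term count are made up for illustration, and
128 is assumed as the termIndexInterval because that is Lucene's default.

```python
def terms_held_in_ram(total_terms, term_index_interval=128, divisor=1):
    """Estimate how many terms-index entries a reader keeps in memory."""
    # Write side: every term_index_interval-th term is recorded in the tii file.
    indexed_terms = total_terms // term_index_interval
    # Read side: only every divisor-th entry of the tii file is loaded into RAM.
    return indexed_terms // divisor


# Hypothetical index with 1,024,000,000 unique terms.
total = 1_024_000_000
print(terms_held_in_ram(total))             # 8000000 entries with divisor 1
print(terms_held_in_ram(total, divisor=8))  # 1000000 entries with divisor 8
```

So the interval changes what is on disk (and needs a reindex to take effect),
while the divisor only changes how much of the existing tii file gets loaded.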
> So I think the gist is... the RAM usage will be in proportion to the
> net size of the merge (mergeFactor + how big each merged segment is),
> how many merges you allow concurrently, and whether you do false or
> true deletions
Does an optimize do something differently?
Tom