Hello all,

Our indexes have around 3 billion unique terms, so for Solr 3 we set TermIndexInterval to about 8 times the default. The net effect is to reduce the size of the in-memory term index to about 1/8th of what it would be with the default setting. (For background, see http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again)
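To make the arithmetic concrete, here is a rough sketch of how the number of sampled terms scales with the interval. It assumes the Lucene 3.x default interval is 128 (so "about 8 times" means 1024) and that every interval-th term is held in memory; the class and method names are made up for illustration and are not part of Lucene or Solr.

```java
public class TermIndexEstimate {

    // Number of terms kept in the in-memory term index when every
    // `interval`-th term is sampled (rounded up).
    static long indexedTerms(long totalTerms, int interval) {
        return (totalTerms + interval - 1) / interval;
    }

    public static void main(String[] args) {
        long totalTerms = 3_000_000_000L;         // "around 3 billion unique terms"
        int defaultInterval = 128;                // ASSUMPTION: Lucene 3.x default
        int raisedInterval = defaultInterval * 8; // "about 8 times the default"

        long atDefault = indexedTerms(totalTerms, defaultInterval);
        long atRaised = indexedTerms(totalTerms, raisedInterval);

        System.out.println("Sampled terms at interval " + defaultInterval + ": " + atDefault);
        System.out.println("Sampled terms at interval " + raisedInterval + ": " + atRaised);
        System.out.println("Ratio: " + ((double) atDefault / atRaised));
    }
}
```

This only counts sampled terms; actual heap use also depends on term length and per-entry overhead, so treat it as an order-of-magnitude estimate.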
We would like to do something similar for Solr 4. The Lucene 4.10.2 JavaDoc for setTermIndexInterval ( http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/index/IndexWriterConfig.html#setTermIndexInterval%28int%29 ) suggests that in Lucene code this can be done by setting the minimum and maximum size for a block:

"For example, Lucene41PostingsFormat <http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html> implements the term index instead based upon how terms share prefixes. To configure its parameters (the minimum and maximum size for a block), you would instead use Lucene41PostingsFormat.Lucene41PostingsFormat(int, int) <http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html#Lucene41PostingsFormat%28int,%20int%29>. which can also be configured on a per-field basis"

How can we configure Solr to use different (i.e. non-default) minimum and maximum block sizes?

Tom