I've been using the following settings to consistently
get 200 rec/s indexed (with index_more and language-ident
enabled).

Mind you, I have oversized these and I'm working
backwards to shrink them down (all this machine does
is index). The odd thing is that the JVM really didn't
change much with these adjusted. Resident memory used
went up a bit, but CPU and overall memory usage didn't
change. This is on a server with 2 GB of RAM.

<property>
  <name>lang.ngram.max.length</name>
  <value>3</value>
  <description>
  </description>
</property>

<property>
  <name>lang.analyze.max.length</name>
  <value>512</value>
  <description>
  </description>
</property>

<property>
  <name>indexer.minMergeDocs</name>
  <value>500</value>
  <description>
  </description>
</property>

<property>
  <name>indexer.maxMergeDocs</name>
  <value>17179869176</value>
  <description>
  </description>
</property>

<property>
  <name>indexer.mergeFactor</name>
  <value>350</value>
  <description>
  </description>
</property>
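
One thing worth checking in the settings above: the indexer.maxMergeDocs value (17179869176) is larger than Integer.MAX_VALUE (2147483647). The sketch below only tests whether a configured value fits in a Java int. It assumes the setting ends up being read as an int, which is not confirmed here, and the class and method names are hypothetical.

```java
public class MaxMergeDocsCheck {

    // Returns true if the configured string can be represented as a Java int.
    // Assumption: the indexer reads this setting as an int (not verified).
    static boolean fitsInInt(String value) {
        long parsed = Long.parseLong(value.trim());
        return parsed >= Integer.MIN_VALUE && parsed <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        String configured = "17179869176"; // value from the config above
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println(configured + " fits in int: " + fitsInInt(configured));
    }
}
```

If the check fails, a value of 2147483647 would express the same "effectively unlimited" intent while staying inside int range.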

Initially the high index merge factor caused out-of-file-handle
errors, but increasing the other settings along with it
seemed to help get around that.
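
For a rough sense of scale, the rule of thumb in Doug's reply quoted below (about 10-20x the mergeFactor in file handles) can be turned into a quick estimate. This is a back-of-the-envelope sketch, not a measurement; the class and method names are hypothetical.

```java
public class FileHandleEstimate {

    // Estimated {low, high} peak file handles, using the 10-20x rule of thumb
    // from the quoted reply. The real number depends on the plugins in use.
    static long[] estimate(int mergeFactor) {
        return new long[] { 10L * mergeFactor, 20L * mergeFactor };
    }

    public static void main(String[] args) {
        int[] factors = { 50, 350 }; // suggested ceiling vs. the value used above
        for (int mergeFactor : factors) {
            long[] range = estimate(mergeFactor);
            System.out.println("mergeFactor=" + mergeFactor
                    + " -> roughly " + range[0] + " to " + range[1] + " file handles");
        }
    }
}
```

Comparing the estimate against the per-process open-file limit (for example, the output of `ulimit -n`) shows whether a given mergeFactor is likely to run out of handles.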

-byron


--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> Byron Miller wrote:
> > For example i've been tweaking max merge/min merge and
> > such and i've been able to double my performance
> > without increasing anything but cpu load..
> 
> Smaller maxMergeDocs will cost you in the end, since these will
> eventually be merged during the index optimization at the end.  I would
> just leave this at Integer.MAX_VALUE.
> 
> Larger minMergeDocs will improve performance, but by using more heap.
> So watch your heap size as you increase this and leave a healthy margin
> for safety.  This is the best way to tweak indexing performance.
> 
> Larger mergeFactors may improve performance somewhat, but by using more
> file handles.  In general, the maximum number of file handles is around
> 10-20x (depending on plugins) the mergeFactor.  So raising this above 50
> on most systems is risky, and the performance improvements are marginal,
> so I wouldn't bother.
> 
> Doug
> 
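
The advice above to watch heap size while raising minMergeDocs can be checked from inside the JVM with java.lang.Runtime. The following is a minimal sketch; the class name and the idea of reporting headroom as a fraction are illustrative assumptions, not part of Nutch.

```java
public class HeapHeadroom {

    // Fraction of the maximum heap that is still unused.
    // used = totalBytes - freeBytes; headroom = (maxBytes - used) / maxBytes
    static double headroomFraction(long maxBytes, long totalBytes, long freeBytes) {
        long used = totalBytes - freeBytes;
        return (double) (maxBytes - used) / maxBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        double headroom = headroomFraction(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
        System.out.printf("heap headroom: %.0f%%%n", headroom * 100);
    }
}
```

Logging this figure periodically during an indexing run gives a simple way to see how much margin is left as minMergeDocs goes up.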
