Matthias Jaekle wrote:
Hi Andrzej,

thanks for your response. I am not really familiar with the Lucene internals.

I am just running Nutch with the default parameters on a Debian sarge system with an ext3 file system, a maximum of 1024 open files, and 1 GB RAM.

So is ext3 a bad file system for millions of files?

AFAIK reiserfs comes out much better in benchmarks than ext3.noatime, especially for small files.


I cannot change the file system at the moment, so I think I should change the parameters.

Which values would you suggest for
* indexer.mergeFactor?
* indexer.minMergeDocs?
* indexer.maxMergeDocs?
* indexer.termIndexInterval?

This is a somewhat involved topic... BTW, termIndexInterval doesn't belong here; what counts is the rest of the parameters. Please see the comments in nutch-default.xml; they should give you a rough idea of what the tradeoffs are.
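
To give a rough feel for the tradeoff, here is a minimal sketch in Python of a simplified merge model. It is not Lucene's actual code: it assumes that a new segment is flushed every minMergeDocs documents and that mergeFactor equally sized segments at the tail are merged into one, unless the result would exceed maxMergeDocs. The function name and the files_per_segment value are hypothetical; the real number of files per segment depends on the index format and the number of fields.

```python
def simulate_segments(total_docs, merge_factor, min_merge_docs, max_merge_docs):
    """Return (final segment sizes, peak segment count) under a simplified model.

    Assumption: this only approximates a logarithmic merge policy; it is
    not a port of Lucene's IndexWriter.
    """
    segments = []  # document count of each on-disk segment, oldest first
    peak = 0
    full_flushes, remainder = divmod(total_docs, min_merge_docs)
    for _ in range(full_flushes):
        # A full in-memory buffer is written out as a new segment.
        segments.append(min_merge_docs)
        peak = max(peak, len(segments))
        # Cascade: merge while the last merge_factor segments are equal in size.
        while len(segments) >= merge_factor:
            tail = segments[-merge_factor:]
            merged = sum(tail)
            if len(set(tail)) != 1 or merged > max_merge_docs:
                break
            del segments[-merge_factor:]
            segments.append(merged)
    if remainder:
        # Leftover documents end up in one final, smaller segment.
        segments.append(remainder)
        peak = max(peak, len(segments))
    return segments, peak


if __name__ == "__main__":
    files_per_segment = 10  # hypothetical placeholder, not a measured value
    for merge_factor in (10, 50):
        sizes, peak = simulate_segments(100_000, merge_factor, 50, 10**9)
        print(merge_factor, len(sizes), peak, peak * files_per_segment)
```

In this model a larger mergeFactor means fewer merges during indexing but more segments (and therefore more open files) on disk at once, which matters with a limit of 1024 open files. A smaller maxMergeDocs caps segment size and so leaves more segments behind.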


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
