Matthias Jaekle wrote:
Hi Andrzej,
thanks for your response. I am not really familiar with the Lucene
internals.
I am just running Nutch with the default parameters on a Debian sarge
system with an ext3 file system, a limit of 1024 open files, and 1 GB RAM.
So is ext3 a bad file system for millions of files?
AFAIK reiserfs comes out much better in benchmarks than ext3.noatime,
especially for small files.
I cannot change the file system at the moment, so I think I should
change the parameters.
Which values would you suggest for
* indexer.mergeFactor?
* indexer.minMergeDocs?
* indexer.maxMergeDocs?
* indexer.termIndexInterval?
This is a somewhat involved topic... BTW, termIndexInterval doesn't belong
here; what counts is the rest of the parameters. Please see the comments in
nutch-default.xml, they should give you a rough idea of what the tradeoffs are.
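As a rough sketch only: overrides of this kind normally go into
nutch-site.xml rather than nutch-default.xml. The values below are
illustrative placeholders, not recommendations, and the enclosing root
element is omitted because it depends on your Nutch version. The general
tradeoff is that a lower mergeFactor keeps fewer Lucene segments (and so
fewer files) open at once, at the cost of slower indexing.

```xml
<!-- Fragment for nutch-site.xml; place inside the file's existing root element. -->
<!-- All values are hypothetical placeholders chosen for illustration only. -->

<property>
  <name>indexer.mergeFactor</name>
  <!-- Lower values mean fewer segments open at once, but slower indexing. -->
  <value>10</value>
</property>

<property>
  <name>indexer.minMergeDocs</name>
  <!-- Number of documents buffered in memory before a segment is written. -->
  <value>50</value>
</property>

<property>
  <name>indexer.maxMergeDocs</name>
  <!-- Upper bound on the number of documents in a single merged segment. -->
  <value>1000000</value>
</property>
```

After changing these, it is worth re-running the indexing step on a small
segment first and checking the open-file count against the 1024 limit.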
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com