Matthias Jaekle wrote:
Hi Andrzej,
thanks for your response. I am not really familiar with the Lucene
internals.
I am just running Nutch with the default parameters on a Debian sarge
system with an ext3 file system, a maximum of 1024 open files, and 1 GB RAM.
So is ext3 a bad file system for millions of files?
AFAIK reiserfs comes out much better in benchmarks than ext3.noatime,
especially for small files.
I cannot change the file system at the moment, so I think I should
change the parameters.
Which values would you suggest for
* indexer.mergeFactor?
* indexer.minMergeDocs?
* indexer.maxMergeDocs?
* indexer.termIndexInterval?
This is a somewhat involved topic... BTW, termIndexInterval doesn't belong
here; what counts is the rest of the parameters. Please see the comments in
nutch-default.xml; they should give you a rough idea of what the tradeoffs are.
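
For illustration only, an override of these three properties in
nutch-site.xml might look roughly like the sketch below. The values are
placeholders chosen for the example, not tuned recommendations from this
thread, and the root element name is an assumption that depends on the
Nutch version (check your own nutch-default.xml). The general tradeoff is
that a lower mergeFactor keeps fewer segment files open at once, which
helps under a 1024 open-file limit, at the cost of slower indexing.

```xml
<?xml version="1.0"?>
<!-- Sketch only: root element and all values are assumptions. -->
<nutch-conf>
  <property>
    <name>indexer.mergeFactor</name>
    <!-- Placeholder: fewer segments merged at once, fewer open files. -->
    <value>10</value>
  </property>
  <property>
    <name>indexer.minMergeDocs</name>
    <!-- Placeholder: documents buffered in memory before a segment is written. -->
    <value>50</value>
  </property>
  <property>
    <name>indexer.maxMergeDocs</name>
    <!-- Placeholder: upper bound on documents per merged segment. -->
    <value>100000</value>
  </property>
</nutch-conf>
```

Raising the per-process open-file limit (for example with ulimit -n) is a
separate option if lowering mergeFactor makes indexing too slow.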
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com