Hi,

--- Matthias Jaekle <[EMAIL PROTECTED]> wrote:
> > You probably don't want to touch indexer.termIndexInterval and
> > indexer.maxMergeDocs (determines the max size of an individual
> > segment).
>
> Why is maxMergeDocs 50 by default? Should not this value be much
> higher?

50 is probably OK for most people, which is probably why it's the default. It can be higher, but you need enough RAM to support it.

> I found how to calculate the number of open files.
> But how could I calculate the memory which would be used?

You can't really calculate it exactly, because different pages will have different sizes, and you won't know their sizes until you fetch them. However, there is a Nutch property for the maximum file download size, so that can help you calculate an upper bound.

> And is there any possibility to calculate how many files Nutch will
> create while indexing, at the peak?

There are formulas for calculating that. I think I added them to Lucene in Action, but can't really remember now... ah, there it is, section 2.7.1. The formula in the example there is:

  11 segments/index * (7 files/segment + 1 file for the indexed field)

This is for mergeFactor=10, the multi-file index format, and an index with 1 indexed field.

Otis

____________________________________________________________________
Simpy -- simpy.com -- tags, social bookmarks, personal search engine
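To make the arithmetic concrete, here is a minimal sketch of that formula; the function name and parameters are my own for illustration, not anything from Lucene or Nutch:

```python
# Sketch of the worst-case open-file estimate quoted from
# Lucene in Action, section 2.7.1. Names/defaults are assumptions.

def peak_open_files(merge_factor=10, indexed_fields=1, files_per_segment=7):
    # During a merge, up to mergeFactor + 1 segments can exist at
    # once: the mergeFactor segments being merged plus the new one.
    segments = merge_factor + 1
    # In the multi-file index format, each segment has a fixed set of
    # files plus one file per indexed field.
    return segments * (files_per_segment + indexed_fields)

# mergeFactor=10, multi-file format, 1 indexed field:
# 11 segments * (7 + 1) files/segment = 88 files
print(peak_open_files())  # 88
```

With a higher mergeFactor or more indexed fields the count grows quickly, which is why people bump the OS open-file limit before large indexing jobs.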