Hi,

--- Matthias Jaekle <[EMAIL PROTECTED]> wrote:

> > You probably don't want to touch indexer.termIndexInterval and
> > indexer.maxMergeDocs (determines the max size of an individual
> > segment).
> Why is maxMergeDocs 50 by default? Should not this value be much
> higher?

50 is probably OK for most people; that's likely why it's the
default.  It can be set higher, but you need enough RAM to support it.

> I found how to calculate the number of opened files
> But how could I calculate the memory which would be used?

You can't really calculate it precisely, because different pages will
be of different sizes, and you won't know their sizes until you fetch
them.  However, there is a Nutch property for the max file download
size, so that can help you calculate an upper bound.
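The upper-bound reasoning above can be sketched quickly.  This is a
rough illustration, not Nutch code; the thread count and per-page cap
are hypothetical inputs (Nutch's download cap is the
http.content.limit property, whose value depends on your config):

```python
# Worst-case estimate: every fetcher thread simultaneously holds a
# page of the maximum allowed size in memory.

def content_memory_upper_bound(threads, content_limit_bytes):
    """Upper bound on bytes held for page content at any one time."""
    return threads * content_limit_bytes

# Example: 10 fetcher threads, 64 KB per-page download limit
print(content_memory_upper_bound(10, 64 * 1024))  # 655360 bytes
```

The real footprint will usually be far lower, since most pages are
smaller than the cap.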

> And is there any possibility to calculate how many files Nutch will
> create at the peak while indexing?

There are some formulas for calculating that.  I think I added them to
Lucene in Action, but can't really remember now... ah, there it is,
section 2.7.1.  The formula in the example there is:
11 segments/index * (7 files/segment + 1 file for indexed field)

This is for mergeFactor=10, multi-file index format, and an index with
1 indexed field.
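That formula is easy to plug numbers into.  A small sketch (the
function name and parameters are mine, not from the book):

```python
# Estimate of peak open files for a Lucene multi-file index, per the
# formula above: segments/index * (files/segment + 1 per indexed field).

def max_open_files(segments, files_per_segment, indexed_fields):
    """Peak file count: each segment has its base files plus one
    extra file per indexed field."""
    return segments * (files_per_segment + indexed_fields)

# mergeFactor=10 allows up to 11 segments before a merge; 7 base
# files per segment; 1 indexed field.
print(max_open_files(11, 7, 1))  # 88 files
```

With more indexed fields or a higher mergeFactor the count grows
accordingly, which is why people hit open-file limits during merges.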

Otis

____________________________________________________________________
Simpy -- simpy.com -- tags, social bookmarks, personal search engine
