I am also evaluating performance, but on a single machine. I am finding
that it crawls about two URLs per second. The fetch list is mostly unique,
so I am looking for other performance bottlenecks. The machine is an old
PIII with 512MB of RAM running at a load average of 3-4, so I am going to
try a faster machine next week.
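
As a first check on where that load comes from, standard Unix tools can
show whether the box is CPU-bound or stalled on I/O (nothing here is
Nutch-specific):

  # A high "wa" column in vmstat means the machine is waiting on disk or
  # network I/O; high "us"/"sy" with a low "wa" points at the CPU instead.
  vmstat 5 5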

What details about the network or the DNS setup should I find out to
determine bottlenecks in that area?
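
One concrete number worth collecting is the raw lookup latency against
whatever resolver the crawler uses; a rough sketch with dig (assuming the
standard BIND client tools are on the box):

  # dig prints ";; Query time: N msec" for each lookup; comparing a cold
  # lookup with an immediate repeat shows whether any local cache helps.
  dig www.example.org | grep 'Query time'
  dig www.example.org | grep 'Query time'

If the repeat is not much faster, there is likely no local caching
resolver; running one (e.g. BIND or dnsmasq on the crawl machine) is a
common fix for crawler DNS bottlenecks.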

Vince

On 8/22/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
>
> David Bargeron wrote:
> > Thanks. How can I determine how many unique hosts there are in my
> > fetchlists? And if it turns out there are not many unique hosts, can I
> > force Nutch to favor many unique hosts?
>
> You can dump the generated fetchlist (see bin/nutch readseg - you need
> to exclude the missing segment parts) and then use regular Unix tools to
> count the unique hosts in that list.
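>
> A rough sketch of that pipeline (the readseg options are from memory -
> run bin/nutch readseg with no arguments to check the exact syntax for
> your version, and the segment path is just an example; for an unfetched
> segment only crawl_generate exists, hence all the -no* flags):
>
>   bin/nutch readseg -dump crawl/segments/20070822 seg_dump \
>     -nocontent -nofetch -noparse -noparsedata -noparsetext
>   # Pull out the host part of every URL and count occurrences per host.
>   grep -o 'http://[^/ ]*' seg_dump/dump | sort | uniq -c | sort -rn | head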
>
> You can also limit the number of URLs per host - see the property
> generate.max.per.host in nutch-default.xml. Please note that this may
> drastically decrease the final number of generated URLs in a segment, so
> that it's significantly lower than the target topN number.
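>
> The override belongs in conf/nutch-site.xml; a minimal sketch (the
> value 50 is only an illustration, pick one to suit your crawl):
>
>   <property>
>     <name>generate.max.per.host</name>
>     <value>50</value>
>   </property>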
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
