Ken Krugler wrote:

Hi Doug,

Did you ever resolve your 0.8 vs 0.7 crawling performance question? I'm
running into a similar problem.


We wound up dramatically increasing the number of threads, which seemed to help solve the bandwidth utilization problem. With Nutch 0.7 we were running about 200 threads per crawler, and with Nutch 0.8 it's more like 2000+ threads, though you have to reduce the thread stack size in this type of configuration.

The fetchlist seems to be sorted by URL. This leads to many threads being blocked when the crawler is configured with a low number of threads per host (default 1) and there are several URLs from the same host in the fetchlist.

This could perhaps be improved by sorting by some other key?
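
To illustrate the idea, here is a rough sketch (not Nutch code, and not how the generator actually builds the fetchlist) of one alternative ordering: interleaving URLs round-robin by host, so that neighbouring entries rarely share a host. The function name interleave_by_host and the sample URLs are made up for this example; sorting by a hash of the URL would be another way to get a similar spread.

```python
from collections import OrderedDict, deque
from urllib.parse import urlsplit


def interleave_by_host(urls):
    """Reorder URLs so consecutive entries come from different hosts where possible.

    Hypothetical helper for illustration only; it is not part of Nutch.
    """
    # Group URLs into one FIFO queue per host, remembering first-seen host order.
    queues = OrderedDict()
    for url in urls:
        host = urlsplit(url).hostname or ""
        queues.setdefault(host, deque()).append(url)

    # Take one URL from each host in turn until every queue is empty.
    ordered = []
    while queues:
        for host in list(queues):
            queue = queues[host]
            ordered.append(queue.popleft())
            if not queue:
                del queues[host]
    return ordered


if __name__ == "__main__":
    # A URL-sorted list, as described above: same-host URLs sit next to each other.
    sorted_fetchlist = [
        "http://a.example/1",
        "http://a.example/2",
        "http://a.example/3",
        "http://b.example/1",
        "http://c.example/1",
    ]
    for url in interleave_by_host(sorted_fetchlist):
        print(url)
```

With a URL-sorted list, threads that pick up adjacent entries all land on the same host and wait on the per-host limit; with the interleaved order they start on different hosts. Once the hosts with few URLs run out, the tail still contains only the largest host, so this reduces blocking rather than removing it.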

--
Sami Siren


