If this is stalling on only a few fetching tasks, check the logs. More than likely it is fetching many pages from a single site (e.g. amazon, wikipedia, cnn) and the politeness settings (which you want to keep) are slowing it down.
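
One way to confirm the single-site case is to count how many URLs in the fetch list belong to each host. The sketch below is only an illustration: it assumes you already have the URLs as plain text, one per line, however you export them, and the function names host_counts and dominant_hosts are made up for this example and are not part of Nutch.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_counts(urls):
    """Count how many URLs belong to each host name."""
    counts = Counter()
    for url in urls:
        try:
            host = urlsplit(url.strip()).hostname
        except ValueError:
            continue  # skip lines that are not parseable URLs
        if host:  # skip blank lines and URLs without a host part
            counts[host] += 1
    return counts


def dominant_hosts(urls, top=5):
    """Return (host, count, share of all counted URLs) for the busiest hosts."""
    counts = host_counts(urls)
    total = sum(counts.values())
    return [(host, n, n / total) for host, n in counts.most_common(top)]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python host_share.py < urls.txt
    for host, n, share in dominant_hosts(sys.stdin):
        print(f"{host}\t{n}\t{share:.1%}")
```

If one host holds most of the list, the per-host politeness delay rather than the machine is the likely limit on fetch speed.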

If it is stalling on many tasks but on a single machine, check the hardware for that machine. We have seen hard disk speeds decrease dramatically right before the disks die. On Linux, run something like hdparm -tT /dev/hda, where hda is the device to check. Average speeds for SATA should be in the 75 MB/s range for disk reads and the 7000+ MB/s range for cached reads.
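
If you want to run the disk check across several machines, a small script can pull the two numbers out of saved hdparm output. This is a rough sketch under stated assumptions: it assumes the output lines look like "Timing cached reads: ... = 6172.50 MB/sec" and "Timing buffered disk reads: ... = 76.41 MB/sec", it only handles MB/sec, and the 0.5 cut-off is an arbitrary choice for illustration. The names parse_hdparm and below_expected are hypothetical.

```python
import re

# Assumed shape of an hdparm -tT result line, for example:
#   Timing buffered disk reads: 230 MB in  3.01 seconds =  76.41 MB/sec
_TIMING = re.compile(
    r"Timing (cached reads|buffered disk reads):.*=\s*(\d+(?:\.\d+)?)\s*MB/sec"
)

# Rough figures quoted in the post above, in MB/s.
EXPECTED = {"disk": 75.0, "cached": 7000.0}


def parse_hdparm(text):
    """Extract the cached and disk read rates (MB/s) from hdparm -tT output."""
    rates = {}
    for label, value in _TIMING.findall(text):
        key = "cached" if label == "cached reads" else "disk"
        rates[key] = float(value)
    return rates


def below_expected(rates, fraction=0.5):
    """List the measurements that fall below `fraction` of the rough figure."""
    return sorted(
        key for key, value in rates.items() if value < EXPECTED[key] * fraction
    )


if __name__ == "__main__":
    import sys

    # Hypothetical usage: hdparm -tT /dev/hda | python check_disk.py
    measured = parse_hdparm(sys.stdin.read())
    print(measured, "suspicious:", below_expected(measured))
```

hdparm normally needs root, so it is simpler to run it separately and feed its output to the script than to call it from Python.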

Another possibility is that you are maxing out your bandwidth and your provider is throttling you.

Dennis Kubes

purpureleaf wrote:
Hi, I have worked with Nutch for some time. One thing I am always curious about is
that when crawling, the fetcher's speed gets slower and slower, no matter what
configuration I use.
My last test gave this (just one site, to make the problem simpler):

OS : winxp
java : 1.6.0.2
nutch: 0.9
cpu : AMD 1800
mem : 1G
network : 3m adsl

site : wikipedia.org
threads per site :30
server.delay : 0.5

It starts at about 6 pages/s, but drops to 4 within a few minutes, then gets slower
and slower. I ran it for 8 hours; only 2 pages/s were left, and it was still
slowing down.
But if I stop it and start another one, it returns to full speed (then slows down
again). I am OK with 2 pages/s for one site, but I do hope it will keep that
speed.

I found that some people on this list have the same problem, but I can't
find an answer.
Is Nutch designed to work this way?

Thanks!
