Well, politeness can still be the problem... If, for instance, you are
crawling blogs such as wordpress or blogspot, they all have different URLs
but share the same IP, so the fetcher will wait between requests.
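
To illustrate, here is a rough sketch in Python (not Nutch code) of why many
hostnames can still collapse into a few politeness queues. The hostnames, the
IPs and the 5 second delay are made-up values for the example, and the rate
formula is only a back-of-the-envelope upper bound, assuming one queue per IP.

```python
from collections import defaultdict

def group_by_ip(host_to_ip):
    """Group hostnames by the IP they resolve to (assumed: one fetch queue per IP)."""
    queues = defaultdict(list)
    for host, ip in host_to_ip.items():
        queues[ip].append(host)
    return dict(queues)

def max_fetch_rate(num_queues, delay_seconds, threads_per_queue=1):
    """Rough upper bound on fetches per second if every queue waits
    delay_seconds between two requests."""
    return num_queues * threads_per_queue / delay_seconds

# Hypothetical sample: three blogs share one IP, one site has its own.
sample = {
    "alice.example-blogs.test": "192.0.2.10",
    "bob.example-blogs.test": "192.0.2.10",
    "carol.example-blogs.test": "192.0.2.10",
    "standalone.test": "192.0.2.20",
}

queues = group_by_ip(sample)
print(len(sample), "hosts,", len(queues), "IP queues")
print("upper bound:", max_fetch_rate(len(queues), delay_seconds=5.0), "fetches/s")
```

So what matters for politeness is the number of distinct IPs in the fetch
list, not the number of distinct hostnames that were injected.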



2010/1/20, axi <axi...@gmail.com>:
>
> Hi to all,
> I'm a novice user of Nutch. I have it on a Debian machine, and I have tried
> the latest release, Nutch 1.0, with very slow crawling results. I have a
> 10 Megabytes/s connection and it only crawls at 300 Kb/s with peaks of 1
> Mb/s. I tweaked everything: DNS, Linux TCP settings, thread numbers, Java
> conf, etc., but nothing has any effect. There are a lot of spin-waiting
> threads, with only 10-20 of them working, and I have injected 1M different
> hosts, so politeness is not the problem. I switched back to Nutch 0.9, and
> then it works like a charm at good speeds of 5-6 Mb/s, with the bottleneck
> on the machine's CPU.
>
> Are these issues solved in the dev version of Nutch, or why does this happen?
>
> Thanks in advance,
>
> --
> View this message in context:
> http://old.nabble.com/Nutch-1.0-slow-crawls-tp27243302p27243302.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>


-- 
-MilleBii-
