On 2010-11-03 02:40, Eric Martin wrote:
> Hi,
> 
> I am getting these logs and I have no idea what they mean. I have searched
> Google and found very little documentation on it. That doesn't mean it
> doesn't exist just that I have a hard time finding it. I may have missed an
> obvious discussion of this and I am sorry if I did. Can someone point me to
> the documentation or an answer? I'm a law student. Thanks!

Given the composition of your fetch list (all remaining URLs in the
queue are from the same host), what you see is perfectly normal. There
are 50 fetching threads that can fetch items from any host, but all
remaining items belong to a single host. Due to the politeness limits,
Nutch won't open more than one connection to that host, and it will
space its requests N seconds apart; otherwise a multi-threaded,
distributed crawler could easily overwhelm the target host.

So the logs indicate that only one thread is fetching at any given time,
that there are at least 2500 items in the queue, and that every N seconds
the thread is allowed to fetch one item. All the other threads sit idle,
waiting for an item they are allowed to fetch.
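To make the behaviour concrete, here is a toy simulation of per-host
politeness. It is a sketch under stated assumptions, not Nutch's actual
Fetcher code: the names PoliteScheduler and simulate are hypothetical,
time is measured in simulated ticks, and the delay parameter stands in
for N (in Nutch the spacing is configured by the fetcher.server.delay
property). Each host gets its own queue that allows at most one request
in flight and a cool-down after each fetch; with 50 threads and a single
host, only one thread ever gets work.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class HostQueue:
    urls: deque = field(default_factory=deque)
    next_allowed: int = 0     # simulated tick when the next request may start
    in_flight: bool = False   # at most one connection per host


class PoliteScheduler:
    """Hypothetical per-host queue, loosely modelled on a polite crawler."""

    def __init__(self, delay):
        self.delay = delay
        self.queues = defaultdict(HostQueue)

    def add(self, url):
        self.queues[urlsplit(url).hostname].urls.append(url)

    def next_item(self, now):
        """Return (host, url) that may be fetched at `now`, or None."""
        for host, q in self.queues.items():
            if q.urls and not q.in_flight and now >= q.next_allowed:
                q.in_flight = True
                return host, q.urls.popleft()
        return None  # every host is busy or still cooling down

    def finished(self, host, now):
        q = self.queues[host]
        q.in_flight = False
        q.next_allowed = now + self.delay  # space requests `delay` ticks apart


def simulate(urls, threads=50, delay=5, fetch_time=1, ticks=30):
    """Return a log of (tick, thread_id, url) fetch starts."""
    sched = PoliteScheduler(delay)
    for u in urls:
        sched.add(u)
    busy = {}  # thread id -> (host, tick when its fetch completes)
    log = []
    for now in range(ticks):
        # Release threads whose fetch has completed.
        for tid, (host, done) in list(busy.items()):
            if done <= now:
                sched.finished(host, now)
                del busy[tid]
        # Every free thread asks for work; most of them get nothing.
        for tid in range(threads):
            if tid in busy:
                continue
            item = sched.next_item(now)
            if item is None:
                continue  # this thread stays idle this tick
            host, url = item
            busy[tid] = (host, now + fetch_time)
            log.append((now, tid, url))
    return log


if __name__ == "__main__":
    same_host = ["http://example.com/page%d" % i for i in range(10)]
    for entry in simulate(same_host):
        print(entry)
```

With ten URLs from one host, the fetches start at ticks 0, 6, 12, 18 and
24, all on the same thread, while the other 49 threads do nothing. Once
URLs from a second host are added, a second thread starts fetching in
parallel, which is why a mixed fetch list keeps more threads busy.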

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
