On 2010-11-03 02:40, Eric Martin wrote:
> Hi,
>
> I am getting these logs and I have no idea what they mean. I have searched
> google and found very little documentation on it. That doesn't mean it
> doesn't exist just that I have a hard time finding it. I may have missed an
> obvious discussion of this and I am sorry if I did. Can someone point me to
> the documentation or an answer? I'm a law student. Thanks!

Given the composition of your fetch list (all remaining URLs in the queue are from the same host), what you see is perfectly normal.

There are 50 fetching threads that can fetch items from any host. However, all remaining items are from the same single host. Due to the politeness limits, Nutch won't make more than one connection to that host at a time, and it will space its requests N seconds apart. Otherwise a multi-threaded distributed crawler could easily overwhelm the target host.

So the logs indicate that only one thread is fetching at any given time, there are at least 2500 items in the queue, and every N seconds that thread is allowed to fetch one item. All other threads are spinning idle.

-- 
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
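
To make the arithmetic above concrete, here is a rough sketch in Python. It is not Nutch code: the function name `estimate_fetch` and the 5-second delay are assumptions chosen for illustration (the actual N comes from your configuration), and download time is ignored, so the result is only a lower bound that applies when there are enough threads to cover every host.

```python
import math

def estimate_fetch(queue_sizes, num_threads, delay_seconds, threads_per_host=1):
    """Estimate thread usage and a lower bound on remaining fetch time.

    queue_sizes: dict mapping host name -> number of URLs still queued.
    Hypothetical helper for illustration; it only models the politeness delay.
    """
    active_hosts = [host for host, n in queue_sizes.items() if n > 0]
    # Politeness caps how many threads may work on one host at once.
    busy = min(num_threads, threads_per_host * len(active_hosts))
    idle = num_threads - busy
    # Each host's queue drains at one item per delay per allowed connection,
    # so the largest queue determines the minimum remaining time.
    seconds = max(
        (math.ceil(n / threads_per_host) * delay_seconds
         for n in queue_sizes.values()),
        default=0,
    )
    return busy, idle, seconds

# 50 threads, one host with 2500 queued URLs, assumed delay of 5 seconds
busy, idle, seconds = estimate_fetch({"example.com": 2500}, 50, 5.0)
print(busy, idle, seconds / 3600)
```

With these assumed numbers, one thread is busy, 49 are idle, and the queue needs at least 12500 seconds (about 3.5 hours) to drain, no matter how many threads are configured.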

