Looks like that fixed it.  Here is some of the output I saw.

task_0005_m_000005_0 Someone is setting way to long of a delay value...520 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...520 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...520 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...520 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds 
task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds task_0005_m_000003_0 Someone is setting way to long of a delay value...60 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...120 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...300 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...300 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...300 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...1200 seconds task_0005_m_000003_0 Someone is setting way to long of a delay value...60 seconds task_0005_m_000003_0 Someone is setting way to long of a delay value...60 seconds task_0005_m_000003_0 Someone is setting way to long of a delay value...120 seconds task_0005_m_000003_0 Someone is setting way to long of a delay value...120 seconds task_0005_m_000003_0 Someone is setting way to long of a delay value...120 seconds task_0005_m_000003_0 Someone is setting way to long of a delay value...120 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...120 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...120 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...120 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...120 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...120 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...120 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...120 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...120 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...120 seconds 
task_0005_m_000003_0 Someone is setting way to long of a delay value...100 seconds task_0005_m_000003_0 Someone is setting way to long of a delay value...3600 seconds task_0005_m_000005_0 Someone is setting way to long of a delay value...500 seconds

Dennis Kubes wrote:
I added some test code that caps the delay at 30 seconds whenever the requested delay is greater than 30 seconds. It prints out the original delay value. Here is the output I am seeing:

task_0005_m_000005_0 Someone is setting way to long of a delay value...520 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay value...520 seconds

So far it has hit 4 of 5 fetcher threads on a single machine. I am pretty sure this is what is causing the hung threads. I have a crawl running now and will update on its status later. It is 3am here, so for now I must sleep. :-P

Dennis
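
For readers following along, the capping hack described in the message above might look roughly like the sketch below. The actual test code was not posted, and Nutch itself is written in Java, so this is only an illustration in Python: the names cap_delay and MAX_DELAY_SECONDS and the log parameter are hypothetical. Only the 30 second cap and the log message format come from the message.

```python
MAX_DELAY_SECONDS = 30  # cap mentioned in the message; the name is hypothetical


def cap_delay(task_id, requested_seconds, max_seconds=MAX_DELAY_SECONDS, log=print):
    """Return the delay to actually wait, logging any request above the cap.

    Hypothetical helper, not the code that was run on the cluster.
    """
    if requested_seconds > max_seconds:
        # Report the original value before overriding it, in the same
        # format as the output quoted in the thread.
        log(f"{task_id} Someone is setting way to long of a delay value..."
            f"{requested_seconds} seconds")
        return max_seconds
    return requested_seconds
```

Calling cap_delay("task_0005_m_000005_0", 520) would log one line in the format shown above and return 30, while a request of 30 seconds or less passes through unchanged and is not logged.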

Andrzej Bialecki wrote:
Dennis Kubes wrote:
Just a thought from going through the fetcher code. If the robots.txt specifies a delay >= the task timeout value, the task thread will sleep for that amount of time and eventually be considered a "hung thread" even though it is really just sleeping. Of course, I could be reading the code wrong. It is about 2am here. I will test this idea tomorrow to see if that is actually what is happening with the hung threads.

For the fetcher to die all threads would have to end up in this state. But this sort of rings a bell - this may be an unintended consequence of implementing Crawl-Delay support ...
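
The condition discussed above can be written down as a small check. This is a simplified model in Python, not Nutch code: the function names are hypothetical, and the exact comparison the fetcher uses is an assumption based on the ">=" in the quoted message. The thread does not state the timeout value, so it is left as a parameter.

```python
def looks_hung(idle_seconds, timeout_seconds):
    """A thread that has shown no activity for the timeout looks hung,
    even if it is only sleeping out a long crawl delay (assumed model)."""
    return idle_seconds >= timeout_seconds


def fetcher_would_die(idle_seconds_per_thread, timeout_seconds):
    """The fetcher dies only if every thread looks hung at once."""
    threads = list(idle_seconds_per_thread)
    return bool(threads) and all(
        looks_hung(idle, timeout_seconds) for idle in threads
    )
```

Under this model, a single thread sleeping on a long delay is flagged as hung, but the fetcher keeps running as long as at least one other thread is still making progress.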

NUTCH-339 now compiles and is lightly tested. Threads don't block there; instead, they put fetchlist entries on a time-sorted queue and continue working on other items. So this condition never occurs.
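
To illustrate the time-sorted queue idea in general terms, here is a minimal sketch in Python. It is not the NUTCH-339 patch, which is Java and is not shown in this thread; the class name TimeSortedQueue and its methods are hypothetical. The point is only that an entry with a long delay is parked with the time at which it becomes eligible, so no thread has to sleep on it.

```python
import heapq
import itertools


class TimeSortedQueue:
    """Hypothetical queue of fetchlist entries ordered by earliest fetch time."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so entries with equal times come out in insertion order.
        self._counter = itertools.count()

    def put(self, ready_at, entry):
        """Park an entry until the given time (seconds on any consistent clock)."""
        heapq.heappush(self._heap, (ready_at, next(self._counter), entry))

    def pop_ready(self, now):
        """Return the earliest entry whose time has come, or None.

        Never blocks: if nothing is ready, the caller moves on to other work.
        """
        if self._heap and self._heap[0][0] <= now:
            return heapq.heappop(self._heap)[2]
        return None

    def __len__(self):
        return len(self._heap)
```

A worker thread would call pop_ready with the current time, fetch the entry if one is returned, and otherwise pick up other items, so a 3600 second delay on one host only postpones that host's entries.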
