Looks like that fixed it. Here is some of the output I saw.
task_0005_m_000005_0 Someone is setting way to long of a delay
value...520 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...520 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...520 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...520 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
task_0005_m_000003_0 Someone is setting way to long of a delay
value...60 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...300 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...300 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...300 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...1200 seconds
task_0005_m_000003_0 Someone is setting way to long of a delay
value...60 seconds
task_0005_m_000003_0 Someone is setting way to long of a delay
value...60 seconds
task_0005_m_000003_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000003_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000003_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000003_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...120 seconds
task_0005_m_000003_0 Someone is setting way to long of a delay
value...100 seconds
task_0005_m_000003_0 Someone is setting way to long of a delay
value...3600 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...500 seconds
Dennis Kubes wrote:
I added some test code that forces a 30-second delay whenever the
requested delay is greater than 30 seconds. It also prints out the
original delay value.
Here is the output I am seeing:
task_0005_m_000005_0 Someone is setting way to long of a delay
value...520 seconds
task_0005_m_000005_0 Someone is setting way to long of a delay
value...520 seconds
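A test hack like the one described above could look roughly like the
following. This is only a sketch: the class name DelayCap, the method
capDelay, and the constant MAX_DELAY_MS are made up for illustration
and are not taken from the Nutch fetcher source. The message text
imitates the output shown above, but its exact format is an assumption.

```java
// Hypothetical sketch of the test hack; not the actual Nutch fetcher code.
final class DelayCap {
    // Cap used by the hack: 30 seconds, expressed in milliseconds.
    static final long MAX_DELAY_MS = 30_000L;

    // Returns the delay to actually sleep for. If the requested delay is
    // above the cap, logs the original value and returns the cap instead.
    static long capDelay(String taskId, long requestedDelayMs) {
        if (requestedDelayMs > MAX_DELAY_MS) {
            System.out.println(taskId
                    + " Someone is setting way to long of a delay value..."
                    + (requestedDelayMs / 1000) + " seconds");
            return MAX_DELAY_MS;
        }
        return requestedDelayMs;
    }
}
```

With this in place, a requested delay of 520 seconds would be logged
and then replaced by a 30-second sleep, while anything at or under 30
seconds passes through unchanged.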
So far it has hit 4 of 5 fetcher threads on a single machine. I am
pretty sure this is what is causing the hung threads. I have a crawl
running now and will update on its status later. It is now 3am here,
so for now I must sleep. :-P
Dennis
Andrzej Bialecki wrote:
Dennis Kubes wrote:
Just a thought from going through the fetcher code. If the robots.txt
specifies a delay >= the task timeout value, the task thread will
sleep for that amount of time and eventually be considered a "hung
thread" even though it is really just sleeping. Of course, I could
be reading the code wrong. It is about 2am here. I will test this
idea tomorrow to see if that is actually what is happening with
the hung threads.
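To make the suspected condition concrete, here is a minimal sketch of
the reasoning, not the real watchdog logic. The names HungThreadCheck,
isConsideredHung and flaggedWhileSleeping are hypothetical, and the
timeout passed in is whatever the task timeout happens to be
configured to; no particular value is assumed by the code.

```java
// Hypothetical model of the hung-thread check; not Nutch or Hadoop code.
final class HungThreadCheck {
    // A thread counts as hung once it has shown no activity for at least
    // the timeout period.
    static boolean isConsideredHung(long lastActivityMs, long nowMs, long timeoutMs) {
        return nowMs - lastActivityMs >= timeoutMs;
    }

    // A thread that sleeps for the whole crawl delay records no activity
    // during the sleep, so its idle time at the end of the sleep equals
    // the full delay.
    static boolean flaggedWhileSleeping(long crawlDelayMs, long timeoutMs) {
        return isConsideredHung(0L, crawlDelayMs, timeoutMs);
    }
}
```

Under this model, a thread sleeping for a crawl delay that is equal to
or longer than the timeout is always flagged, even though it is only
sleeping.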
For the fetcher to die all threads would have to end up in this
state. But this sort of rings a bell - this may be an unintended
consequence of implementing Crawl-Delay support ...
NUTCH-339 now compiles and is lightly tested. Threads don't block
there; instead, they put fetchlist entries on a time-sorted queue and
continue working on other items. So, this condition never occurs.
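The general idea of a time-sorted queue can be sketched as below. This
is not the NUTCH-339 code; the class TimeSortedFetchQueue, its Entry
type, and the pollReady method are invented here purely to illustrate
the non-blocking approach, and the clock is passed in explicitly to
keep the example deterministic.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical illustration of a time-sorted fetch queue; not NUTCH-339.
final class TimeSortedFetchQueue {
    // One fetchlist entry plus the earliest time it may be fetched.
    static final class Entry {
        final String url;
        final long readyAtMs;

        Entry(String url, long readyAtMs) {
            this.url = url;
            this.readyAtMs = readyAtMs;
        }
    }

    // Entries are ordered so the one that becomes ready soonest is at the head.
    private final PriorityQueue<Entry> queue =
            new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.readyAtMs));

    // Instead of sleeping for the crawl delay, a thread parks the entry here.
    void add(String url, long readyAtMs) {
        queue.add(new Entry(url, readyAtMs));
    }

    // Returns the URL of the earliest entry if it is due at nowMs, or null
    // if nothing is due yet. Never sleeps, so the caller can move on.
    String pollReady(long nowMs) {
        Entry head = queue.peek();
        if (head == null || head.readyAtMs > nowMs) {
            return null;
        }
        return queue.poll().url;
    }

    int size() {
        return queue.size();
    }
}
```

A fetcher thread using something like this would call pollReady with
the current time, fetch whatever comes back, and otherwise pick up
other work, so a long Crawl-Delay only postpones one entry instead of
putting the whole thread to sleep.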