I am in the same state again, and the same reduce jobs keep failing on different machines. I cannot get the dump using kill -3 pid; it does not make the thread quit. Also, I tried to place some logging into FetcherOutputFormat, but because of this bug: https://issues.apache.org/jira/browse/HADOOP-406 logging is not possible in the child tasks. Do you have any idea why the reducers don't catch the QUIT signal? I am running the latest version from SVN; otherwise I could log some key/value and URL filtering information at the reduce stage.
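For what it's worth, SIGQUIT does not terminate a JVM at all; it only asks the JVM to print a full thread dump to its own stdout, so the dump lands in the task's log files rather than on the terminal where kill was run. A minimal sketch of capturing it (the child-process class-name pattern and the log location are assumptions about a typical Hadoop install, not taken from this thread):

```shell
# SIGQUIT is signal 3; on a JVM it triggers a thread dump on the JVM's own
# stdout and does NOT kill the process -- so nothing visibly "quits".
kill -l QUIT    # prints the signal number: 3

# Locate a child task JVM (pattern is an assumption for a typical Hadoop
# setup) and ask it for a thread dump:
PID=$(pgrep -f 'TaskTracker\$Child' | head -n 1)
if [ -n "$PID" ]; then
  kill -QUIT "$PID"   # equivalent to kill -3 "$PID"
  # The dump appears in that task's stdout log (commonly under the
  # tasktracker's logs/userlogs directory), not in this terminal.
fi
```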
Mike

On 10/18/06, Dennis Kubes <[EMAIL PROTECTED]> wrote:
I agree with Andrzej that a thread dump would be best. Also, what version
of Nutch are you using?

Dennis

Andrzej Bialecki wrote:
> Mike Smith wrote:
>> Hi Dennis,
>>
>> But it doesn't make sense, since the reducers' keys are URLs and the
>> heartbeat cannot be sent when the reduce task is called. Since I am
>> truncating my http content to be less than 100K and I don't get any file,
>> how come reducing a single record, which is a single URL, and writing its
>> parsed data into DFS takes more than 10 min!! Even if you load the
>> cluster, that should never happen. There should be another bug involved.
>>
>
> Could you try to produce a thread dump of a task in such a state? (kill
> -SIGQUIT pid)
