Hi Dennis,

But that doesn't make sense: the reducers' keys are URLs, and no heartbeat can be sent while reduce() is running on a record. Since I truncate the http content to less than 100K and I don't fetch any files, how can reducing a single record, which is a single URL, and writing its parsed data into DFS take more than 10 minutes? Even on a heavily loaded cluster that should never happen. There must be another bug involved.
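Just so we are talking about the same mechanism, here is a rough sketch of the status reporting behind "Task failed to report status for 602 seconds." The class is hypothetical and is written against a later org.apache.hadoop.mapred interface, not the actual Fetcher reducer or the exact Hadoop version bundled with Nutch at the time, but the idea is the same: a reduce() that runs long without ever touching the Reporter gets killed when the timeout elapses.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

/**
 * Hypothetical reducer, not the Nutch Fetcher reducer. It only illustrates
 * how a long-running reduce() keeps the framework from declaring it hung.
 */
public class HeartbeatReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      Text value = values.next();

      // ... slow per-record work, e.g. writing parsed content to DFS ...

      output.collect(key, value);

      // Heartbeat back to the framework. Without calls like these during
      // long-running work, the tasktracker reports the task as having
      // failed to report status once the task timeout elapses.
      reporter.setStatus("reduced " + key);
      reporter.progress();
    }
  }
}
```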
Thanks,
Nima

On 10/17/06, Dennis Kubes <[EMAIL PROTECTED]> wrote:
I have seen this happen before if the box is loaded down with too many tasks and the IO is maxed. I have also seen this happen when the regex filters spin out. We changed our systems to use only prefix and suffix url filters, and that cleared up those types of problems for us.

Dennis

Mike Smith wrote:
> Hi,
>
> I've been running the latest trunk nutch version on a cluster of 10
> machines. Fetch mappers always finish without any problem over 4,000,000
> pages, but some reducers fail because of "Task failed to report status
> for 602 seconds. Killing." Once this task fails, even if it gets assigned
> to another machine, it fails again.
>
> I checked the reducer of the fetcher class, and it seems to be an
> identity reducer that gets stuck on one key and doesn't move any further.
> I am not storing any http contents or files, so why should the reducer
> take this long for a key which is a URL and whose content is limited to
> 100,000 bytes?
>
> These faulty reducers do the copying and sorting (up to 66%) without any
> problem, and then they get stuck in the reduce stage.
>
> Thanks,
> Mike
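For anyone finding this thread later: the change Dennis describes amounts to swapping the urlfilter-regex plugin for the prefix and suffix filter plugins in conf/nutch-site.xml. The plugin list below is only an illustration; keep whatever other plugins your crawl already uses and only swap the url filters.

```xml
<!-- Sketch of the filter swap in conf/nutch-site.xml. The exact plugin
     list is an assumption; only the urlfilter part matters here. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-(prefix|suffix)|parse-(text|html)|index-basic|query-(basic|site|url)</value>
  <description>Replace urlfilter-regex with urlfilter-prefix and
  urlfilter-suffix so a pathological regex cannot spin on a long URL.
  The allowed prefixes and suffixes then go in conf/prefix-urlfilter.txt
  and conf/suffix-urlfilter.txt (assumed default file names; adjust if
  your configuration differs).</description>
</property>
```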
