I have seen this happen before when the box is loaded down with too many tasks and the I/O is maxed out. I have also seen it happen when the regex URL filters spin out. We changed our systems to use only prefix and suffix URL filters, and that cleared up those kinds of problems for us.
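
If it helps, here is a rough sketch of the kind of change we made. The exact plugin.includes value below is only illustrative (your list of protocol/parse/indexing plugins will differ); the point is swapping urlfilter-regex for urlfilter-prefix and urlfilter-suffix in conf/nutch-site.xml, and maintaining prefix-urlfilter.txt / suffix-urlfilter.txt instead of regex-urlfilter.txt:

  <!-- conf/nutch-site.xml: illustrative only, keep your own protocol/parse/index plugins -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-(prefix|suffix)|parse-html|index-basic|scoring-opic</value>
  </property>

Prefix and suffix filtering are plain string comparisons, so an odd URL in the fetch list can't blow up the filter step the way a backtracking regex can.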

Dennis

Mike Smith wrote:
Hi,



I've been running the latest trunk Nutch version on a cluster of 10
machines. The fetch mappers always finish without any problem over 4,000,000
pages, but some reducers fail with "Task failed to report status for
602 seconds. Killing." Once a task fails like this, it fails again even when
it gets reassigned to another machine.
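
As far as I can tell, the 602 seconds comes from Hadoop's mapred.task.timeout, which defaults to 600000 ms. I could raise it, e.g. in hadoop-site.xml (sketch below, value just an example), but that would probably only delay the kill, since the reducer seems to be genuinely stuck rather than slow:

  <property>
    <name>mapred.task.timeout</name>
    <value>1800000</value> <!-- 30 minutes instead of the default 10; workaround only -->
  </property>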



I checked the reducer of the Fetcher class, and it seems to be an identity reducer that gets stuck on one key and doesn't move any further. I am not storing any HTTP content or files, so why should the reducer take this long for a
key that is a URL whose content is limited to 100,000 bytes?



These faulty reducers get through the copy and sort phases (up to 66%) without any
problem, and then get stuck in the reduce phase.



Thanks,
Mike
