Hello,

I am trying to perform a large fetch (1 million pages) and am seeing some reduce tasks die with the following message:

Timed out.
java.io.IOException: Task process exit with nonzero status.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:145)

A little bit about my environment:

- A test cluster of 16 machines, dual 3GHz Xeons with 2GB of RAM each, running JRE 1.5.0_06
- Nutch 0.8-dev, built from trunk this afternoon
- Hadoop 0.1.0, taken from the nightly build

All fetch tasks (32 of 32) complete successfully, as do most of the reduce tasks. However, one or two reduce tasks fail with the above message. Upon failure, they are rescheduled to another tracker as expected.

The rescheduled reduce task will run up to the same point where the previous one died, then sit idle for ~10 minutes and die with the same message. The jobtracker will reschedule the reduce task a few times before giving up, at which point the entire job is aborted.
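
One detail that may or may not matter: the ~10 minutes the task sits idle looks suspiciously close to Hadoop's default task timeout. If I'm reading hadoop-default.xml correctly, that is mapred.task.timeout (600000 ms), which could be overridden in hadoop-site.xml with something like:

  <property>
    <name>mapred.task.timeout</name>
    <value>1800000</value>
    <description>
      Milliseconds a task may go without reporting progress before it is
      killed. (My guess -- I haven't confirmed this property exists under
      the same name in Hadoop 0.1.0.)
    </description>
  </property>

though I suspect a longer timeout would just delay the failure rather than fix it.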

I was able to perform a successful fetch of 250,000 pages in my initial tests. I then tried to scale up to 1 million pages and am now stuck :/

Can anyone provide some clues as to where I might start on debugging this issue?

Regards,
-Shawn

