Hello,

I am trying to perform a large fetch (1 million pages) and am seeing some reduce tasks die with the following message:

Timed out.
java.io.IOException: Task process exit with nonzero status.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:145)

A little bit about my environment:

- A test cluster of 16 machines, dual 3GHz Xeons with 2GB of RAM each, running JRE 1.5.0_06
- Nutch 0.8-dev, built from trunk this afternoon
- Hadoop 0.1.0, taken from the nightly build

All fetch tasks (32 of 32) complete successfully, as do most of the reduce tasks. However, one or two reduce tasks fail with the above message. Upon failure, they are rescheduled to another tracker as expected.

The rescheduled reduce task will run up to the same point where the previous one died, then sit idle for ~10 minutes and die with the same message. The jobtracker will reschedule the reduce task a few times before giving up, at which point the entire job is aborted.
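
One detail that may or may not matter: the ~10 minutes the task sits idle looks suspiciously close to Hadoop's default task timeout. If I'm reading hadoop-default.xml correctly, that is mapred.task.timeout (600000 ms), which could be overridden in hadoop-site.xml with something like:

  <property>
    <name>mapred.task.timeout</name>
    <value>1800000</value>
    <description>
      Milliseconds a task may go without reporting progress before it is
      killed. (My guess -- I haven't confirmed this property exists under
      the same name in Hadoop 0.1.0.)
    </description>
  </property>

though I suspect a longer timeout would just delay the failure rather than fix it.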

I was able to perform a successful fetch of 250,000 pages in my initial tests. I then tried to scale up to 1 million pages and am now stuck :/

Can anyone provide some clues as to where I might start on debugging this issue?

Regards,
-Shawn

