Greetings list,
I am trying to debug why my fetch process is dying on the reduce side.
A single reduce task out of 16 dies with the following message:
Timed out.
java.io.IOException: Task process exit with nonzero status.
    at org.apache.hadoop.mapred.TaskRunner.runChild
Which is caused by:
060412 083015 task_r_8dpshs 0.8685376% reduce > reduce
060412 083016 task_r_8dpshs 0.8685678% reduce > reduce
060412 084023 Task task_r_8dpshs timed out. Killing.
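To put a number on the silence before the kill, the gap between the last progress line and the "timed out" line can be computed from the log timestamps. This is only a small sketch: it assumes the log prefix is laid out as yyMMdd HHmmss, and the class name LogGap is made up for illustration (it is not part of Nutch or Hadoop).

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class LogGap {
    // Assumed layout of the log prefix: yyMMdd HHmmss.
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyMMdd HHmmss");

    // Seconds elapsed between two log timestamps.
    static long secondsBetween(String earlier, String later) {
        LocalDateTime a = LocalDateTime.parse(earlier, FMT);
        LocalDateTime b = LocalDateTime.parse(later, FMT);
        return Duration.between(a, b).getSeconds();
    }

    public static void main(String[] args) {
        // Last progress report versus the "timed out. Killing." line.
        long gap = secondsBetween("060412 083016", "060412 084023");
        System.out.println(gap + " seconds without a progress report");
    }
}
```

For the lines above this comes to 607 seconds, so the task reported nothing for just over ten minutes before it was killed. That would fit a task timeout of about ten minutes, although the configured value is not shown here.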
I have unsuccessfully attempted to determine the cause of this timeout.
It seems to occur only on larger fetches. I performed a successful
fetch of 1M pages after commenting out '-.*(/.+?)/.*?\1/.*?\1/' from the
regex-urlfilter.txt file (per some suggestions on the list); prior to
that, 1M was unstable. I then launched a fetch of 10M pages, about 1/5th
of my target amount, and ran into the same problem again.
JDK 1.4 versus 1.5 seems to make no difference.
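For reference, the rule mentioned above is meant to skip URLs in which a slash-delimited segment repeats three or more times. The sketch below shows what it matches using java.util.regex. It assumes the filter applies the pattern with "find" semantics and that the leading '-' is only the exclude marker; the class name RepeatedSegmentRule and the example.com URLs are hypothetical. Because of the backreferences and the nested lazy quantifiers, matching time may grow quickly on long URLs, which could be why removing the rule was suggested, but I have not measured that.

```java
import java.util.regex.Pattern;

class RepeatedSegmentRule {
    // The rule from regex-urlfilter.txt without its leading '-' (exclude marker).
    static final Pattern RULE = Pattern.compile(".*(/.+?)/.*?\\1/.*?\\1/");

    // True if the rule would exclude the URL (assumes "find" semantics).
    static boolean excludes(String url) {
        return RULE.matcher(url).find();
    }

    public static void main(String[] args) {
        // The segment "/a" appears three times, each followed by a slash.
        System.out.println(excludes("http://example.com/a/b/a/c/a/d"));
        // No segment repeats here.
        System.out.println(excludes("http://example.com/a/b/c/d"));
    }
}
```

The first URL is excluded (prints true) and the second is not (prints false).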
When the reduce side of the fetch fails like this, it seems to render
the entire segment unusable. I cannot re-run the fetch on the failed
segment, nor can I updatedb using the failed segment. So in the end it
seems I am left with useless data, and ~6 hours wasted.
When I have been at the terminal to observe the timed-out process before
it is reaped, I have seen that it continues to use 100% of a single
processor. An strace of the java process did not produce any usable leads.
When the reduce task is reassigned, either to the same machine or
another, it will die around the same percentage completion.
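Since strace shows system calls rather than Java frames, a thread dump of the spinning process may say more about where it is stuck. Sending SIGQUIT to the JVM (kill -QUIT <pid>) makes it print one without any code changes. The same information can also be collected from inside the process, as in the minimal sketch below. It assumes JDK 1.5 or later (Thread.getAllStackTraces does not exist in 1.4), and the class name StackDumper is hypothetical, not part of Nutch or Hadoop.

```java
import java.util.Map;

class StackDumper {
    // Formats the current stack trace of every live thread in this JVM.
    static String dumpAll() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append("\" state=")
               .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Print the dump to stderr so it lands in the task's error output.
        System.err.print(dumpAll());
    }
}
```

If the same frames show up in several dumps taken a few seconds apart, that is probably where the reduce task is looping.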
Is there an option I can enable somewhere that will allow for more
verbose output to be written to the logs? Any other suggestions on
debugging this issue? It seems to me that it might be possible to take a
snapshot of the task while it is running (i.e. data and the task job
jar), so that I can debug it in isolation without re-running an entire
fetch process. I am unsure of how this might be done, though.
Regards,
-Shawn
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general