I am having this same problem during the reduce phase of fetching, and
am now seeing:
060119 132458 Task task_r_obwceh timed out. Killing.
Will the jobtracker restart this job? If so, and I change the ipc timeout
in the config, will the tasktracker pick up the new value when the job
restarts?
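
For reference, the change I have in mind is just an override in
conf/nutch-site.xml, something like the following (assuming
ipc.client.timeout is still the property that governs the RPC timeout in
this build; the value is in milliseconds):

<property>
  <name>ipc.client.timeout</name>
  <value>300000</value>
  <description>Raised from the default so that large jobs do not hit
  the RPC timeout.</description>
</property>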
This was a very large crawl and I would be loath to have to re-fetch it
all over again.
Thanks for any info.
-Matt Zytaruk
Doug Cutting wrote:
Ken Krugler wrote:
060114 150937 Indexer: adding segment: /user/crawler/crawl-20060114111226/segments/20060114122751
060114 150937 Indexer: adding segment: /user/crawler/crawl-20060114111226/segments/20060114133620
Exception in thread "main" java.io.IOException: timed out waiting for response
        at org.apache.nutch.ipc.Client.call(Client.java:296)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        at $Proxy1.submitJob(Unknown Source)
        at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:259)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)
1. Any ideas what might have caused it to time out just now, when it
had successfully run many jobs up to that point?
I too have seen this, and found that increasing the ipc timeout fixes
it. The underlying problem is that the JobTracker computes the input
splits under the submitJob() RPC call. For sufficiently big jobs,
this can cause an RPC timeout. The JobTracker should instead return
from submitJob() immediately, and then compute the input splits in a
separate thread.
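
In sketch form, the fix would look roughly like this (hypothetical names,
not the actual JobTracker code): submitJob() registers the job and returns
at once, while a background thread does the slow split computation, so the
RPC response never waits on it.

import java.io.IOException;

// Illustrative sketch only -- not the real JobTracker implementation.
public class AsyncSubmitSketch {

  /** Placeholder for whatever per-job state the tracker keeps. */
  static class JobInProgress {
    final String jobFile;
    volatile boolean splitsComputed = false;

    JobInProgress(String jobFile) { this.jobFile = jobFile; }

    void computeInputSplits() throws IOException {
      // Potentially slow: lists and splits all of the job's input files.
      // ... omitted ...
      splitsComputed = true;
    }
  }

  /** Returns immediately; the input splits are computed asynchronously. */
  public JobInProgress submitJob(String jobFile) {
    final JobInProgress job = new JobInProgress(jobFile);
    Thread splitter = new Thread(new Runnable() {
      public void run() {
        try {
          job.computeInputSplits();
        } catch (IOException e) {
          // A real tracker would mark the job as failed here.
          e.printStackTrace();
        }
      }
    }, "split computation for " + jobFile);
    splitter.setDaemon(true);
    splitter.start();
    return job;  // the RPC response goes back right away
  }
}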
2. What cruft might I need to get rid of because it died? For
example, I see a reference to
/home/crawler/tmp/local/jobTracker/job_18cunz.xml now when I try to
execute some Nutch commands.
This should get cleaned up the next time the jobtracker is restarted.
Doug