Ken Krugler wrote:
060114 150937 Indexer: adding segment: /user/crawler/crawl-20060114111226/segments/20060114122751
060114 150937 Indexer: adding segment: /user/crawler/crawl-20060114111226/segments/20060114133620
Exception in thread "main" java.io.IOException: timed out waiting for response
    at org.apache.nutch.ipc.Client.call(Client.java:296)
    at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
    at $Proxy1.submitJob(Unknown Source)
    at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
    at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
    at org.apache.nutch.indexer.Indexer.index(Indexer.java:259)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)

1. Any ideas what might have caused it to time out just now, when it had successfully run many jobs up to that point?
I too have seen this, and found that increasing the IPC timeout fixes it. The underlying problem is that the JobTracker computes the input splits inside the submitJob() RPC call, so for sufficiently big jobs the call can take longer than the RPC timeout. The JobTracker should instead return from submitJob() immediately, and then compute the input splits in a separate thread.
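
A minimal sketch of that suggestion, not the actual Nutch JobTracker code: the class AsyncSubmitSketch, its State enum, and the computeSplits() stand-in are all hypothetical names invented for illustration. The point is only that submitJob() records the job and hands the slow split computation to a background thread, so the RPC handler returns right away and the caller polls for the job state afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a job tracker; none of these names come from Nutch.
class AsyncSubmitSketch {
    enum State { PREPARING, READY, FAILED }

    private final Map<String, State> jobStates = new ConcurrentHashMap<>();
    private final Map<String, List<String>> jobSplits = new ConcurrentHashMap<>();
    // A single background thread does the slow work outside the RPC handler.
    private final ExecutorService splitWorker = Executors.newSingleThreadExecutor();

    // Returns as soon as the job is recorded; split computation happens later.
    String submitJob(String jobId, List<String> inputSegments) {
        jobStates.put(jobId, State.PREPARING);
        List<String> segments = new ArrayList<>(inputSegments); // defensive copy
        splitWorker.execute(() -> {
            try {
                jobSplits.put(jobId, computeSplits(segments));
                jobStates.put(jobId, State.READY);
            } catch (RuntimeException e) {
                jobStates.put(jobId, State.FAILED);
            }
        });
        return jobId;
    }

    // Placeholder for the expensive step (assumption: one split per segment).
    private List<String> computeSplits(List<String> segments) {
        List<String> splits = new ArrayList<>();
        for (String segment : segments) {
            splits.add(segment + "#split-0");
        }
        return splits;
    }

    // Clients poll this instead of blocking inside submitJob().
    State getState(String jobId) {
        return jobStates.get(jobId);
    }

    List<String> getSplits(String jobId) {
        return jobSplits.get(jobId);
    }

    // Stops accepting work and waits for queued split computations to finish.
    boolean shutdownAndWait() {
        splitWorker.shutdown();
        try {
            return splitWorker.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this shape, the time submitJob() takes no longer grows with the size of the job, at the cost of the client having to check getState() before assuming the job is ready to run.
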
2. What cruft might I need to get rid of because it died? For example, I see a reference to /home/crawler/tmp/local/jobTracker/job_18cunz.xml now when I try to execute some Nutch commands.
This should get cleaned up the next time the jobtracker is restarted.

Doug
