Ken Krugler wrote:
060114 150937 Indexer: adding segment: /user/crawler/crawl-20060114111226/segments/20060114122751
060114 150937 Indexer: adding segment: /user/crawler/crawl-20060114111226/segments/20060114133620
Exception in thread "main" java.io.IOException: timed out waiting for response
        at org.apache.nutch.ipc.Client.call(Client.java:296)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        at $Proxy1.submitJob(Unknown Source)
        at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:259)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)

1. Any ideas what might have caused it to time out just now, when it had successfully run many jobs up to that point?

I too have seen this, and found that increasing the ipc timeout fixes it. The underlying problem is that the JobTracker computes the input splits under the submitJob() RPC call. For sufficiently big jobs, this can cause an RPC timeout. The JobTracker should instead return from submitJob() immediately, and then compute the input splits in a separate thread.
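To make the suggested change concrete, here is a rough sketch of the idea, not the actual JobTracker code. The class and method names (AsyncSubmitSketch, computeSplits, awaitSplits) are made up for illustration, and the "splits" are just strings. The point is only that submitJob() hands the expensive work to a background thread and returns a job id right away, so the caller's RPC cannot time out while splits are being computed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch only; this is not the Nutch JobTracker API. */
public class AsyncSubmitSketch {
    // A single background thread stands in for the "separate thread".
    private final ExecutorService splitPool = Executors.newSingleThreadExecutor();
    private final Map<String, Future<List<String>>> pendingSplits = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    /** Returns at once with a job id; split computation is only queued here. */
    public String submitJob(List<String> inputPaths) {
        String jobId = "job_" + nextId.incrementAndGet();
        List<String> snapshot = new ArrayList<>(inputPaths); // defensive copy
        pendingSplits.put(jobId, splitPool.submit(() -> computeSplits(snapshot)));
        return jobId;
    }

    /** Stand-in for the expensive step: one fake "split" per input path. */
    private static List<String> computeSplits(List<String> inputPaths) {
        List<String> splits = new ArrayList<>();
        for (String path : inputPaths) {
            splits.add(path + "#split-0");
        }
        return splits;
    }

    /** Blocks until the background computation for the given job is done. */
    public List<String> awaitSplits(String jobId) throws Exception {
        Future<List<String>> future = pendingSplits.get(jobId);
        if (future == null) {
            throw new IllegalArgumentException("unknown job: " + jobId);
        }
        return future.get();
    }

    public void shutdown() {
        splitPool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AsyncSubmitSketch tracker = new AsyncSubmitSketch();
        String jobId = tracker.submitJob(Arrays.asList("segments/a", "segments/b"));
        System.out.println("submitted " + jobId);
        System.out.println("splits: " + tracker.awaitSplits(jobId));
        tracker.shutdown();
    }
}
```

With this shape the client would poll for job status after submitJob() returns, and a failure while computing splits would have to be reported through that status rather than through the submit call itself.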

2. What cruft might I need to get rid of because it died? For example, I see a reference to /home/crawler/tmp/local/jobTracker/job_18cunz.xml now when I try to execute some Nutch commands.

This should get cleaned up the next time the JobTracker is restarted.

Doug


Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
