Hello all, I've been trying to parse a segment of data (probably around
500k pages) that I previously fetched, and every time I try, I get an error.
Below is the error given by the slaves. The master gives a similar
error. This usually happens late in the reduce phase, but has also
happened during the map phase once. Any ideas what might be going on
here? Network issues? Bugs in the tracker?
Thanks for any help you might be able to give.
-matt zytaruk
Slaves:
060102 200647 task_m_bvkze5 Child Error
java.io.IOException: Task process exit with nonzero status.
at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
060102 200833 task_m_bvkze5 done; removing files.
060102 200855 Client connection to 64.141.15.126:8050: closing
java.lang.reflect.UndeclaredThrowableException
at $Proxy0.pollForClosedTask(Unknown Source)
at org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:241)
at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:633)
Caused by: java.io.IOException: timed out waiting for response
at org.apache.nutch.ipc.Client.call(Client.java:296)
at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
... 4 more
060102 201229 Lost connection to JobTracker [crawler-d-01.internal.wavefire.ca/64.141.15.126:8050]. Retrying...
Master:
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at $Proxy0.getJobStatus(Unknown Source)
at org.apache.nutch.mapred.JobClient.getJob(JobClient.java:272)
at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:295)
at org.apache.nutch.crawl.ParseSegment.parse(ParseSegment.java:91)
at org.apache.nutch.crawl.ParseSegment.main(ParseSegment.java:110)
Caused by: java.io.IOException: timed out waiting for response
at org.apache.nutch.ipc.Client.call(Client.java:296)
at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
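To line up the slave log entries against the timeouts, it may help to measure the gaps between timestamped lines. The following is only a minimal sketch: it assumes the two leading fields of each log line are a yyMMdd date and an HHmmss time (inferred from the entries above, not confirmed against the Nutch source), and the helper names parse_stamp and gaps are hypothetical.

```python
from datetime import datetime


def parse_stamp(line):
    """Return a datetime if the line starts with 'yyMMdd HHmmss', else None.

    The timestamp layout is an assumption based on the log excerpt above.
    """
    parts = line.split(None, 2)
    if len(parts) < 2:
        return None
    try:
        return datetime.strptime(parts[0] + " " + parts[1], "%y%m%d %H%M%S")
    except ValueError:
        # Stack-trace lines and other unstamped lines end up here.
        return None


def gaps(lines):
    """Return (seconds since previous stamped line, line) for each stamped line."""
    result = []
    previous = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue
        delta = 0.0 if previous is None else (stamp - previous).total_seconds()
        result.append((delta, line))
        previous = stamp
    return result


# Stamped and unstamped lines copied from the slave log above.
sample = [
    "060102 200647 task_m_bvkze5 Child Error",
    "java.io.IOException: Task process exit with nonzero status.",
    "060102 200833 task_m_bvkze5 done; removing files.",
    "060102 200855 Client connection to 64.141.15.126:8050: closing",
    "060102 201229 Lost connection to JobTracker",
]

for delta, line in gaps(sample):
    print(f"+{delta:6.0f}s  {line}")
```

On the slave excerpt this shows the "Lost connection" message arriving a few minutes after the child error, which is the kind of gap worth comparing against whatever IPC timeout is configured.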
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general