I had set up a crawl of our intranet (approximately 1.6 million pages) and
had set the crawl parameters to depth 5, with MAX_INT pages per iteration.
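
For reference, the invocation was roughly the following (the "urls" seed
directory and "crawl" output directory names here are just placeholders;
2147483647 is MAX_INT, passed as -topN to get unlimited pages per iteration):

  bin/nutch crawl urls -dir crawl -depth 5 -topN 2147483647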

After 12 days, on the 3rd iteration, the crawl crashed with the following exception:

Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:443)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:111)

I have two questions.

1. Does anyone know what the cause of this error is? I looked in the Hadoop
logs and saw nothing that indicates the cause of the crash.
2. Is there any way I can restart this job, so that I don't lose 12 days of
fetching?

Thanks,

-Charlie Williams
