I had set up a crawl of our intranet (approximately 1.6 million pages) and
had set the crawl parameters to a depth of 5 and MAX_INT pages per iteration.
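For reference, the crawl was launched with the one-shot crawl command,
roughly like this (the "urls" seed directory and "crawl" output directory
names are just placeholders here, not the actual paths):

  bin/nutch crawl urls -dir crawl -depth 5 -topN 2147483647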
After 12 days, on the 3rd iteration, I got a crash with the following
exception:
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:443)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:111)
I have two questions:
1. Does anyone know what the cause of this error is? I looked in the Hadoop
logs and saw nothing that indicates the cause of the crash.
2. Is there any way I can restart this job, so that I don't lose 12 days of
fetching?
Thanks,
-Charlie Williams