I had set up a crawl of our intranet (approximately 1.6 million pages) and
had set the crawl parameters to a depth of 5 and MAX_INT pages per iteration.
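For reference, the crawl was launched with the one-shot crawl command,
roughly like this (the "urls" seed directory and "crawl" output directory
names are just placeholders here, not the actual paths):

  bin/nutch crawl urls -dir crawl -depth 5 -topN 2147483647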
After 12 days, on the 3rd iteration, I got a crash with the following
exception:
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:443)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:111)
I have two questions:
1. Does anyone know what the cause of this error is? I looked in the Hadoop
logs and saw nothing that indicates the cause of the crash.
2. Is there any way I can restart this job, so that I don't lose 12 days of
fetching?
Thanks,
-Charlie Williams