You can't. Crawls are self-contained. You can restart one by removing
all folders under the segments/xxxx/* directories except
crawl_generate and then re-executing the fetch job. But there isn't a
way to resume a crawl job from a mid-crawl checkpoint.
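A minimal sketch of that cleanup, assuming a hypothetical segment name ("crawl/segments/20080101000000" is made up; substitute your own crawl directory and segment) and that you are using the standard Nutch 0.9 command-line tools:

```shell
# Hypothetical segment path -- replace with your actual crawl dir and segment.
SEGMENT=crawl/segments/20080101000000

# (Demo setup only: create the layout a partially fetched segment would have.)
mkdir -p "$SEGMENT"/crawl_generate "$SEGMENT"/crawl_fetch "$SEGMENT"/content

# Delete everything in the segment except crawl_generate, which holds
# the fetch list the fetcher needs to re-run.
for d in "$SEGMENT"/*; do
  [ "$(basename "$d")" = "crawl_generate" ] || rm -r "$d"
done

ls "$SEGMENT"

# Then re-run the fetch and the rest of the pipeline by hand, e.g.:
#   bin/nutch fetch "$SEGMENT"
#   bin/nutch updatedb crawl/crawldb "$SEGMENT"
```

After the cleanup, only crawl_generate should remain in the segment, and the fetch can be re-executed against it.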
Dennis
Sherjeel Niazi wrote:
Hi,
I am using Nutch 0.9
I am crawling a series of URLs on a website, but after some time the crawler
crashes with the following error:
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:97)
at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:62)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:128)
How can I resume the crawl from where it stopped?
Sherjeel