You can't. Crawls are self-contained. You can restart a fetch by removing all folders under the segments/xxxx/* directories except crawl_generate and then re-executing the fetch job. But there isn't a way to resume a crawl job from a mid-crawl checkpoint.
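A minimal sketch of that cleanup, assuming a segment path of your own (the `SEGMENT` value and the `mkdir` setup below are hypothetical stand-ins so the snippet is self-contained; a real crawl already has these directories):

```shell
# Hypothetical segment path -- substitute your actual segment directory.
SEGMENT=crawl/segments/20080101000000

# Demo setup only: fake the subdirectories a partially fetched
# segment would contain. Skip this against a real crawl.
mkdir -p "$SEGMENT"/crawl_generate "$SEGMENT"/crawl_fetch \
         "$SEGMENT"/content "$SEGMENT"/crawl_parse \
         "$SEGMENT"/parse_data "$SEGMENT"/parse_text

# Remove every subdirectory except crawl_generate, which holds
# the fetch list needed to re-run the fetch job.
for d in "$SEGMENT"/*; do
  [ "$(basename "$d")" != "crawl_generate" ] && rm -rf "$d"
done

ls "$SEGMENT"

# Then re-run the fetch and CrawlDb update for this segment
# (requires a Nutch installation, so left commented here):
# bin/nutch fetch "$SEGMENT"
# bin/nutch updatedb crawl/crawldb "$SEGMENT"
```

After the loop, `crawl_generate` is the only directory left, so the fetch job can regenerate the rest of the segment from its fetch list.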

Dennis

Sherjeel Niazi wrote:
Hi,

I am using Nutch 0.9
I am crawling a series of URLs on a website, but after some time the crawler
crashes with the following error:

Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
    at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:97)
    at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:62)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:128)

How can I resume the crawl from where it stopped?


Sherjeel
