Hello All,

I've been looking around for a way to *safely* stop a crawl on Nutch 1.6.
So far the only suggestions I've seen are to kill the Hadoop job or just
hit Ctrl+C. However, when I do this I oftentimes end up with corrupt
segments that won't index to Solr, which is, of course, not ideal. Is
there a proper solution to this (besides updating to Nutch 2.x, which is
not an option here)?

If not, are there any known workarounds? Would it suffice to catch the
keyboard interrupt and delete the last segment - are there any issues
with this (besides losing that segment's data)? Something along the
lines of the sketch below is what I had in mind. Can anyone think of a
more elegant solution?

Thanks!

Alex
