Re: Proper way to stop a crawl safely - Nutch 1.6 from Hadoop 1.1.1

Julien Nioche Wed, 01 May 2013 00:19:53 -0700

The crawl script (bin/crawl) can be stopped between iterations if a .STOP
file is created in the same directory. Otherwise 'hadoop job -kill' is the
way to go.
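
To illustrate the .STOP mechanism, here is a minimal sketch of a loop that
checks for the stop file before each iteration. It is not the actual
bin/crawl code: crawl_loop, run_iteration and COMPLETED are made-up names,
and the single echo stands in for the real generate/fetch/parse/updatedb
steps.

```shell
# Sketch only: a crawl-style loop that looks for a .STOP file in the
# current directory before starting each iteration.

# Hypothetical placeholder for one generate/fetch/parse/updatedb round.
run_iteration() {
  echo "iteration $1: generate, fetch, parse, updatedb"
}

# Runs up to $1 iterations; stores the number completed in COMPLETED.
crawl_loop() {
  limit=$1
  COMPLETED=0
  i=1
  while [ "$i" -le "$limit" ]; do
    # The stop file is only checked here, so an iteration that has
    # already started always runs to the end.
    if [ -e ".STOP" ]; then
      echo "STOP file found, leaving loop before iteration $i"
      break
    fi
    run_iteration "$i"
    COMPLETED=$i
    i=$((i + 1))
  done
}
```

With a loop of this shape, running 'touch .STOP' in the directory the script
was started from lets the current iteration finish and prevents the next one
from starting, so no segment is left half-written. In this sketch nothing
deletes the file, so it has to be removed by hand before the next run.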


J.


On 1 May 2013 00:20, AC Nutch <[email protected]> wrote:

> Hello All,
>
> I've been looking around for a way to *safely* stop a crawl on Nutch 1.6.
> So far the only suggestion I see is to kill the hadoop job or just ctrl+c.
> However, when I do this I oftentimes end up with corrupt segments that
> won't index to Solr, which is, of course, not ideal. Is there any kind of a
> proper solution to this (besides just updating to Nutch 2.x - not an option
> here)?
>
> If not, are there any known workarounds? Would it suffice to catch the
> keyboard interrupt and delete the last segment - are there any issues with
> this (besides losing that segment's data)? Can anyone think of a more
> elegant solution?
>
> Thanks!
>
> Alex
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
