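The crawl script (bin/crawl) can be stopped between iterations by creating a .STOP file in the same directory. The script looks for that file at the start of each iteration, so the iteration already in progress runs to completion first. Otherwise 'hadoop job -kill' is the way to go.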
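To make the stop-file idea concrete, here is a minimal sketch of the pattern, not the actual contents of the crawl script. The function name crawl_rounds, its two arguments and the done_rounds variable are made up for illustration; only the .STOP file name comes from the script.

```shell
#!/bin/sh
# Hypothetical stand-in for a crawl loop: run up to $1 rounds and
# look for a .STOP file in directory $2 before starting each round.
crawl_rounds() {
    limit=$1
    dir=$2
    done_rounds=0
    i=1
    while [ "$i" -le "$limit" ]; do
        # The check sits at the top of the loop, so a round that has
        # already started is never interrupted halfway.
        if [ -e "$dir/.STOP" ]; then
            echo "STOP file found, leaving loop before round $i"
            break
        fi
        # A real round would run generate / fetch / parse / updatedb here.
        echo "round $i"
        done_rounds=$i
        i=$((i + 1))
    done
}
```

With a loop like this, running 'touch .STOP' from another terminal lets the current round finish and stops the crawl before the next one, which avoids the half-written segments. It is safest to remove the file again before starting a new crawl, in case nothing else cleans it up.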
J.

On 1 May 2013 00:20, AC Nutch <[email protected]> wrote:

> Hello All,
>
> I've been looking around for a way to *safely* stop a crawl on Nutch 1.6.
> So far the only suggestion I see is to kill the hadoop job or just ctrl+c.
> However, when I do this I oftentimes end up with corrupt segments that
> won't index to Solr, which is, of course, not ideal. Is there any kind of a
> proper solution to this (besides just updating to Nutch 2.x - not an option
> here)?
>
> If not, are there any known workarounds? Would it suffice to catch the
> keyboard interrupt and delete the last segment - are there any issues with
> this (besides losing that segment's data)? Can anyone think of a more
> elegant solution?
>
> Thanks!
>
> Alex

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

