Oh great, I like that idea very much. I'll incorporate that into our own crawl scripts. Thanks!
Alex

On Wed, May 1, 2013 at 3:19 AM, Julien Nioche <[email protected]> wrote:

> The crawl script (/bin/crawl) can be stopped in its iterations if a .STOP
> file is created in the same directory. Otherwise 'hadoop job -kill' is the
> way to go
>
> J.
>
>
> On 1 May 2013 00:20, AC Nutch <[email protected]> wrote:
>
> > Hello All,
> >
> > I've been looking around for a way to *safely* stop a crawl on Nutch 1.6.
> > So far the only suggestion I see is to kill the hadoop job or just ctrl+c.
> > However, when I do this I oftentimes end up with corrupt segments that
> > won't index to Solr, which is, of course, not ideal. Is there any kind of
> > a proper solution to this (besides just updating to Nutch 2.x - not an
> > option here)?
> >
> > If not, are there any known workarounds? Would it suffice to catch the
> > keyboard interrupt and delete the last segment - are there any issues
> > with this (besides losing that segment's data)? Can anyone think of a
> > more elegant solution?
> >
> > Thanks!
> >
> > Alex
>
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
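For reference, both suggestions from the thread come down to a couple of shell commands. A minimal sketch follows; the job ID is hypothetical, and it assumes bin/crawl checks for the .STOP file in the directory it is run from, as the thread describes:

    # Ask bin/crawl to stop cleanly once the current iteration finishes
    # (assumes the script looks for a .STOP file in its working directory)
    touch .STOP

    # Assumption: the script may not remove the marker itself, so clean it
    # up before starting the next crawl
    rm .STOP

    # Or, as a last resort, kill the underlying Hadoop job directly;
    # as noted above, this risks leaving a corrupt segment behind
    hadoop job -list
    hadoop job -kill job_201305010319_0001   # hypothetical job ID taken from the -list output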

