Oh great, I like that idea very much. I'll incorporate that into our own crawl scripts. Thanks!
Alex

On Wed, May 1, 2013 at 3:19 AM, Julien Nioche <[email protected]> wrote:

> The crawl script (/bin/crawl) can be stopped in its iterations if a .STOP
> file is created in the same directory. Otherwise 'hadoop job -kill' is the
> way to go
>
> J.
>
>
> On 1 May 2013 00:20, AC Nutch <[email protected]> wrote:
>
> > Hello All,
> >
> > I've been looking around for a way to *safely* stop a crawl on Nutch 1.6.
> > So far the only suggestion I see is to kill the hadoop job or just ctrl+c.
> > However, when I do this I oftentimes end up with corrupt segments that
> > won't index to Solr, which is, of course, not ideal. Is there any kind of
> > a proper solution to this (besides just updating to Nutch 2.x - not an
> > option here)?
> >
> > If not, are there any known workarounds? Would it suffice to catch the
> > keyboard interrupt and delete the last segment - are there any issues
> > with this (besides losing that segment's data)? Can anyone think of a
> > more elegant solution?
> >
> > Thanks!
> >
> > Alex
>
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
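For reference, both suggestions from the thread come down to a couple of shell commands. A minimal sketch follows; the job ID is hypothetical, and it assumes bin/crawl checks for the .STOP file in the directory it is run from, as the thread describes:

    # Ask bin/crawl to stop cleanly once the current iteration finishes
    # (assumes the script looks for a .STOP file in its working directory)
    touch .STOP

    # Assumption: the script may not remove the marker itself, so clean it
    # up before starting the next crawl
    rm .STOP

    # Or, as a last resort, kill the underlying Hadoop job directly;
    # as noted above, this risks leaving a corrupt segment behind
    hadoop job -list
    hadoop job -kill job_201305010319_0001   # hypothetical job ID taken from the -list output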

