Hi,

For reference, ideally you should fetch many smaller segments. That avoids a lot of problems: if you have to stop a crawl, you only throw away the small segment that is currently being fetched rather than a huge one. It sounds brutal, but I would just kill it. You lose one segment... hopefully. A rough sketch of the kind of loop I mean is below.
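Something like this, untested, assuming a local (non-distributed) crawl with the usual crawl/crawldb and crawl/segments layout and bin/nutch on your PATH; the -topN value and the 10-round loop are just placeholders:

#!/bin/bash
# Rough sketch, untested. Assumes a local (non-distributed) crawl with
# the usual layout: crawl/crawldb and crawl/segments, bin/nutch on PATH.
# The idea: generate small segments (-topN) so that killing the crawl
# only costs you the segment that is currently being fetched.

CRAWLDB=crawl/crawldb
SEGMENTS=crawl/segments

for round in $(seq 1 10); do     # number of rounds is just a placeholder
  # generate a small segment; tune -topN to whatever "small" means for you
  bin/nutch generate $CRAWLDB $SEGMENTS -topN 1000

  # segment dirs are timestamped, so the newest one is the one just generated
  SEGMENT=$(ls -d $SEGMENTS/2* | tail -1)

  # if we get ctrl+c'd or killed mid-fetch/parse, delete the half-finished
  # segment so it never reaches invertlinks/solrindex
  trap "rm -r $SEGMENT; exit 1" INT TERM

  bin/nutch fetch $SEGMENT
  bin/nutch parse $SEGMENT
  bin/nutch updatedb $CRAWLDB $SEGMENT

  # segment is complete, stop protecting it
  trap - INT TERM
done

If the fetch is running as a Hadoop job you would still have to kill the job itself as well, and remove the segment with the hadoop fs equivalent of rm -r rather than the local command.

Lewis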
On Tue, Apr 30, 2013 at 4:20 PM, AC Nutch <[email protected]> wrote:

> Hello All,
>
> I've been looking around for a way to *safely* stop a crawl on Nutch 1.6.
> So far the only suggestion I see is to kill the hadoop job or just ctrl+c.
> However, when I do this I oftentimes end up with corrupt segments that
> won't index to Solr, which is, of course, not ideal. Is there any kind of a
> proper solution to this (besides just updating to Nutch 2.x - not an option
> here)?
>
> If not, are there any known workarounds? Would it suffice to catch the
> keyboard interrupt and delete the last segment - are there any issues with
> this (besides losing that segment's data)? Can anyone think of a more
> elegant solution?
>
> Thanks!
>
> Alex

--
*Lewis*

