Re: [Nutch-dev] how to recover fetch

Jungshik Shin Tue, 06 Jul 2004 08:40:40 -0700

Doug Cutting wrote:

> Stefan Groschupf wrote:
>> Hi, hmm, sorry, shame on me: I do not know how to recover a fetch process. I aborted the fetch process, wish to continue now, and get this exception:

> You have two choices:
>
>   1. Use the aborted output.  You'll need to touch the file fetcher.done
> in the segment directory.  All the pages that were not crawled will be
> re-generated for fetch pretty soon.  If you fetched lots of pages, and
> don't want to have to re-fetch them again, this is the best way.
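Option 1 comes down to creating an empty marker file named fetcher.done inside the segment directory. On a Unix shell, plain touch does this. The following is a minimal sketch of the same step from Java, assuming a modern JDK with java.nio.file (Nutch in 2004 targeted an older Java). The class name MarkFetchDone and the example segment path are hypothetical. Only the marker file name comes from the reply above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;

// Hypothetical helper: mimics `touch <segment>/fetcher.done`.
public class MarkFetchDone {

    // Marker file name taken from the reply; everything else is assumed.
    static final String MARKER = "fetcher.done";

    // Creates an empty fetcher.done in segmentDir, or refreshes its
    // modification time if it already exists (the same behavior as touch).
    static Path touchFetcherDone(Path segmentDir) throws IOException {
        if (!Files.isDirectory(segmentDir)) {
            throw new IOException("Not a segment directory: " + segmentDir);
        }
        Path marker = segmentDir.resolve(MARKER);
        if (Files.exists(marker)) {
            Files.setLastModifiedTime(marker, FileTime.fromMillis(System.currentTimeMillis()));
        } else {
            Files.createFile(marker);
        }
        return marker;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java MarkFetchDone segments/20040706123456
        if (args.length != 1) {
            System.err.println("usage: java MarkFetchDone <segment-dir>");
            System.exit(1);
        }
        Path marker = touchFetcherDone(Paths.get(args[0]));
        System.out.println("Marked " + marker);
    }
}
```

The file's contents are irrelevant here. The sketch only ensures the file exists, which is all the advice above calls for.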

Sorry for resurrecting this old thread. I have to stop a fetch, but because of time constraints I can't afford to lose the pages that have already been fetched, so it's a relief to find this message. However, I'm still wondering whether I can go ahead with updating the db for this segment and, later, with indexing and duplicate deletion over a set of segments that includes the aborted one.

As long as the HTML content, the distilled text, and the outgoing links are preserved for every page fetched before the fetch was aborted, I'm fine even if other things are broken.

Thank you in advance,

Jungshik

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
