Doug Cutting wrote:
Stefan Groschupf wrote:
Hi,
Hmm, sorry, shame on me, but I do not know how to recover a fetch process.
I had aborted the fetch process and wish to continue now, but I get this exception:
> You have two choices:
>
> 1. Use the aborted output. You'll need to touch the file fetcher.done
> in the segment directory. All the pages that were not crawled will be
> re-generated for fetch pretty soon. If you fetched lots of pages, and
> don't want to have to re-fetch them again, this is the best way.
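
The "touch" step in option 1 could be scripted roughly as below. This is only a minimal Python sketch, not part of Nutch: the function name mark_segment_done is hypothetical, and the only details taken from the message above are the marker file name fetcher.done and the fact that it lives in the segment directory.

```python
from pathlib import Path


def mark_segment_done(segment_dir: str, marker_name: str = "fetcher.done") -> Path:
    """Create an empty marker file in the segment directory (like `touch`).

    Hypothetical helper: the file name comes from the quoted advice;
    everything else here is an assumption made for illustration.
    """
    segment = Path(segment_dir)
    if not segment.is_dir():
        # Refuse to create the marker somewhere that is not a directory.
        raise NotADirectoryError(f"not a segment directory: {segment}")
    marker = segment / marker_name
    # touch() creates the file if it is missing and leaves its contents
    # alone if it already exists.
    marker.touch(exist_ok=True)
    return marker
```

Calling mark_segment_done with the path of the aborted segment should have the same effect as running touch on fetcher.done inside that directory by hand.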
Sorry for resurrecting this old thread. I have to stop the fetch process, but because of
time constraints I can't afford to lose all the pages that have already been fetched.
It's a relief to find this message.
However, I'm still wondering whether I can go ahead with updating the db for this segment,
and later run indexing and duplicate deletion over a set of segments that includes the
aborted segment.
As long as the HTML, the extracted text, and the outgoing links are preserved for all the
pages fetched up to the moment the fetch process was aborted, I'm all right
even if other things are broken.
Thank you in advance,
Jungshik
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
