You have two choices:
1. Use the aborted output. To do this, touch the file fetcher.done in the segment directory. All the pages that were not crawled will be re-generated for fetch pretty soon. If you fetched lots of pages and don't want to fetch them again, this is the best way.
2. Discard the aborted output. To do this, just delete the fetcher* directories in the segment and restart the fetcher.
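
As a rough illustration, the two choices above could be scripted like this. This is only a sketch, not part of Nutch: the function names keep_aborted_output and discard_aborted_output are made up for the example, and the segment path is taken from the exception quoted below. It assumes that "touch" means creating an empty fetcher.done file and that only directories whose names start with "fetcher" need to be deleted.

```python
import shutil
from pathlib import Path


def keep_aborted_output(segment_dir):
    """Choice 1: mark the aborted fetch as done by touching fetcher.done.

    Hypothetical helper, equivalent to `touch <segment>/fetcher.done`.
    """
    marker = Path(segment_dir) / "fetcher.done"
    marker.touch()  # creates an empty file, or updates its mtime if it exists
    return marker


def discard_aborted_output(segment_dir):
    """Choice 2: delete the fetcher* directories in the segment.

    Hypothetical helper. Only directories are removed; plain files are
    left alone. Returns the names of the directories that were deleted.
    """
    removed = []
    for entry in Path(segment_dir).glob("fetcher*"):
        if entry.is_dir():
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


if __name__ == "__main__":
    # Segment path as it appears in the exception; adjust to your own layout.
    segment = Path("segments/20040518024340")
    if segment.is_dir():
        print("touched", keep_aborted_output(segment))
```

After discard_aborted_output you would still restart the fetcher yourself; the sketch does not do that.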
Doug
Stefan Groschupf wrote:
Hi,
Hmm, sorry, shame on me: I do not know how to recover a fetch process.
I aborted the fetch process and now wish to continue, but I get this exception:
Exception in thread "main" java.io.IOException: already exists: segments/20040518024340/fetcher
at net.nutch.io.MapFile$Writer.<init>(MapFile.java:66)
at net.nutch.io.MapFile$Writer.<init>(MapFile.java:55)
at net.nutch.io.ArrayFile$Writer.<init>(ArrayFile.java:19)
at net.nutch.fetcher.RequestScheduler.main(RequestScheduler.java:1427)
Is there any chance to recover? I would write a small tool if necessary to recover the data, but where should I start, and what needs to be done?
Thanks for any hints, Stefan
--------------------------------------------------------------- open technology: http://www.media-style.com open source: http://www.weta-group.net open discussion: http://www.text-mining.org
------------------------------------------------------- This SF.Net email is sponsored by: SourceForge.net Broadband Sign-up now for SourceForge Broadband and get the fastest 6.0/768 connection for only $19.95/mo for the first 3 months! http://ads.osdn.com/?ad_id=2562&alloc_id=6184&op=click _______________________________________________ Nutch-developers mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/nutch-developers
