I suggest making the fetcher quit its loop gracefully. As I see it, this can be achieved either by (1) implementing proper signal handling for Ctrl-C or similar, or (2) placing a file (e.g., fetcher.done or fetcher.stop) in the segment dir, which the fetcher checks for after every 1, 10, 100, 1000, etc. fetches.
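
To make option (2) concrete, here is a rough sketch of what I have in mind. It is not taken from the Nutch fetcher: the class StopFileCheck, its methods and the interval of 100 are all made up for illustration, and the loop only stands in for the real fetch loop. The point is that the file system is touched only on every N-th fetch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of Nutch): polls for a stop file every checkInterval fetches. */
class StopFileCheck {
    private final Path stopFile;
    private final int checkInterval;
    private long fetched = 0;

    StopFileCheck(Path segmentDir, String stopFileName, int checkInterval) {
        if (checkInterval < 1) {
            throw new IllegalArgumentException("checkInterval must be >= 1");
        }
        this.stopFile = segmentDir.resolve(stopFileName);
        this.checkInterval = checkInterval;
    }

    /** Call once after each fetch; the file system is consulted only on every checkInterval-th call. */
    boolean shouldStop() {
        fetched++;
        return fetched % checkInterval == 0 && Files.exists(stopFile);
    }

    /** Stand-in for the fetch loop: returns how many URLs were processed before stopping. */
    static int fetchAll(List<String> urls, StopFileCheck check) {
        int done = 0;
        for (String url : urls) {
            // The real fetcher would fetch the page and write its output here.
            done++;
            if (check.shouldStop()) {
                break; // leave the loop so the caller can close its writers normally
            }
        }
        return done;
    }

    public static void main(String[] args) throws IOException {
        Path segmentDir = Files.createTempDirectory("segment");
        Files.createFile(segmentDir.resolve("fetcher.stop")); // the operator asks for a stop
        StopFileCheck check = new StopFileCheck(segmentDir, "fetcher.stop", 100);
        int done = fetchAll(Collections.nCopies(1000, "http://example.com/"), check);
        System.out.println("stopped after " + done + " fetches");
    }
}
```

With a larger interval the extra file system access becomes rarer, at the price of a longer wait before the fetcher notices the stop file.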
Solution (1) is more elegant, but I am not sure how well signals work on Windows, and it would take longer to implement. I do not have time for this now. Solution (2) is easy to do, but it needs frequent file system access, which slows the fetcher down (though that may not be too bad). Is there a better way?

John

On Thu, Oct 28, 2004 at 03:49:38PM +0200, Andrzej Bialecki wrote:
> [EMAIL PROTECTED] wrote:
>
> >>>The segment that you actually crawl will be lost.
> >>
> >>Not really - you get a partial segment, which may or may not be usable.
> >
> >Interesting to know. However, I never had this good luck; I got an
> >unexpected EOF exception every time.
>
> Yeah, that's the symptom of a missing index.
>
> >Maybe this would be one of the useful improvements to make Nutch more
> >error-resistant.
>
> Actually, it is possible to make it more resilient to crashes by setting
> MapFile.Writer.setIndexInterval() to a smaller value (default 128, most
> likely it should be read from the config), and then by making the
> BufferedRandomAccessFile.flushBuffer() method public, so that the
> SequenceFile.Writer may call it after each index append - this way not
> only will the index always be written quickly (as if it were
> unbuffered), but also more frequently, resulting in smaller "chunks" of
> possibly lost data.
>
> The cost of this is slightly increased memory use (the index file is
> loaded fully in memory by MapFile.Reader), but other factors (increased
> disk usage for the index file, decreased write performance of the index
> file because of buffer thrashing) are probably negligible. The advantage
> is that you should be able to read more valid entries from corrupted files.
>
> >Thanks for the hint; maybe we should add this to the wiki as well.
>
> Feel free to update it, if you wish.
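
To illustrate the idea in the quoted suggestion (a smaller index interval plus a flush after each index append), here is a toy writer. It is only a sketch under my own assumptions: SparseIndexWriter is a made-up class, it does not use or reproduce Nutch's MapFile.Writer, SequenceFile.Writer or BufferedRandomAccessFile, and the on-disk layout (two longs per index entry) is invented for the example.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Toy data-plus-index writer (hypothetical; not Nutch's MapFile.Writer). */
class SparseIndexWriter implements AutoCloseable {
    private final DataOutputStream data;
    private final DataOutputStream index;
    private final int indexInterval;
    private long count = 0;

    SparseIndexWriter(Path dataFile, Path indexFile, int indexInterval) throws IOException {
        if (indexInterval < 1) {
            throw new IllegalArgumentException("indexInterval must be >= 1");
        }
        this.data = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(dataFile)));
        this.index = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(indexFile)));
        this.indexInterval = indexInterval;
    }

    /** Appends one record; every indexInterval-th record also gets an index entry that is flushed at once. */
    void append(long key, String value) throws IOException {
        if (count % indexInterval == 0) {
            data.flush();                 // hand buffered data bytes to the OS before indexing them
            index.writeLong(key);         // index entry: key ...
            index.writeLong(data.size()); // ... and the byte offset where its record starts
            index.flush();                // do not leave the entry sitting in the buffer
        }
        data.writeLong(key);
        data.writeUTF(value);
        count++;
    }

    @Override
    public void close() throws IOException {
        index.close();
        data.close();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("mapfile-sketch");
        for (int interval : new int[] {128, 4}) {
            Path indexFile = dir.resolve("index-" + interval);
            try (SparseIndexWriter w =
                     new SparseIndexWriter(dir.resolve("data-" + interval), indexFile, interval)) {
                for (int i = 0; i < 10; i++) {
                    w.append(i, "value" + i);
                }
                // Each index entry is 16 bytes (two longs), so size / 16 is the entry count.
                System.out.println("interval " + interval + ": "
                        + Files.size(indexFile) / 16 + " index entries on disk before close");
            }
        }
    }
}
```

The smaller interval leaves more index entries on disk before the writer is closed, so fewer records sit beyond the last indexed position if the process dies. The price is the larger index that the quoted message mentions.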

_______________________________________________
Nutch-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-general
