I suggest making the fetcher quit its loop gracefully.
As I see it, this can be achieved by either
(1) implementing proper signal handling for Ctrl-C or similar,
or
(2) placing a file (e.g., fetcher.done or fetcher.stop) in the segment dir,
which the fetcher checks for after every N crawls (N = 1, 10, 100, 1000, etc.).
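
A minimal sketch of option (2) in Java, just to make the idea concrete. The
class name StopFileCheck, the checkEvery parameter and the choice of
fetcher.stop as the file name are assumptions for illustration, not existing
Nutch code.

```java
import java.io.File;

/** Hypothetical helper: looks for a stop file once every N fetched pages. */
class StopFileCheck {
    private final File stopFile;
    private final int checkEvery;
    private long fetched = 0;

    StopFileCheck(File segmentDir, int checkEvery) {
        if (checkEvery <= 0) {
            throw new IllegalArgumentException("checkEvery must be positive");
        }
        // Assumed convention: the operator creates this file to request a stop.
        this.stopFile = new File(segmentDir, "fetcher.stop");
        this.checkEvery = checkEvery;
    }

    /**
     * Call once per fetched page. The filesystem is touched only on every
     * checkEvery-th call, so a larger interval means less fs overhead but a
     * slower reaction to the stop request.
     */
    boolean shouldStop() {
        fetched++;
        return fetched % checkEvery == 0 && stopFile.exists();
    }
}
```

The fetch loop would call shouldStop() after each page and, when it returns
true, break out and close the segment writers normally.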

Solution (1) is more elegant, but I am not sure how well signals work on Windows,
and it takes longer to implement. I do not have time for this now.
Solution (2) is easy to do, but it needs frequent filesystem access, which slows
the fetcher down (though that may not be too bad).
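
One possible route for option (1) that avoids native signal code is a JVM
shutdown hook: Runtime.addShutdownHook() registers a thread that the JVM runs
when it is terminated by a user interrupt such as Ctrl-C. A rough sketch,
where GracefulStop and its methods are made-up names and the fetcher loop is
assumed to poll the flag between pages:

```java
/** Hypothetical helper: a stop flag that a JVM shutdown hook sets on Ctrl-C. */
class GracefulStop {
    private volatile boolean stopRequested = false;

    void requestStop() {
        stopRequested = true;
    }

    boolean isStopRequested() {
        return stopRequested;
    }

    /**
     * Registers a shutdown hook that sets the flag and then waits for the
     * fetcher thread, so the loop can finish the current page and close its
     * files before the JVM halts.
     */
    void installHook(final Thread fetcherThread) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            requestStop();
            try {
                fetcherThread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
    }
}
```

The fetcher would call installHook(Thread.currentThread()) before its loop and
test isStopRequested() once per page. I have not tried this inside the fetcher
itself.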

Any better way?

John

On Thu, Oct 28, 2004 at 03:49:38PM +0200, Andrzej Bialecki wrote:
> [EMAIL PROTECTED] wrote:
> 
> >>>The segment that you actually crawl will be lost.
> >>
> >>Not really - you get a partial segment, which may or may not be usable.
> >
> >
> >Interesting to know. However, I never had such luck; every time I got an 
> >unexpected EOF exception.
> 
> Yeah, that's the symptom of missing index.
> 
> >Maybe this would be one of the useful improvements to make Nutch more error 
> >resistant. 
> 
> Actually, it is possible to make it more resilient to crashes by setting 
> MapFile.Writer.setIndexInterval() to a smaller value (default 128, most 
> likely it should be read from the config), and then by making 
> BufferedRandomAccessFile.flushBuffer() method public, so that the 
> SequenceFile.Writer may call it after each index append - this way not 
> only the index will be always written quickly (as if it were 
> unbuffered), but also more frequently, resulting in smaller "chunks" of 
> possibly lost data.
> 
> The cost of this is a slightly increased memory use (the index file is 
> loaded fully in memory by MapFile.Reader), but other factors (increased 
> disk usage for index file, decreased write performance of the index file 
> because of buffer thrashing) are probably negligible. The advantage is 
> that you should be able to read more valid entries from corrupted files.
> 
> >
> >Thanks for the hint; maybe we should add this to the wiki as well.
> 
> Feel free to update it, if you wish.
> 
> -- 
> Best regards,
> Andrzej Bialecki
> 
> -------------------------------------------------
> Software Architect, System Integration Specialist
> CEN/ISSS EC Workshop, ECIMF project chair
> EU FP6 E-Commerce Expert/Evaluator
> -------------------------------------------------
> FreeBSD developer (http://www.freebsd.org)
> 
> _______________________________________________
> Nutch-general mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/nutch-general
> 