Mehmet Tan wrote:
   Hi,
I want to ask a question about redirections. Correct me if I'm wrong,
but if a page is redirected to a page that is already in the webdb, then the
next updatedb operation will overwrite all of that page's previous refetch
information, because the fetcher creates a new page entry whose fetchInterval
is the initial fetch interval. How does the adaptive refetch algorithm handle
this situation?

Yes, this is a bug, and it affects both the original and the patched versions. The fetch interval shouldn't be blindly copied from the new CrawlDatum (this happens at CrawlDbReducer.java:86 in both versions); instead, it should be initialized with the value from old.getFetchInterval(), if available. Please fix this in your version; I'll fix it in the unpatched version.
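
A rough sketch of the idea, not the actual CrawlDbReducer code: Datum is a hypothetical stand-in for CrawlDatum that carries only the fetch interval, mergedFetchInterval is a made-up helper name, and the float type for the interval is an assumption. The point is only that the stored interval wins whenever an old entry exists.

```java
// Hypothetical stand-in for CrawlDatum; only the field relevant here.
final class Datum {
    private final float fetchInterval;

    Datum(float fetchInterval) {
        this.fetchInterval = fetchInterval;
    }

    float getFetchInterval() {
        return fetchInterval;
    }
}

final class FetchIntervalMerge {
    /**
     * Picks the fetch interval for the merged entry.
     *
     * @param old     the entry already in the db, or null if the page is new
     * @param fetched the entry produced by the fetcher (e.g. for a redirect target)
     */
    static float mergedFetchInterval(Datum old, Datum fetched) {
        if (old != null) {
            // Keep the interval already stored (possibly adapted over time)
            // instead of resetting it to the fetcher's initial value.
            return old.getFetchInterval();
        }
        // No previous entry: the newly created datum's interval is all we have.
        return fetched.getFetchInterval();
    }
}
```

With an existing entry at 7.0 and a freshly fetched one at 30.0, the helper keeps 7.0; with no existing entry it falls back to 30.0.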

Thanks for spotting this!

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



