Mehmet Tan wrote:
Hi,
I want to ask a question about redirections. Correct me if I'm wrong,
but if a page is redirected to a page that is already in the webdb,
then the next updatedb operation will overwrite all previous info about
refetch, because it is a newly created page in the fetcher whose
fetchInterval is the initial fetch interval. How does the adaptive
refetch algorithm handle this situation?
Yes, this is a bug, and it affects both the original and the patched
versions. The fetch interval shouldn't be blindly copied from any new
CrawlDatum (this happens in CrawlDbReducer.java:86 in both versions).
Instead, it should be initialized with the value from
old.getFetchInterval(), if an old entry is available. Please fix this in
your version, and I'll fix it in the un-patched version.
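
To make the intended behaviour concrete, here is a rough sketch in Java. It
is not the actual CrawlDbReducer code: the Datum class is a simplified
stand-in for CrawlDatum that carries only the fetch interval, the float
type is an assumption, and the merge() method and class names are
hypothetical, made up for illustration.

```java
// Illustrative sketch only, not the real Nutch CrawlDbReducer.
final class FetchIntervalMergeSketch {

    // Simplified stand-in for CrawlDatum (hypothetical; interval only).
    static final class Datum {
        private float fetchInterval;

        Datum(float fetchInterval) {
            this.fetchInterval = fetchInterval;
        }

        float getFetchInterval() {
            return fetchInterval;
        }

        void setFetchInterval(float fetchInterval) {
            this.fetchInterval = fetchInterval;
        }
    }

    // Hypothetical merge step: 'old' is the entry already in the db
    // (null if the URL is new), 'fetched' is the datum produced by the
    // fetcher, e.g. for a redirect target.
    static Datum merge(Datum old, Datum fetched) {
        // Start from the freshly fetched datum.
        Datum result = new Datum(fetched.getFetchInterval());
        if (old != null) {
            // Keep the interval already stored for this page instead of
            // resetting it to the fetcher's initial value.
            result.setFetchInterval(old.getFetchInterval());
        }
        return result;
    }

    public static void main(String[] args) {
        Datum old = new Datum(7.0f);      // interval already in the db
        Datum fetched = new Datum(30.0f); // initial interval from the fetcher
        System.out.println(merge(old, fetched).getFetchInterval());
        System.out.println(merge(null, fetched).getFetchInterval());
    }
}
```

With an existing entry the stored interval survives the update; only a URL
that is genuinely new to the db gets the fetcher's initial interval.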
Thanks for spotting this!
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com