Mehmet Tan wrote:

  Andrzej,
Thanks for your response and the patch, but I have a few more questions about
adaptive refetch. As far as I understand, the solution below is 'not to overwrite some fields of the entries' in the db. Suppose we applied the adaptive refetch idea from your patch to version 0.7. We have the same redirection problem there too. What do you think is the best way to solve this problem in version 0.7?

Well, you refer to two different problems:

* there was a problem in CrawlDbReducer: (possibly) new values of fetchInterval and fetchTime were not applied correctly to the CrawlDatum to be stored in the DB. The patch contained a fix ONLY for this issue.

* redirection problem: I'm not sure what the solution should be. IMHO it's a matter of properly setting URLFilters. If you don't allow certain patterns, you should not collect such urls, no matter whether they come from redirection or directly from the outlinks. Even if you make an exception for such urls, they will be filtered out anyway the next time you generate a fetchlist or run updatedb.
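
To illustrate the second point, here is a minimal sketch of what "filter the same way no matter where the url came from" could look like. It is NOT Nutch's actual URLFilter plugin code: the class UrlFilterSketch, its methods, and the example patterns are all hypothetical, chosen only to show a single filter applied to both outlinks and redirect targets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a URL filter; not the real Nutch plugin API.
class UrlFilterSketch {
    // Patterns that must never be collected (the caller supplies them).
    private final List<Pattern> denied = new ArrayList<>();

    UrlFilterSketch(String... deniedRegexes) {
        for (String regex : deniedRegexes) {
            denied.add(Pattern.compile(regex));
        }
    }

    // Returns the url if it is accepted, or null if a denied pattern matches.
    String filter(String url) {
        for (Pattern p : denied) {
            if (p.matcher(url).find()) {
                return null;
            }
        }
        return url;
    }

    // Outlinks and redirect targets go through the same filter,
    // so a redirect cannot smuggle in a url that an outlink could not.
    List<String> collect(List<String> outlinks, List<String> redirectTargets) {
        List<String> accepted = new ArrayList<>();
        for (String url : outlinks) {
            if (filter(url) != null) {
                accepted.add(url);
            }
        }
        for (String url : redirectTargets) {
            if (filter(url) != null) {
                accepted.add(url);
            }
        }
        return accepted;
    }
}
```

With deny patterns such as "\\.(gif|jpg)$" and "^https?://ads\\.", a redirect to http://ads.example.com/landing is dropped exactly as it would be if it had appeared as an outlink.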

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
