D.Saravanaraj wrote:
Hi Andrzej,

Thanks for your adaptive refetch patch. I didn't fully understand how adaptive
refetching works, so I examined its behaviour by reading the
crawldb. I created a folder in Windows with 2 files and tried adaptive
refetching on that (URL is file:/D:/Test/).


What I infer is:

   1. For every refetch, the score of files (but not the directory) is
   increasing

This is curious; it should not be so. However, it's the same in the vanilla version of Nutch (without this patch), so we'll address this separately.


   2. Irrespective of the retry interval, the files will be fetched when
   their modified date has changed

Or if the time since their last fetch (or until their next fetch) is greater than the system-wide db.max.fetch.interval. This prevents pages from being lost when you phase out old segments.
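
To make the rule above concrete, here is a minimal sketch of the decision as described in this thread. It is not the patch's actual code: the class, method, and parameter names are hypothetical, all times are assumed to be milliseconds since the epoch, and only the idea of a system-wide maximum interval (db.max.fetch.interval) comes from the discussion.

```java
// Illustrative only: a hypothetical helper, not part of Nutch or the patch.
final class RefetchSketch {

    /**
     * Returns true when an entry should be fetched again.
     *
     * @param nowMs                  current time
     * @param lastFetchMs            time of the previous fetch
     * @param nextFetchMs            scheduled time of the next fetch
     * @param modifiedSinceLastFetch whether the modified date has changed
     * @param maxIntervalMs          assumed system-wide maximum fetch interval
     */
    static boolean shouldRefetch(long nowMs, long lastFetchMs, long nextFetchMs,
                                 boolean modifiedSinceLastFetch, long maxIntervalMs) {
        // Point 2 of the mail: a changed modified date wins over the retry interval.
        if (modifiedSinceLastFetch) {
            return true;
        }
        // Safety net: the last fetch is older than the maximum interval.
        if (nowMs - lastFetchMs > maxIntervalMs) {
            return true;
        }
        // Safety net: the next fetch is scheduled further ahead than the maximum.
        if (nextFetchMs - nowMs > maxIntervalMs) {
            return true;
        }
        // Otherwise, fetch only once the scheduled time has arrived.
        return nextFetchMs <= nowMs;
    }
}
```

For example, with a maximum interval of 30 days, an unmodified file last fetched 45 days ago would be selected by the second check even if its own schedule says it is not due yet.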


   3. Even though the directory's modified date has not changed, since its
   contents changed (as the last modified date of one of the files changed,
   which is indexed as the content of the directory), that directory is
   refetched

Well, yes, in the case of the protocol-file plugin this seems OK, don't you agree?
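
A small sketch of why this happens, under two assumptions that are not confirmed in this thread: that the directory's fetched content is a listing that includes each file's last-modified date, and that a change is detected by comparing a digest of that content. The class and method names are hypothetical, and the listing format is invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: not the protocol-file plugin's real listing format.
final class DirListingSketch {

    /** Renders one "name<TAB>lastModifiedMs" line per child, sorted by name. */
    static String renderListing(Map<String, Long> children) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : new TreeMap<>(children).entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** Returns the MD5 digest of the content as a lowercase hex string. */
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

If one file's last-modified value changes, the rendered listing changes and so does its digest, even though the directory's own modified date stays the same.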


Please let me know if my inferences are correct, and sorry for the long mail.

No problem with the size. Yes, your conclusions seem correct.

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
