D.Saravanaraj wrote:
Hi Andrzej,
Thanks for your Adaptive Refetch patch. I don't fully understand how adaptive
refetch works. I examined its behaviour by reading the
crawldb. I created a folder in Windows with 2 files and tried adaptive
refetching on that (URL is file:/D:/Test/).
What I infer is:
1. For every refetch, the score of files (but not the directory) is
increasing
This is curious, it should not be so. However, it's the same in the
vanilla version of Nutch (without this patch), so we'll address this
separately.
2. Irrespective of the retry interval, the files are fetched when
their modified date changes
Or if the time since their last fetch (or until their next scheduled fetch)
exceeds the system-wide db.max.fetch.interval. This is to prevent pages being
lost when you phase out old segments.
3. Even though the directory's modified date has not changed, its
contents have changed (the last modified date of one of the files, which is
indexed as the content of the directory, has changed), so that directory is
refetched
Well, yes, in the case of the protocol-file plugin this seems OK, don't you agree?
Please let me know if my inferences are correct, and sorry for the long mail.
No problem with the size. Yes, your conclusions seem correct.
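
To make the rule from point 2 concrete, here is a minimal Java sketch of the decision as described in this thread. It is not the code from the patch: the class RefetchSketch, the Entry record, its field names and the shouldRefetch method are all hypothetical, the check against the regular schedule is an assumption, and maxInterval only stands in for the value of db.max.fetch.interval. The dates in main are arbitrary sample values.

```java
import java.time.Duration;
import java.time.Instant;

class RefetchSketch {

    /** Hypothetical per-URL record; field names are illustrative, not Nutch's. */
    static final class Entry {
        final Instant lastFetch;       // when the URL was last fetched
        final Instant nextFetch;       // when the schedule says it is due again
        final Instant storedModified;  // modified date recorded at the last fetch

        Entry(Instant lastFetch, Instant nextFetch, Instant storedModified) {
            this.lastFetch = lastFetch;
            this.nextFetch = nextFetch;
            this.storedModified = storedModified;
        }
    }

    /**
     * Returns true if the entry should be fetched again.
     * maxInterval stands in for db.max.fetch.interval (assumption).
     */
    static boolean shouldRefetch(Entry e, Instant observedModified,
                                 Instant now, Duration maxInterval) {
        // A changed modified date triggers a refetch regardless of the interval.
        if (!observedModified.equals(e.storedModified)) {
            return true;
        }
        // Assumed regular case: the scheduled fetch time has been reached.
        if (!e.nextFetch.isAfter(now)) {
            return true;
        }
        // Safety net: too long since the last fetch, so the page is not
        // lost when old segments are phased out.
        return Duration.between(e.lastFetch, now).compareTo(maxInterval) > 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2006-01-10T00:00:00Z");
        Duration maxInterval = Duration.ofDays(30);
        Instant modified = Instant.parse("2006-01-01T00:00:00Z");

        Entry recent = new Entry(now.minus(Duration.ofDays(1)),
                                 now.plus(Duration.ofDays(6)), modified);
        Entry stale = new Entry(now.minus(Duration.ofDays(40)),
                                now.plus(Duration.ofDays(5)), modified);

        System.out.println("recent, unchanged: " + shouldRefetch(recent, modified, now, maxInterval));
        System.out.println("recent, modified:  " + shouldRefetch(recent, now, now, maxInterval));
        System.out.println("stale, unchanged:  " + shouldRefetch(stale, modified, now, maxInterval));
    }
}
```

The three checks are kept separate so that each condition mentioned above can be read off on its own; a real fetch schedule would likely also adjust the interval after each fetch, which this sketch leaves out.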
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general