Raghavendra Prabhu wrote:
Hi Andrzej

After applying the patch, I found some strange behaviour.

The fetch list for each URL was being created even though
db.default.fetch.interval had not been reached.

You probably forgot to change the interval from days to seconds; it is now expressed in seconds. This value defines the maximum allowed interval, and any page with an interval higher than that will be refetched anyway. So if it's still 30 (seconds :) ), there is a high probability that you reach this limit before each cycle completes...
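To make the unit change concrete, here is a minimal sketch of the behaviour described above. It is not Nutch code: the class FetchIntervalSketch and the methods daysToSeconds and isDue are hypothetical names, and the rule "over the maximum means refetch" is only my reading of the explanation above.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of Nutch: illustrates seconds-based intervals. */
final class FetchIntervalSketch {

    /** Convert a value that used to be configured in days into seconds. */
    static long daysToSeconds(long days) {
        return TimeUnit.DAYS.toSeconds(days);
    }

    /**
     * Assumed rule: a page is due when its own interval has elapsed, or when
     * its interval is higher than the configured maximum (refetched anyway).
     * All arguments are in seconds.
     */
    static boolean isDue(long lastFetchSec, long pageIntervalSec,
                         long maxIntervalSec, long nowSec) {
        if (pageIntervalSec > maxIntervalSec) {
            return true; // over the limit: refetched regardless of elapsed time
        }
        return nowSec - lastFetchSec >= pageIntervalSec;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long lastFetch = now - 3600;          // fetched one hour ago
        long pageInterval = daysToSeconds(1); // page wants a daily refetch

        // Limit left at 30, now read as 30 seconds: the page is always due.
        System.out.println(isDue(lastFetch, pageInterval, 30L, now));
        // Limit converted to 30 days in seconds: the page is not due yet.
        System.out.println(isDue(lastFetch, pageInterval, daysToSeconds(30), now));
    }
}
```

With the limit left at 30 the first call reports the page as due, which matches the symptom of a fetch list being generated on every cycle.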

I thought this was supposed to happen in this order:

1) For the particular URL/file, get the db fetch interval (which changes)

2) If the current date exceeds the db fetch interval, generate the fetch list for the
particular file/URL

3) The fetch list checks the file's modified date and then decides whether to fetch the
latest contents of the file/URL

It is supposed to function in the above manner, right? Did I miss
anything?


Yes, this is how it's supposed to work.
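
The three steps confirmed above can be sketched roughly as follows. This is an illustration only, not the actual Nutch implementation: RefetchFlowSketch, PageEntry, generateFetchList and shouldDownload are hypothetical names, and all times are assumed to be in seconds.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the three-step flow; not Nutch classes. */
final class RefetchFlowSketch {

    /** What the db is assumed to remember about one URL. */
    static final class PageEntry {
        final String url;
        final long lastFetchSec;         // when the page was last fetched
        final long intervalSec;          // step 1: per-page fetch interval
        final long lastKnownModifiedSec; // modified date seen at last fetch

        PageEntry(String url, long lastFetchSec, long intervalSec,
                  long lastKnownModifiedSec) {
            this.url = url;
            this.lastFetchSec = lastFetchSec;
            this.intervalSec = intervalSec;
            this.lastKnownModifiedSec = lastKnownModifiedSec;
        }
    }

    /** Step 2: put a page on the fetch list once its interval has elapsed. */
    static List<PageEntry> generateFetchList(List<PageEntry> db, long nowSec) {
        List<PageEntry> fetchList = new ArrayList<>();
        for (PageEntry entry : db) {
            if (nowSec >= entry.lastFetchSec + entry.intervalSec) {
                fetchList.add(entry);
            }
        }
        return fetchList;
    }

    /** Step 3: download only if the file changed since the last fetch. */
    static boolean shouldDownload(PageEntry entry, long currentModifiedSec) {
        return currentModifiedSec > entry.lastKnownModifiedSec;
    }
}
```

A page whose interval has not elapsed never reaches step 3, so a fetch list that appears on every cycle points at step 2, i.e. at the interval values.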

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general