Jérôme Charron wrote:
Piotr,
looking at the code, and at the meta refresh tag, it seems there's a bug in the HTMLMetaProcessor.

Yes, it's a bug.

But does this code make sense for a search engine? Refreshing the same URL is useful for a browser, but not really for a fetcher.

Now, that's a deeply philosophical question... ;-) But seriously, I can easily imagine a scenario where a refresh every 30 seconds, with the same URL, brings new useful content each time.

Whether the rest of the system uses this information is, I'd say, up to the other parts of the system and administrative policies. In my opinion, the values should be reported as such.
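To illustrate what "reported as such" could look like, here is a minimal sketch in Java. It is not the HTMLMetaProcessor code; the class MetaRefreshSketch, the Refresh holder and the parse method are hypothetical names made up for this example. It only extracts the delay and the target URL from the content attribute of a meta refresh tag and makes no judgement about whether the target is the same page.

```java
import java.net.URI;

// Hypothetical sketch, not Nutch's HTMLMetaProcessor.
final class MetaRefreshSketch {

    /** The refresh values, reported exactly as found on the page. */
    static final class Refresh {
        final int seconds;
        final URI target;

        Refresh(int seconds, URI target) {
            this.seconds = seconds;
            this.target = target;
        }
    }

    /** Returns null when the delay is missing or negative, or the URL is malformed. */
    static Refresh parse(String content, URI base) {
        String[] parts = content.split(";", 2);
        int seconds;
        try {
            seconds = Integer.parseInt(parts[0].trim());
        } catch (NumberFormatException e) {
            return null;
        }
        if (seconds < 0) {
            return null;
        }
        URI target = base; // no url= part means "refresh this same page"
        if (parts.length == 2) {
            String rest = parts[1].trim();
            int eq = rest.indexOf('=');
            if (eq >= 0 && rest.substring(0, eq).trim().equalsIgnoreCase("url")) {
                String u = stripQuotes(rest.substring(eq + 1).trim());
                if (!u.isEmpty()) {
                    try {
                        target = base.resolve(u); // relative URLs resolve against the page
                    } catch (IllegalArgumentException e) {
                        return null;
                    }
                }
            }
        }
        // A target equal to the base is deliberately NOT filtered out here;
        // whether to act on it is left to the caller's policy.
        return new Refresh(seconds, target);
    }

    private static String stripQuotes(String s) {
        if (s.length() >= 2) {
            char first = s.charAt(0);
            if ((first == '\'' || first == '"') && s.charAt(s.length() - 1) == first) {
                return s.substring(1, s.length() - 1);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/a/page.html");
        Refresh r = parse("30; url=page.html", base);
        System.out.println(r.seconds + " " + r.target);
    }
}
```

A fetcher that does not care about same-URL refreshes can simply compare the target with the page URL and ignore the result, while a more specialized component can still use the delay.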

In the patches related to adaptive fetch (NUTCH-61) I use this model when calculating the next re-fetch time - there is a default FetchSchedule, which simply increments the time by 30 days (so the end result is the same as today), and there is an adaptive schedule which adjusts the refetch interval and time based on whether the content has changed or not. However, in each case the lastModified time is noted and reported "as is" - so that other, more specialized parts of the system can make more informed decisions.
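A rough sketch of the two schedules described above, in Java. This is not the NUTCH-61 patch: apart from the FetchSchedule name and the 30-day default, everything here (PageState, the update signature, halving and doubling the interval, the 1 hour and 90 day bounds) is an assumption chosen only to make the idea concrete.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; names and numbers are illustrative, not the NUTCH-61 API.
final class FetchScheduleSketch {

    /** Per-page state kept between fetches. */
    static final class PageState {
        Instant lastModified; // recorded "as is", never adjusted by a schedule
        Instant nextFetch;
        Duration interval;

        PageState(Duration interval) {
            this.interval = interval;
        }
    }

    interface FetchSchedule {
        void update(PageState page, Instant fetchTime, Instant lastModified, boolean changed);
    }

    /** Fixed schedule: always re-fetch 30 days after the last fetch. */
    static final class DefaultSchedule implements FetchSchedule {
        static final Duration INTERVAL = Duration.ofDays(30);

        @Override
        public void update(PageState page, Instant fetchTime, Instant lastModified, boolean changed) {
            page.lastModified = lastModified;
            page.interval = INTERVAL;
            page.nextFetch = fetchTime.plus(INTERVAL);
        }
    }

    /** Adaptive schedule: shorter interval if the page changed, longer if not. */
    static final class AdaptiveSchedule implements FetchSchedule {
        static final Duration MIN = Duration.ofHours(1); // assumed lower bound
        static final Duration MAX = Duration.ofDays(90); // assumed upper bound

        @Override
        public void update(PageState page, Instant fetchTime, Instant lastModified, boolean changed) {
            page.lastModified = lastModified;
            // Assumed adjustment rule: halve on change, double otherwise.
            Duration next = changed ? page.interval.dividedBy(2) : page.interval.multipliedBy(2);
            if (next.compareTo(MIN) < 0) {
                next = MIN;
            }
            if (next.compareTo(MAX) > 0) {
                next = MAX;
            }
            page.interval = next;
            page.nextFetch = fetchTime.plus(next);
        }
    }
}
```

Both implementations store lastModified unchanged, so a component that wants to apply its own policy still sees the original value regardless of which schedule is configured.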

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers