Subhojit Roy wrote:

Would it be possible to add to Nutch the ability to crawl and download a
page only if it has been updated since the last crawl? I read some time
ago that there were plans to include such a feature. It would be a very
useful feature to have, IMO. This of course depends on the "last
modified" timestamp being present on the page being crawled, which I
believe is not mandatory. Still, those who do set it would benefit.

This is already implemented: see Signature and its implementations, MD5Signature and TextProfileSignature.
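
To illustrate the general idea behind signature-based change detection, here is a minimal sketch in plain Java. It is not Nutch's actual code: the class ContentSignatureSketch and the methods md5 and isModified are hypothetical names made up for this example, and it assumes the signature is simply an MD5 digest of the raw page bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, NOT part of Nutch: shows the idea of comparing
// a stored content signature against freshly fetched content.
class ContentSignatureSketch {

    // Compute an MD5 digest (16 bytes) over the raw page content.
    static byte[] md5(byte[] content) {
        try {
            return MessageDigest.getInstance("MD5").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present in every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // A page counts as modified when no signature was stored for it yet,
    // or when the stored signature differs from the new content's digest.
    static boolean isModified(byte[] previousSignature, byte[] newContent) {
        return previousSignature == null
                || !Arrays.equals(previousSignature, md5(newContent));
    }

    public static void main(String[] args) {
        byte[] firstFetch = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        byte[] stored = md5(firstFetch); // signature kept from the last crawl

        byte[] sameAgain = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        byte[] changed = "<html>hello, world</html>".getBytes(StandardCharsets.UTF_8);

        System.out.println("unchanged page modified? " + isModified(stored, sameAgain));
        System.out.println("changed page modified?   " + isModified(stored, changed));
    }
}
```

Note that a content signature can only be computed once the page has been fetched, so a check like this detects an unchanged page after the download rather than avoiding the download itself. An exact digest such as MD5 also treats any byte-level difference as a change.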

Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com
