chad savage wrote:
Hello All,

With FTP and file crawls you can check the date of the file and match the date against yours. HTTP does not have that luxury. If this is on an internal site of yours being generated by a CMS or even by hand, I'm sure you can create a list of pages that have been updated since the last crawl. As for a generic web page in the wild, no software (that I am aware of) can determine if a page has been updated without actually downloading it and matching it against its history.

That's not quite the case - please see the HTTP spec for the "Last-Modified" header. However, it's true that for dynamic pages clients often don't get this information, and then we do indeed have to download the page and compare its signature to the previous one.
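
As a rough illustration of the two cases above, here is a minimal Python sketch. It assumes the crawler stored the "Last-Modified" value and a content signature from the previous fetch; the function names and the choice of SHA-256 as the signature are my own assumptions for the example, not what any particular crawler does. It sends the stored date back in an "If-Modified-Since" request header, treats a 304 (Not Modified) response as "unchanged", and otherwise falls back to comparing signatures.

```python
import hashlib
import urllib.error
import urllib.request


def content_signature(body: bytes) -> str:
    """Hash of the page body, used when the server gives no usable date."""
    # SHA-256 is an assumption for this sketch; any stable digest would do.
    return hashlib.sha256(body).hexdigest()


def conditional_headers(last_modified):
    """Headers for a conditional GET; empty if no Last-Modified was stored."""
    return {"If-Modified-Since": last_modified} if last_modified else {}


def is_modified(status, body, previous_signature):
    """Decide whether a page changed, given the response to a conditional GET."""
    if status == 304:  # Not Modified: the server confirmed our copy is current
        return False
    # No 304, so we received a full body and must compare signatures ourselves.
    return content_signature(body) != previous_signature


def check_page(url, last_modified, previous_signature):
    """Fetch url conditionally and report whether it changed (needs network)."""
    request = urllib.request.Request(url, headers=conditional_headers(last_modified))
    try:
        with urllib.request.urlopen(request) as response:
            return is_modified(response.status, response.read(), previous_signature)
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError rather than a normal response.
        if err.code == 304:
            return False
        raise
```

For a dynamic page that never sends "Last-Modified", pass None as last_modified; the request is then an ordinary GET and only the signature comparison applies.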

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

