Hello, just one question regarding updating the content of a crawled index.
Usually you set the "db.default.fetch.interval" property to control when a page should be refetched. Then you run generate/fetch/updatedb, and all pages older than the specified interval are crawled again. The drawback is that all the HTML pages are downloaded again, even if nothing has changed.

What about the HTTP headers Last-Modified and If-Modified-Since? Could Nutch support them? This could reduce traffic and make crawling a little smarter.

Thanks,
Oliver
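
To make the idea concrete, here is a minimal sketch of a conditional GET, written in Python with only the standard library. It is not Nutch code and says nothing about how Nutch does or would implement this; the function names conditional_headers and fetch_if_modified are hypothetical. It assumes the crawler has stored the time of the previous fetch and that the server honors If-Modified-Since by answering 304 Not Modified with an empty body.

```python
from email.utils import formatdate
import urllib.error
import urllib.request


def conditional_headers(last_fetch_epoch):
    """Build the request header for a conditional GET.

    last_fetch_epoch is the time of the previous fetch in seconds
    since the Unix epoch (an assumption about what the crawler stores).
    """
    # HTTP dates must be expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    return {"If-Modified-Since": formatdate(last_fetch_epoch, usegmt=True)}


def fetch_if_modified(url, last_fetch_epoch, timeout=10):
    """Return (changed, body); body is None when the server answers 304."""
    request = urllib.request.Request(
        url, headers=conditional_headers(last_fetch_epoch)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # 200 OK: the page changed (or the server ignores the header).
            return True, response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: nothing was downloaded, keep the stored copy.
            return False, None
        raise
```

A crawler would call fetch_if_modified(url, time_of_previous_fetch) and skip parsing and re-indexing whenever the first element of the result is False. Servers that do not support conditional requests simply return the full page, so the behavior falls back to a normal refetch.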
