Hello,

I have a question about updating the content of an
already crawled index.

Usually you set the "db.default.fetch.interval" property
to control when a page is due to be refetched.
Then you run generate/fetch/updatedb, and all pages
that are older than the specified interval are crawled again.

The drawback is that all the HTML pages are downloaded
again, even if nothing has changed.

What about the HTTP headers Last-Modified and
If-Modified-Since?
Could Nutch support conditional requests based on them? A server
that answers 304 Not Modified sends no body, so this could reduce
traffic and make the crawling a little smarter.
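
To make the idea concrete, here is a rough sketch in plain Java
of what such a conditional request could look like. It is not
Nutch code: the class ConditionalFetch and its methods are made up
for illustration, and it assumes the crawler keeps the time of the
previous fetch for each URL. Only java.net.HttpURLConnection from
the JDK is used.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of Nutch.
class ConditionalFetch {

    /** True when the server answered 304 Not Modified. */
    static boolean isUnchanged(int statusCode) {
        return statusCode == HttpURLConnection.HTTP_NOT_MODIFIED;
    }

    /**
     * Prepares a GET that carries If-Modified-Since.
     * lastFetchMillis <= 0 means no earlier fetch is known,
     * so the header is left out and the GET is unconditional.
     * Nothing is sent over the network here.
     */
    static HttpURLConnection openConditional(URL url, long lastFetchMillis)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        if (lastFetchMillis > 0) {
            // The JDK formats the timestamp as an HTTP date for us.
            conn.setIfModifiedSince(lastFetchMillis);
        }
        return conn;
    }

    /** Sends the request; returns true if the page has to be processed again. */
    static boolean needsRefetch(URL url, long lastFetchMillis)
            throws IOException {
        HttpURLConnection conn = openConditional(url, lastFetchMillis);
        try {
            return !isUnchanged(conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }
}
```

With something like this, a page that answers 304 would only cost
the request and the response headers. Servers that do not send
Last-Modified or ignore If-Modified-Since would simply answer 200
with the full page, as they do today.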

Thanks
Oliver
