Hello,

As far as I understand, Nutch recrawls URLs once their fetch time has passed, regardless of whether those URLs have been modified. Is there any initiative to restrict recrawls to only those URLs whose modified time (MT) is greater than the previously recorded MT?

In detail: if Nutch has crawled a URL with a next fetch time 30 days out, then on the second crawl Nutch should visit this URL, retrieve its modified time, and compare it with the modified time stored in the crawldb. It should recrawl the URL only if the new MT is greater than the old one, and skip it otherwise.
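
The check I have in mind could be sketched roughly as follows. This is only an illustration in Java: `RecrawlCheck` and `shouldRefetch` are hypothetical names, not part of Nutch's API, and the sketch assumes both modified times are already available as timestamps (how they are obtained is left out). Treating a missing MT as "refetch" is also my assumption.

```java
import java.time.Instant;

// Hypothetical helper, NOT part of Nutch: illustrates the proposed decision only.
class RecrawlCheck {

    /**
     * Returns true if the URL should be fetched again.
     *
     * @param storedModified   MT recorded in the crawldb at the last crawl (may be null)
     * @param reportedModified MT the server reports now (may be null)
     */
    static boolean shouldRefetch(Instant storedModified, Instant reportedModified) {
        // Assumption: if either MT is unknown we cannot prove the page is
        // unchanged, so fall back to refetching.
        if (storedModified == null || reportedModified == null) {
            return true;
        }
        // Refetch only when the new MT is strictly greater than the old one.
        return reportedModified.isAfter(storedModified);
    }

    public static void main(String[] args) {
        Instant stored = Instant.parse("2011-01-01T00:00:00Z");
        Instant newer = Instant.parse("2011-02-01T00:00:00Z");

        System.out.println(shouldRefetch(stored, newer));  // modified since last crawl
        System.out.println(shouldRefetch(stored, stored)); // unchanged, so skip
    }
}
```
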
Thanks. Alex.

