This will work only for URLs whose servers honor the If-Modified-Since header, but most URLs do not support it.
Thanks,
Alex

-----Original Message-----
From: Max Dzyuba <[email protected]>
To: Markus Jelsma <[email protected]>; user <[email protected]>
Sent: Fri, Aug 24, 2012 9:02 am
Subject: RE: recrawl a URL?

Thanks again! I'll have to test it more then in my 1.5.1.

Best regards,
Max

Markus Jelsma <[email protected]> wrote:

Hmm, I had to look it up but it is supported in 1.5 and 1.5.1:
http://svn.apache.org/viewvc/nutch/tags/release-1.5.1/src/java/org/apache/nutch/indexer/IndexerMapReduce.java?view=markup

-----Original message-----
> From: Max Dzyuba <[email protected]>
> Sent: Fri 24-Aug-2012 17:35
> To: Markus Jelsma <[email protected]>; [email protected]
> Subject: RE: recrawl a URL?
>
> Thank you for the reply. Does it mean that it is not supported in the latest stable release of Nutch?
>
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]]
> Sent: 24 August 2012 17:21
> To: [email protected]; Max Dzyuba
> Subject: RE: recrawl a URL?
>
> Hi,
>
> Trunk has a feature for this: indexer.skip.notmodified
>
> Cheers
>
> -----Original message-----
> > From: Max Dzyuba <[email protected]>
> > Sent: Fri 24-Aug-2012 17:19
> > To: [email protected]
> > Subject: recrawl a URL?
> >
> > Hello everyone,
> >
> > I run a crawl command every day, but I don't want Nutch to submit an
> > update to Solr if a particular page hasn't changed. How do I achieve
> > that? Right now the value of db.fetch.interval.default doesn't seem to
> > help prevent the crawl since the updates are submitted to Solr as if
> > the page has been changed. I know for sure that the page has not been
> > changed. This happens for every new crawl command.
> >
> > Thanks in advance,
> >
> > Max
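The indexer.skip.notmodified switch Markus mentions above, and the db.fetch.interval.default property from the original question, are ordinary Nutch configuration properties that can be overridden in conf/nutch-site.xml. A minimal sketch of such an override follows; the description text and the 30-day interval are illustrative values, not copied from nutch-default.xml:

    <!-- Skip records whose CrawlDb status is db_notmodified at indexing time,
         so unchanged pages are not resubmitted to Solr. -->
    <property>
      <name>indexer.skip.notmodified</name>
      <value>true</value>
      <description>When true, the indexer skips documents that were not
      modified since the last fetch.</description>
    </property>

    <!-- How long (in seconds) to wait before refetching a page at all;
         2592000 = 30 days is shown here only as an example. -->
    <property>
      <name>db.fetch.interval.default</name>
      <value>2592000</value>
    </property>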

