It's part of Nutch, happens automatically.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Miao Liqiang NCS <[EMAIL PROTECTED]>
> To: nutch-user@lucene.apache.org
> Sent: Sunday, May 4, 2008 8:33:49 PM
> Subject: RE: Unable to tell if whether is any changes for the same webpage
>
> Is this function provided in the Nutch package? Can I use it directly
> without programming against the API?
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Friday, May 02, 2008 8:12 PM
> To: nutch-user@lucene.apache.org
> Subject: Re: Unable to tell if whether is any changes for the same webpage
>
> It uses MD5 of the page content and another method whose exact name I
> cannot remember now, but that is more forgiving of small textual
> changes. I think it also takes into consideration the Last-Modified
> HTTP response header, but I'd have to check that.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: Miao Liqiang NCS
> > To: nutch-user@lucene.apache.org
> > Sent: Friday, May 2, 2008 8:29:38 AM
> > Subject: RE: Unable to tell if whether is any changes for the same webpage
> >
> > Could you tell me how?
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > Sent: Friday, May 02, 2008 2:12 PM
> > To: nutch-user@lucene.apache.org
> > Subject: Re: Unable to tell if whether is any changes for the same webpage
> >
> > Hi,
> >
> > Yes, Nutch can detect when a page changed and when it didn't change.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> > > From: Miao Liqiang NCS
> > > To: nutch-user@lucene.apache.org
> > > Sent: Friday, May 2, 2008 7:48:03 AM
> > > Subject: Unable to tell if whether is any changes for the same webpage
> > >
> > > Is Nutch able to tell whether there are any changes for the same
> > > webpage? For example, if a webpage has been updated since the last
> > > crawl, can Nutch detect this change when it recrawls the page?
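
For anyone reading this thread later: the idea of detecting a change by comparing an MD5 of the page content between crawls can be illustrated roughly as below. This is only a sketch of the general technique, not Nutch's actual code. The function names (exact_signature, tolerant_signature, has_changed) are made up for illustration, and the "tolerant" variant is just one assumed way to be more forgiving of small textual changes (it ignores whitespace and case); it is not the method Otis refers to.

```python
import hashlib
import re


def exact_signature(content: bytes) -> str:
    """MD5 hex digest of the raw page bytes; any byte change alters it."""
    return hashlib.md5(content).hexdigest()


def tolerant_signature(text: str) -> str:
    """Hypothetical forgiving signature (an assumption, not Nutch's method):
    collapse whitespace runs and lowercase the text before hashing, so
    purely cosmetic edits do not count as a change."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def has_changed(old_signature: str, new_content: bytes) -> bool:
    """Compare the signature stored at the previous crawl with the
    signature of the freshly fetched content."""
    return exact_signature(new_content) != old_signature
```

On a recrawl you would store the signature from the previous fetch and call has_changed(stored_signature, fetched_bytes). As the thread says, Nutch does this kind of comparison itself, so none of this needs to be written by hand.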