Is this function provided in the Nutch package? Can I use it directly, without programming against the API?

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, May 02, 2008 8:12 PM
To: [email protected]
Subject: Re: Unable to tell if whether is any changes for the same webpage

It uses MD5 of the page content and another method whose exact name I cannot remember now, but that is more forgiving of small textual changes. I think it also takes into consideration the Last-Modified HTTP response header, but I'd have to check that.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Miao Liqiang NCS <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, May 2, 2008 8:29:38 AM
> Subject: RE: Unable to tell if whether is any changes for the same webpage
>
> Could you tell me how?
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Friday, May 02, 2008 2:12 PM
> To: [email protected]
> Subject: Re: Unable to tell if whether is any changes for the same webpage
>
> Hi,
>
> Yes, Nutch can detect when a page changed and when it didn't change.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: Miao Liqiang NCS
> > To: [email protected]
> > Sent: Friday, May 2, 2008 7:48:03 AM
> > Subject: Unable to tell if whether is any changes for the same webpage
> >
> > Is Nutch able to tell whether there are any changes for the same webpage? For example, a webpage has been updated since last crawling, is nutch can tell this change of the webpage when there is a recrawling?
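
For illustration only, the MD5-based check mentioned in the reply above can be sketched in plain Java. This is a minimal sketch and not Nutch's own code: the class name `PageSignature` and the methods `md5Hex` and `hasChanged` are hypothetical, and it assumes the signature from the previous fetch was stored somewhere. An exact digest like this flags any difference in the content, however small, which is presumably why the thread also mentions a second, more forgiving method.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Nutch: compares MD5 digests of page content.
public class PageSignature {

    // Returns the hex-encoded MD5 digest of the page content.
    public static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // True when the newly fetched content no longer matches the stored signature.
    public static boolean hasChanged(String previousSignature, String currentContent) {
        return !md5Hex(currentContent).equals(previousSignature);
    }

    public static void main(String[] args) {
        String firstFetch = "<html><body>Hello</body></html>";
        String stored = md5Hex(firstFetch); // signature kept from the first crawl

        // Same content on recrawl: prints false
        System.out.println(hasChanged(stored, firstFetch));
        // Edited content on recrawl: prints true
        System.out.println(hasChanged(stored, "<html><body>Hello, world</body></html>"));
    }
}
```

The sketch ignores the Last-Modified header that the reply also mentions; whether and how Nutch combines the two would need to be checked against the Nutch documentation.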
