It uses an MD5 hash of the page content and another method, whose exact name I
cannot remember now, that is more forgiving of small textual changes.  I think it
also takes the Last-Modified HTTP response header into consideration, but I'd
have to check that.
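
To illustrate the difference between the two kinds of signature, here is a
minimal Python sketch. It is not Nutch's actual code: the function names
exact_signature and fuzzy_signature, and the parameters min_len and quant, are
made up for this example, and the "forgiving" variant is just one simple way
such a signature could work.

```python
import hashlib
import re
from collections import Counter


def exact_signature(content: bytes) -> str:
    """MD5 of the raw page bytes: any change at all yields a new signature."""
    return hashlib.md5(content).hexdigest()


def fuzzy_signature(text: str, min_len: int = 3, quant: int = 2) -> str:
    """Hypothetical 'forgiving' signature (an assumption, not Nutch's method).

    Lowercases the text, keeps alphanumeric tokens of at least min_len
    characters, rounds each token count down to a multiple of quant, drops
    tokens whose rounded count is zero, and hashes the sorted profile.
    Small edits (whitespace, punctuation, a stray rare word) tend to leave
    the profile, and therefore the signature, unchanged.
    """
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) >= min_len]
    counts = Counter(tokens)
    profile = sorted(
        (tok, (n // quant) * quant) for tok, n in counts.items() if n >= quant
    )
    joined = "\n".join(f"{tok} {n}" for tok, n in profile)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def has_changed(old_signature: str, new_signature: str) -> bool:
    """A page counts as changed when its stored and fresh signatures differ."""
    return old_signature != new_signature


old_page = ("Nutch crawls the web. Nutch indexes pages. "
            "Pages are fetched and pages are parsed.")
# Same page with punctuation, spacing, and one extra rare word changed.
minor_edit = ("Nutch crawls the web.  Nutch indexes pages; "
              "pages are fetched and pages are parsed today.")
# A genuinely different page.
rewrite = ("Lucene builds an index. Lucene searches an index. "
           "Queries are scored and queries are ranked.")
```

On a recrawl, the stored signature for a URL would be compared with the one
computed from the freshly fetched content: with the exact signature even the
minor edit registers as a change, while the fuzzy one only reacts to the
rewrite.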
 
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Miao Liqiang NCS <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, May 2, 2008 8:29:38 AM
> Subject: RE: Unable to tell if whether is any changes for the same webpage
> 
> Could you tell me how?
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> Sent: Friday, May 02, 2008 2:12 PM
> To: [email protected]
> Subject: Re: Unable to tell if whether is any changes for the same
> webpage
> 
> Hi,
> 
> Yes, Nutch can detect when a page changed and when it didn't change.
>  
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> ----- Original Message ----
> > From: Miao Liqiang NCS 
> > To: [email protected]
> > Sent: Friday, May 2, 2008 7:48:03 AM
> > Subject: Unable to tell if whether is any changes for the same webpage
> > 
> > Is Nutch able to tell whether there are any changes for the same
> > webpage? For example, if a webpage has been updated since the last
> > crawl, can Nutch detect this change when it recrawls the page?
> > 
> > 
> 

