Is this function provided in the Nutch package? Can I use it directly,
without programming against the API?

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, May 02, 2008 8:12 PM
To: [email protected]
Subject: Re: Unable to tell if whether is any changes for the same
webpage

It uses an MD5 hash of the page content, plus another method whose exact
name I cannot remember now but which is more forgiving of small textual
changes.  I think it also takes into consideration the Last-Modified
HTTP response header, but I'd have to check that.
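
To make the idea concrete, here is a minimal sketch of signature-based
change detection. It is not Nutch's code or API: the function names are
hypothetical, and the "forgiving" variant is only an illustrative stand-in
(it normalizes case, whitespace and punctuation before hashing), since the
actual second method is not named here.

```python
import hashlib
import re


def md5_signature(content: bytes) -> str:
    """Exact signature: any byte-level change yields a different digest."""
    return hashlib.md5(content).hexdigest()


def forgiving_signature(text: str) -> str:
    """Illustrative stand-in for a more tolerant signature.

    Lowercases the text and keeps only alphanumeric tokens, so changes in
    case, spacing or punctuation do not alter the result.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return hashlib.md5(" ".join(tokens).encode("utf-8")).hexdigest()


def has_changed(old_signature: str, new_signature: str) -> bool:
    """A page counts as changed when its stored and fresh signatures differ."""
    return old_signature != new_signature
```

On a recrawl you would compute the signature of the freshly fetched page
and compare it with the one stored from the previous fetch; the strict
version flags every edit, while the forgiving one ignores cosmetic ones.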
 
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Miao Liqiang NCS <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, May 2, 2008 8:29:38 AM
> Subject: RE: Unable to tell if whether is any changes for the same
> webpage
> 
> Could you tell me how?
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> Sent: Friday, May 02, 2008 2:12 PM
> To: [email protected]
> Subject: Re: Unable to tell if whether is any changes for the same
> webpage
> 
> Hi,
> 
> Yes, Nutch can detect when a page changed and when it didn't change.
>  
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> ----- Original Message ----
> > From: Miao Liqiang NCS 
> > To: [email protected]
> > Sent: Friday, May 2, 2008 7:48:03 AM
> > Subject: Unable to tell if whether is any changes for the same
> > webpage
> > 
> > Is Nutch able to tell whether there are any changes for the same
> > webpage? For example, if a webpage has been updated since the last
> > crawl, can Nutch detect that change when it recrawls the page?
> > 
> > 
> 
