It's part of Nutch and happens automatically.
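The mechanism described further down in this thread can be sketched in Java, Nutch's own language. That mechanism is an MD5 digest of the page content, plus a second signature that is more forgiving of small textual changes. This is a minimal illustration under stated assumptions, not Nutch's implementation. The class and method names (`PageChangeSketch`, `md5Hex`, `normalizedSignature`, `changed`) are hypothetical. The "forgiving" signature here is a crude stand-in that only lower-cases the text and strips punctuation before hashing. Nutch also ships an `org.apache.nutch.crawl.TextProfileSignature` class, which may be the more forgiving method meant below. As far as I know, it is selected in Nutch 1.x through the `db.signature.class` property, whose default is `org.apache.nutch.crawl.MD5Signature`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

/**
 * Hypothetical sketch, not Nutch's code: decide whether a page changed
 * by comparing its new content signature with the one stored at the
 * previous crawl.
 */
public class PageChangeSketch {

    // Hex-encoded MD5 of the raw content: any change at all
    // (even a new timestamp or ad) produces a different signature.
    public static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical "forgiving" signature: lower-cases the text, replaces
    // runs of non-letter/non-digit characters with a single space, then
    // hashes. Cosmetic edits (case, punctuation, spacing) no longer count
    // as changes. Far simpler than Nutch's TextProfileSignature.
    public static String normalizedSignature(String text) {
        String normalized = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}]+", " ")
                .trim();
        return md5Hex(normalized);
    }

    // A page is treated as changed when its new signature differs from
    // the one recorded at the previous fetch.
    public static boolean changed(String previousSignature, String newSignature) {
        return !previousSignature.equals(newSignature);
    }
}
```

With plain MD5, "Hello, World!" and "hello   world" count as different content. With the normalized signature they match, while a real edit such as "price: 10" becoming "price: 12" still registers as a change.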

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Miao Liqiang NCS <[EMAIL PROTECTED]>
> To: nutch-user@lucene.apache.org
> Sent: Sunday, May 4, 2008 8:33:49 PM
> Subject: RE: Unable to tell if whether is any changes for the same webpage
> 
> Is this function provided in the Nutch package? Can I use it directly
> without programming against the API?
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> Sent: Friday, May 02, 2008 8:12 PM
> To: nutch-user@lucene.apache.org
> Subject: Re: Unable to tell if whether is any changes for the same webpage
> 
> It uses an MD5 digest of the page content, plus another method, whose
> exact name I can't remember right now, that is more forgiving of small
> textual changes.  I think it also takes the Last-Modified HTTP response
> header into consideration, but I'd have to check that.
>  
> Otis
> 
> ----- Original Message ----
> > From: Miao Liqiang NCS 
> > To: nutch-user@lucene.apache.org
> > Sent: Friday, May 2, 2008 8:29:38 AM
> > Subject: RE: Unable to tell if whether is any changes for the same webpage
> > 
> > Could you tell me how?
> > 
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> > Sent: Friday, May 02, 2008 2:12 PM
> > To: nutch-user@lucene.apache.org
> > Subject: Re: Unable to tell if whether is any changes for the same webpage
> > 
> > Hi,
> > 
> > Yes, Nutch can detect when a page has changed and when it hasn't.
> >  
> > Otis
> > 
> > ----- Original Message ----
> > > From: Miao Liqiang NCS 
> > > To: nutch-user@lucene.apache.org
> > > Sent: Friday, May 2, 2008 7:48:03 AM
> > > Subject: Unable to tell if whether is any changes for the same webpage
> > > 
> > > Is Nutch able to tell whether there are any changes to the same
> > > webpage? For example, if a webpage has been updated since the last
> > > crawl, can Nutch detect that change when it recrawls?
> > > 
> > > 
> > 
> 

