Unfortunately, this can't be done with a single API call; you have to write
some code yourself. The "CrawlDatum" class has a field named "signature". Its
value is stored persistently in the crawldb, and a fetch produces a different
signature when the page content differs from last time.
So every time you update the crawldb from newly fetched segments, you can
check whether the signature of the CrawlDatum in the segment is the same as
the old one in the crawldb.
You can begin with "CrawlDb.java" to understand the updatedb process, and
also take a look at NUTCH-61.
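
To make the idea concrete, here is a rough, standalone sketch of the
comparison. It does not use the Nutch classes at all: the class name, the
in-memory map standing in for the crawldb, and the method names are all made
up for illustration, and plain MD5 over the page text is assumed as the
signature. In Nutch itself the check would have to go into the updatedb code
and compare the signatures of the old and new CrawlDatum.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class SignatureChangeCheck {

    // Hypothetical stand-in for the crawldb: URL -> signature from the last fetch.
    private final Map<String, byte[]> storedSignatures = new HashMap<>();

    // Assumption: the signature is simply the MD5 digest of the page content.
    static byte[] md5(String content) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Stores the signature of the newly fetched content and reports whether it
    // differs from the one stored before. A URL seen for the first time has
    // nothing to compare against, so it is reported as not changed.
    boolean updateAndCheckChanged(String url, String fetchedContent) {
        byte[] fresh = md5(fetchedContent);
        byte[] old = storedSignatures.put(url, fresh);
        return old != null && !Arrays.equals(old, fresh);
    }

    public static void main(String[] args) {
        SignatureChangeCheck db = new SignatureChangeCheck();
        String url = "http://example.com/";
        System.out.println(db.updateAndCheckChanged(url, "version 1")); // false (first fetch)
        System.out.println(db.updateAndCheckChanged(url, "version 1")); // false (unchanged)
        System.out.println(db.updateAndCheckChanged(url, "version 2")); // true  (changed)
    }
}
```

An exact hash like this flags even a one-character difference as a change, so
a signature that tolerates small textual edits may suit some sites better.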


----- Original Message ----- 
From: "Miao Liqiang NCS" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Monday, May 12, 2008 9:21 AM
Subject: RE: Unable to tell if whether is any changes for the same webpage


Could someone please respond? Many thanks.

-----Original Message-----
From: Miao Liqiang NCS 
Sent: Monday, May 05, 2008 11:38 AM
To: [email protected]
Subject: RE: Unable to tell if whether is any changes for the same
webpage

How does Nutch report that there are changes? Am I able to know
whether there are changes or not? If Nutch knows internally that there
are changes, can I find that out from outside, through an API or something?

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Monday, May 05, 2008 11:32 AM
To: [email protected]
Subject: Re: Unable to tell if whether is any changes for the same
webpage

It's part of Nutch, happens automatically.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Miao Liqiang NCS <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Sunday, May 4, 2008 8:33:49 PM
> Subject: RE: Unable to tell if whether is any changes for the same webpage
> 
> Is this function provided in the Nutch package? Can I use it directly
> without programming against the API?
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> Sent: Friday, May 02, 2008 8:12 PM
> To: [email protected]
> Subject: Re: Unable to tell if whether is any changes for the same
> webpage
> 
> It uses MD5 of the page content and another method whose exact name I
> cannot remember now, but that is more forgiving of small textual
> changes.  I think it also takes into consideration the Last-Modified
> HTTP response header, but I'd have to check that.
>  
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> ----- Original Message ----
> > From: Miao Liqiang NCS 
> > To: [email protected]
> > Sent: Friday, May 2, 2008 8:29:38 AM
> > Subject: RE: Unable to tell if whether is any changes for the same webpage
> > 
> > Could you tell me how?
> > 
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> > Sent: Friday, May 02, 2008 2:12 PM
> > To: [email protected]
> > Subject: Re: Unable to tell if whether is any changes for the same
> > webpage
> > 
> > Hi,
> > 
> > Yes, Nutch can detect when a page changed and when it didn't change.
> >  
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 
> > ----- Original Message ----
> > > From: Miao Liqiang NCS 
> > > To: [email protected]
> > > Sent: Friday, May 2, 2008 7:48:03 AM
> > > Subject: Unable to tell if whether is any changes for the same webpage
> > > 
> > > Is Nutch able to tell whether there are any changes for the same
> > > webpage? For example, if a webpage has been updated since the last
> > > crawl, can Nutch detect this change when it recrawls the page?
> > > 
> > > 
> > 
>
