-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 18, 2006 8:02 PM
To: [email protected]
Subject: Re: question about crawldb
Importance: High

Anton Potehin wrote:
> 1. We have found these flags in the CrawlDatum class:
> 
>   public static final byte STATUS_SIGNATURE = 0;
>   public static final byte STATUS_DB_UNFETCHED = 1;
>   public static final byte STATUS_DB_FETCHED = 2;
>   public static final byte STATUS_DB_GONE = 3;
>   public static final byte STATUS_LINKED = 4;
>   public static final byte STATUS_FETCH_SUCCESS = 5;
>   public static final byte STATUS_FETCH_RETRY = 6;
>   public static final byte STATUS_FETCH_GONE = 7;
> 
> Though the names of these flags describe their purpose, it is not
> completely clear what they mean, for example what the difference is
> between STATUS_DB_FETCHED and STATUS_FETCH_SUCCESS.

The STATUS_DB_* codes are used in entries in the crawldb. 
STATUS_FETCH_* codes are used in fetcher output.  STATUS_LINKED is used 
in parser output for urls that are linked to.  A crawldb update combines 
all of these (the old version of the db, plus fetcher and parser output) 
to generate a new version of the db, containing only STATUS_DB_* 
entries.  This logic is in CrawlDbReducer.
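To make the division of labour concrete, here is a simplified, hypothetical
sketch of how an update could fold the old db entry, fetcher output and
parser output for one URL into a single STATUS_DB_* code. It is NOT Nutch's
CrawlDbReducer: the class name, the reduce() method, the precedence rules
and the MAX_RETRIES limit are illustrative assumptions. Only the status
constants are copied from the question above.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not Nutch code): combine the statuses seen for one URL
 * during a crawldb update into a single STATUS_DB_* value.
 */
public class CrawlDbStatusSketch {
  // Status codes as listed in the question (CrawlDatum).
  public static final byte STATUS_SIGNATURE = 0;
  public static final byte STATUS_DB_UNFETCHED = 1;
  public static final byte STATUS_DB_FETCHED = 2;
  public static final byte STATUS_DB_GONE = 3;
  public static final byte STATUS_LINKED = 4;
  public static final byte STATUS_FETCH_SUCCESS = 5;
  public static final byte STATUS_FETCH_RETRY = 6;
  public static final byte STATUS_FETCH_GONE = 7;

  // Assumed retry limit before a URL is treated as gone (illustrative only).
  static final int MAX_RETRIES = 3;

  /**
   * @param oldStatus STATUS_DB_* from the previous crawldb, or -1 if the URL is new
   * @param inputs    STATUS_FETCH_* (fetcher output) and STATUS_LINKED (parser output)
   * @param retries   number of retries already recorded for this URL
   * @return a STATUS_DB_* code for the new crawldb
   */
  public static byte reduce(int oldStatus, List<Byte> inputs, int retries) {
    // Assumption: a fetch result from this round overrides link info and the old entry.
    for (byte s : inputs) {
      switch (s) {
        case STATUS_FETCH_SUCCESS:
          return STATUS_DB_FETCHED;
        case STATUS_FETCH_GONE:
          return STATUS_DB_GONE;
        case STATUS_FETCH_RETRY:
          // Try again later unless the (hypothetical) retry limit is reached.
          return retries < MAX_RETRIES ? STATUS_DB_UNFETCHED : STATUS_DB_GONE;
        default:
          break; // STATUS_LINKED and others carry no fetch result
      }
    }
    // Not fetched this round: keep the existing db entry unchanged.
    if (oldStatus >= 0) {
      return (byte) oldStatus;
    }
    // New URL known only as a link target: it enters the db as unfetched.
    if (inputs.contains(STATUS_LINKED)) {
      return STATUS_DB_UNFETCHED;
    }
    throw new IllegalArgumentException("no db, fetch or link entry for URL");
  }

  public static void main(String[] args) {
    System.out.println(reduce(-1, Arrays.asList(STATUS_LINKED), 0));
    System.out.println(reduce(STATUS_DB_UNFETCHED,
        Arrays.asList(STATUS_LINKED, STATUS_FETCH_SUCCESS), 0));
    System.out.println(reduce(STATUS_DB_UNFETCHED, Arrays.asList(STATUS_FETCH_RETRY), 3));
  }
}
```

Whatever the exact rules, the point Doug makes holds in the sketch: the
inputs may carry STATUS_FETCH_* and STATUS_LINKED codes, but every value
written back to the db is a STATUS_DB_* code.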

Does that help?

Yes ;-) thanks...

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers