[EMAIL PROTECTED] wrote:
Hi John,

I don't think you gave enough information for people to be able to help
(e.g. at which point do you want to include additional data: search,
fetching, or indexing?).


Yes, the digest should be unique (MD5).

Actually, it isn't - you forgot about content duplicates. Two pages at different URLs with identical content get the same digest.

Currently, the URL of the page is the unique key in Nutch, so you can use URLs as primary keys in your external database.
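
To illustrate why the digest cannot serve as a key, here is a minimal sketch in plain Python. It does not use any Nutch API; the URLs, page bodies, and helper name are hypothetical and only stand in for fetched content. It keys records by URL and shows that two distinct URLs can share one MD5 digest:

```python
import hashlib


def content_digest(content: bytes) -> str:
    """Return the MD5 hex digest of raw page content (hypothetical helper)."""
    return hashlib.md5(content).hexdigest()


# Hypothetical fetched pages: two URLs serve byte-identical content.
pages = {
    "http://example.com/a": b"<html>same body</html>",
    "http://example.com/b": b"<html>same body</html>",
    "http://example.com/c": b"<html>other body</html>",
}

# Key the external records by URL; store the digest as an ordinary column.
records = {url: content_digest(body) for url, body in pages.items()}

# Group URLs by digest to see which digests are shared.
by_digest = {}
for url, digest in records.items():
    by_digest.setdefault(digest, []).append(url)

# Digests that map to more than one URL are content duplicates.
duplicates = {d: urls for d, urls in by_digest.items() if len(urls) > 1}

for digest, urls in duplicates.items():
    print(digest, "is shared by", urls)
```

Using the digest as the primary key here would collapse the first two pages into a single row, while keying by URL keeps all three.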


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
