That is probably better, because the key always needs to be unique (even if
you regenerate the index), and the URL would be.
Is there any way to integrate data from an external source (a DB) into the
results provided by Nutch, so that it is returned as part of the result
set?  I am just starting to look into the plugins, thinking that there might
be a way to do it there.
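
To make the question concrete, here is a minimal sketch of the join in plain
Java. It uses no Nutch classes, since where a plugin could hook in is exactly
the open question. The names UrlKeyedEnrichment, Hit, and enrich are
hypothetical, and an in-memory map stands in for the external DB; the only
thing it takes from this thread is the page URL as the key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Nutch.
class UrlKeyedEnrichment {

    /** A search hit reduced to the fields needed for the illustration. */
    static final class Hit {
        final String url;
        final String title;
        // Extra fields copied in from the external source, if any.
        final Map<String, String> extra = new HashMap<>();

        Hit(String url, String title) {
            this.url = url;
            this.title = title;
        }
    }

    /**
     * Returns a copy of each hit with the external row for its URL attached.
     * Hits without a matching row come back with an empty "extra" map.
     * externalByUrl is an assumed stand-in for a DB table keyed by URL.
     */
    static List<Hit> enrich(List<Hit> hits,
                            Map<String, Map<String, String>> externalByUrl) {
        List<Hit> out = new ArrayList<>();
        for (Hit hit : hits) {
            Hit copy = new Hit(hit.url, hit.title);
            Map<String, String> row = externalByUrl.get(hit.url);
            if (row != null) {
                copy.extra.putAll(row);
            }
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("price", "9.99");
        Map<String, Map<String, String>> db = new HashMap<>();
        db.put("http://example.com/a", row);

        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("http://example.com/a", "Page A"));
        hits.add(new Hit("http://example.com/b", "Page B"));

        for (Hit h : enrich(hits, db)) {
            System.out.println(h.url + " " + h.extra);
        }
    }
}
```

The same lookup could in principle run once per document at indexing time
instead of once per hit at search time; which of the two Nutch allows is what
I am trying to find out.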

-John


On Thu, Jun 12, 2008 at 10:35 AM, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote:
>
>> Hi John,
>>
>> I don't think you gave enough information for people to be able to help
>> (e.g. include additional data at which point? Search?  Fetching?
>>  Indexing?).
>>
>>
>> Yes, the digest should be unique (MD5).
>>
>
> Actually, it isn't - you forgot about content duplicates.
>
> Currently the URL of the page is the unique key in Nutch, so you can use URLs
> as primary keys in your external db.
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>


-- 
John Martyniak
Before Dawn Solutions, Inc.
9457 S. University Blvd. #266
Highlands Ranch, CO 80126
o: 1-877-499-1562 x707 (Toll Free)
c: 303-522-1756
e: [EMAIL PROTECTED]
w: http://www.beforedawn.com
