Hi Thomas

I suppose the only unique key for content in the WebDB is the page's URL.
So why not retrieve the content by URL directly?
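
A minimal sketch of the difference, in plain Java using only
java.security.MessageDigest. It does not use Nutch's own WebDB or MD5Hash
classes; the class name UrlKeyedHash, its methods, and the example.com URLs
are made up for illustration. It shows a hash over the content alone, a hash
over the URL alone, and a hash over both.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Nutch.
public class UrlKeyedHash {

    // MD5 over one or more byte arrays, as lowercase hex.
    // A zero byte is fed in between parts so that ("ab", "c") and
    // ("a", "bc") do not hash to the same value.
    static String md5Hex(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (int i = 0; i < parts.length; i++) {
                if (i > 0) {
                    md.update((byte) 0);
                }
                md.update(parts[i]);
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Depends on the content only: identical pages share this hash.
    static String contentHash(String content) {
        return md5Hex(utf8(content));
    }

    // Depends on the URL only: one value per distinct URL.
    static String urlHash(String url) {
        return md5Hex(utf8(url));
    }

    // Depends on both: differs when either the URL or the content differs.
    static String urlAndContentHash(String url, String content) {
        return md5Hex(utf8(url), utf8(content));
    }

    public static void main(String[] args) {
        String content = "<html>same body</html>";
        String urlA = "http://example.com/a";
        String urlB = "http://example.com/b";

        System.out.println("content hashes equal: "
                + contentHash(content).equals(contentHash(content)));
        System.out.println("url hashes equal: "
                + urlHash(urlA).equals(urlHash(urlB)));
        System.out.println("url+content hashes equal: "
                + urlAndContentHash(urlA, content)
                        .equals(urlAndContentHash(urlB, content)));
    }
}
```

If the URL is already unique, the URL itself (or urlHash) is enough as a
lookup key. The combined hash only matters if the identifier should also
change when the page content changes.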

/Jack


On 1/8/06, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
> I am working with Nutch 0.7.1.
>
> As far as I understand the current implementation (please correct me if I
> am wrong), the MD5Hash is calculated from the page's content. Pages with
> the same content but identified by different URLs share the same MD5Hash.
>
> My requirement is to be able to uniquely identify all Pages in WebDB. Pages
> with the same content, but identified by different URLs, should each get a
> unique MD5Hash. My question is whether this is feasible at all and, if so,
> how it can be accomplished.
>
> Rgrds, Thomas Delnoij
>
>


--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
