Hi Thomas,

I suppose the only unique key for content in the WebDB is the page's URL. So why not retrieve the content by URL directly?
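
To illustrate the difference, here is a minimal sketch in plain Java. It does not use any Nutch classes; UrlKeyDemo, urlKey, contentKey and the example.com URLs are hypothetical names made up for this mail, and only java.security.MessageDigest is a real API. It shows that a digest taken over the content is shared by duplicate pages, while a digest taken over the URL string is not:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of Nutch.
final class UrlKeyDemo {

    // Returns the MD5 digest of the input as 32 lowercase hex characters.
    static String md5Hex(byte[] input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Key derived from the URL string: differs whenever the URLs differ.
    static String urlKey(String url) {
        return md5Hex(url.getBytes(StandardCharsets.UTF_8));
    }

    // Key derived from the page content: shared by pages with equal content.
    static String contentKey(String content) {
        return md5Hex(content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Assumed example data: two URLs that serve identical content.
        String urlA = "http://example.com/a";
        String urlB = "http://example.com/b";
        String content = "<html>same page</html>";

        System.out.println("content keys equal: "
                + contentKey(content).equals(contentKey(content)));
        System.out.println("url keys equal:     "
                + urlKey(urlA).equals(urlKey(urlB)));
    }
}
```

Whether a URL-based key can be plugged into the WebDB in 0.7.1 without other changes is something I have not checked, so take this only as a picture of the idea.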
/Jack

On 1/8/06, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
> I am working with Nutch 0.7.1.
>
> As far as I understand the current implementation (please correct me if I
> am wrong), the MD5Hash is calculated based on the Pages' content. Pages with
> the same content but identified by different URLs, share the same MD5Hash.
>
> My requirement is to be able to uniquely identify all Pages in WebDB. Pages
> with the same content, but identified by different URL's, should become a
> unique MD5Hash. My question is if this is feasible at all and if yes, how
> this can be accomplished.
>
> Rgrds, Thomas Delnoij

--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
