Hi Thomas,

I suppose the only unique key for content in the WebDB is the page's URL. So why not retrieve the content by URL directly?
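To make the difference concrete, here is a minimal Python sketch (not Nutch code) that contrasts a hash of the page content with a hash of the URL. The function names `content_hash` and `url_hash` and the example.com URLs are made up for illustration; how Nutch 0.7.1 computes and stores its MD5Hash internally is not shown here.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """MD5 of the page body: identical bodies give identical hashes."""
    return hashlib.md5(content).hexdigest()


def url_hash(url: str) -> str:
    """MD5 of the URL string: distinct URLs give distinct hashes (barring collisions)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


# Two hypothetical pages with the same body under different URLs.
pages = {
    "http://example.com/a": b"<html>same body</html>",
    "http://example.com/b": b"<html>same body</html>",
}

url_a, url_b = list(pages)

# Content-based hashes cannot tell the two pages apart.
same_content_hash = content_hash(pages[url_a]) == content_hash(pages[url_b])

# URL-based hashes (or simply the URL itself) can.
different_url_hash = url_hash(url_a) != url_hash(url_b)

# Looking a page up by URL needs no hash at all.
body_for_a = pages[url_a]
```

If a fixed-length identifier is needed rather than the URL string itself, hashing the URL in this way is one option; otherwise the URL already serves as the key.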

/Jack

On 1/8/06, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
> I am working with Nutch 0.7.1.
>
> As far as I understand the current implementation (please correct me if I
> am wrong), the MD5Hash is calculated based on the Pages' content. Pages with
> the same content but identified by different URLs, share the same MD5Hash.
>
> My requirement is to be able to uniquely identify all Pages in WebDB. Pages
> with the same content, but identified by different URL's, should become a
> unique MD5Hash. My question is if this is feasible at all and if yes, how
> this can be accomplished.
>
> Rgrds, Thomas Delnoij

--
Keep Discovering ... ...
http://www.jroller.com/page/jmars

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
