I am working with Nutch 0.7.1.

As far as I understand the current implementation (please correct me if I
am wrong), the MD5Hash is calculated from the Page's content only. Pages with
the same content but different URLs therefore share the same MD5Hash.

My requirement is to be able to uniquely identify every Page in the WebDB.
Pages with the same content but different URLs should each get their own
MD5Hash. My question is whether this is feasible at all and, if so, how it
can be accomplished.
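
One idea I had is to feed the URL into the digest together with the content.
The sketch below is plain Java using java.security.MessageDigest. It is not
Nutch's MD5Hash API, and the class and method names are my own hypothetical
ones. It only illustrates the difference between a content-only hash and a
URL-plus-content hash:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not part of Nutch.
public class UrlContentHash {

    // MD5 over the content bytes only: identical content -> identical hash.
    static String contentHash(byte[] content) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        return toHex(md.digest(content));
    }

    // MD5 over URL + separator + content: identical content fetched from
    // different URLs yields different hashes.
    static String urlContentHash(String url, byte[] content) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(url.getBytes(StandardCharsets.UTF_8));
        md.update((byte) 0); // separator byte between the URL and the content
        md.update(content);
        return toHex(md.digest());
    }

    // Lower-case hex encoding of the 16-byte digest.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] content = "<html>same page</html>".getBytes(StandardCharsets.UTF_8);
        String a = urlContentHash("http://example.com/a", content);
        String b = urlContentHash("http://example.com/b", content);
        System.out.println(contentHash(content).equals(contentHash(content))); // true
        System.out.println(a.equals(b)); // false
    }
}
```

A downside I can see is that any duplicate detection that relies on identical
content hashes would presumably no longer find such pages.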

Rgrds, Thomas Delnoij
