Hello,

I am running Nutch 1.13 and was wondering if the digest field in the crawl
results can be configured.  Ideally, I would like the digest to be a hash
of the page content only.  A bit of Googling landed me at
https://wiki.apache.org/nutch/IndexStructure which describes the digest
field as follows:

"Adds a *message digest* field to a document. Can be MD5 over content and
headers or more sophisticated text profile of the content."
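To check that I understand the difference the wiki describes, here is a rough Java sketch of two digest variants. It is my own illustration, not Nutch's actual code, and the class and method names (DigestSketch, contentDigest, contentAndHeadersDigest) are made up. The point is that a content-only MD5 stays the same when something like a Date header changes, while a digest over headers plus content does not:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not Nutch's signature implementation.
public class DigestSketch {

    // Lowercase hex encoding of a hash.
    static String hex(byte[] hash) {
        StringBuilder sb = new StringBuilder();
        for (byte b : hash) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    // Variant 1: MD5 over the page content only.
    static String contentDigest(byte[] content) {
        MessageDigest md = newMd5();
        md.update(content);
        return hex(md.digest());
    }

    // Variant 2: MD5 over headers plus content. Headers are sorted by name
    // so the result does not depend on map iteration order.
    static String contentAndHeadersDigest(byte[] content, Map<String, String> headers) {
        MessageDigest md = newMd5();
        for (Map.Entry<String, String> e : new TreeMap<>(headers).entrySet()) {
            md.update((e.getKey() + ": " + e.getValue() + "\n").getBytes(StandardCharsets.UTF_8));
        }
        md.update(content);
        return hex(md.digest());
    }

    public static void main(String[] args) {
        byte[] page = "<html><body>Hello</body></html>".getBytes(StandardCharsets.UTF_8);
        Map<String, String> monday = Map.of("Content-Type", "text/html", "Date", "Mon, 01 Jan 2018 00:00:00 GMT");
        Map<String, String> tuesday = Map.of("Content-Type", "text/html", "Date", "Tue, 02 Jan 2018 00:00:00 GMT");

        // Same content, different Date header.
        System.out.println("content only:      " + contentDigest(page));
        System.out.println("headers+content 1: " + contentAndHeadersDigest(page, monday));
        System.out.println("headers+content 2: " + contentAndHeadersDigest(page, tuesday));
    }
}
```

In this sketch the two headers+content digests differ even though the page body is identical, which is exactly the behavior I would like to avoid.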

This makes it sound like the contents of the digest can be configured, but
I can't seem to figure out how.  Any help is greatly appreciated.  Thanks!

-- 
Dave Parker
Database & Systems Administrator
Utica College
Integrated Information Technology Services
(315) 792-3229
Registered Linux User #408177
