Hello,

I am running Nutch 1.13 and was wondering whether the digest field in the crawl results can be configured. Ideally, I would like the digest to be a hash of the page content only.

A bit of Googling landed me at https://wiki.apache.org/nutch/IndexStructure, which describes the digest field as follows:

"Adds a *message digest* field to a document. Can be MD5 over content and headers or more sophisticated text profile of the content."

This makes it sound like the contents of the digest can be configured, but I can't figure out how. Any help is greatly appreciated.

Thanks!

--
Dave Parker
Database & Systems Administrator
Utica College
Integrated Information Technology Services
(315) 792-3229
Registered Linux User #408177

