Is SIGNATURE_KEY (aka nutch.content.digest) a valid way to check whether my page has changed since the last time I crawled it? I patched Nutch to properly handle modification dates, and then discovered that my web site doesn't send a Last-Modified header because it uses shtml (server-parsed HTML). I assume the signature is some sort of cryptographic hash of the entire page?
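To make the question concrete, here is a minimal sketch of the kind of check I have in mind. It assumes the signature is simply an MD5 hash over the raw page bytes, which is my guess rather than something I have confirmed in the Nutch source. The class and method names (`DigestCheck`, `md5Hex`, `hasChanged`) are hypothetical and are not part of the Nutch API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Nutch: compares content digests between crawls.
public class DigestCheck {

    // Hex-encoded MD5 of the raw page bytes (assumed stand-in for the signature).
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // True when the freshly fetched bytes hash differently from the stored digest.
    static boolean hasChanged(String previousDigest, byte[] newContent) {
        return !md5Hex(newContent).equals(previousDigest);
    }

    public static void main(String[] args) {
        byte[] firstFetch = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        byte[] secondFetch = "<html>hello!</html>".getBytes(StandardCharsets.UTF_8);
        String stored = md5Hex(firstFetch);
        System.out.println(hasChanged(stored, firstFetch));
        System.out.println(hasChanged(stored, secondFetch));
    }
}
```

One caveat with a whole-page hash like this: if the server-parsed page embeds anything that varies per request (a timestamp, for example), the digest would differ on every fetch even when the real content is unchanged.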
Another question: is Nutch smart enough to use that signature to determine that, say, http://xcski.com/ and http://xcski.com/index.html are the same page?

-- http://www.linkedin.com/in/paultomblin
