Why do we have both HashUtil.checksum and HashUtil.rawChecksum? Both are being used as primary keys into a hash map, or cache, so we want a low risk of collisions. Using a message digest function reduces that risk, but rawChecksum does "new String(md.digest(data))", which converts the binary digest to a String using the platform's default character set. Since the handling of bytes that are invalid in that charset is "unspecified", the risk of collisions goes up greatly when the decoder substitutes a replacement character like "?" for every invalid byte.

A quick check over single byte values shows that the character sets "windows-1252" and "ISO-8859-1" seem to work with no collisions, while single-byte character sets like "US-ASCII" and multibyte character sets like "EUC-JP" and "UTF-8" reject numerous byte values as invalid, hence increased collisions.

Suggestions:
1) Use checksum instead.
2) Do a Base64 conversion of the digest.
3) Wrap the byte[] in an object with proper hashCode and equals.

I'll be glad to create a patch for whichever fix is chosen.

Jon
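P.S. Here is a minimal sketch reproducing the problem described above (the class name ChecksumCollisionDemo is just for illustration): two distinct bytes that are invalid in US-ASCII both decode to the replacement character, so the resulting cache keys collide, while ISO-8859-1 keeps them distinct. It also shows suggestion 2, since Base64 encoding is lossless over arbitrary bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ChecksumCollisionDemo {
    public static void main(String[] args) {
        // Two different "digest" bytes, both invalid in US-ASCII.
        byte[] a = { (byte) 0x80 };
        byte[] b = { (byte) 0x81 };

        // ISO-8859-1 maps every byte to a distinct char: no collision.
        System.out.println(new String(a, StandardCharsets.ISO_8859_1)
                .equals(new String(b, StandardCharsets.ISO_8859_1)));   // false

        // US-ASCII decodes both invalid bytes to U+FFFD: collision.
        System.out.println(new String(a, StandardCharsets.US_ASCII)
                .equals(new String(b, StandardCharsets.US_ASCII)));     // true

        // Suggestion 2: Base64 is lossless, so the keys stay distinct.
        System.out.println(Base64.getEncoder().encodeToString(a)
                .equals(Base64.getEncoder().encodeToString(b)));        // false
    }
}
```

The same effect applies to whatever charset happens to be the JVM default, which is why the current rawChecksum behaves differently from machine to machine.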