Re: UTF-8 Hashing
On Apr 23, 3:39 pm, Jean-Marc Desperrier jmd...@alussinan.org wrote:
> Nelson B Bolyard wrote:
>> Is that python code? I thought it was JavaScript.
>
> Yes, you're right, I had far too quick a look at it :-)

On second thought, I just had a look at this page:

https://developer.mozilla.org/En/NsICryptoHash

It states that one must first use the scriptable Unicode converter to turn the string into UTF-8 encoded bytes:

    var converter = Components.classes["@mozilla.org/intl/scriptableunicodeconverter"]
        .createInstance(Components.interfaces.nsIScriptableUnicodeConverter);
    // we use UTF-8 here, you can choose other encodings.
    converter.charset = "UTF-8";

I guess that's what I missed?

/rvdh

--
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto
Re: UTF-8 Hashing
Right, this seems to work very well. I consider it solved. Here is the corrected code:

    hash: function(str, method) {
        // Convert the JavaScript string to an array of UTF-8 bytes first.
        var converter = Components.classes["@mozilla.org/intl/scriptableunicodeconverter"]
            .createInstance(Components.interfaces.nsIScriptableUnicodeConverter);
        converter.charset = "UTF-8";
        // result.value receives the length of the returned byte array.
        var result = {};
        var data = converter.convertToByteArray(str, result);

        var hash_engine = Components.classes["@mozilla.org/security/hash;1"]
            .createInstance().QueryInterface(Components.interfaces.nsICryptoHash);
        switch (method) {
            case 'MD5': hash_engine.init(hash_engine.MD5); break;
            case 'SHA1': hash_engine.init(hash_engine.SHA1); break;
            case 'SHA256': hash_engine.init(hash_engine.SHA256); break;
        }
        // Hash the UTF-8 bytes, not the UTF-16 code units of the string.
        hash_engine.update(data, result.value);
        return TOOLS.convert('bin2hex', hash_engine.finish(false));
    },
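For comparing digests outside the browser, here is a minimal sketch. It assumes Node.js and its built-in crypto module, neither of which is used anywhere in this thread, and hashUtf8 is a hypothetical helper name. Like the corrected code above, it encodes the string as UTF-8 first and hashes the resulting bytes.

```javascript
// Sketch only: assumes Node.js, not the XPCOM environment used in this thread.
const crypto = require("crypto");

// hashUtf8 is a hypothetical helper name, not part of any Mozilla API.
function hashUtf8(str, method) {
  // Map the thread's method names onto Node's algorithm identifiers.
  let algorithm;
  switch (method) {
    case "MD5": algorithm = "md5"; break;
    case "SHA1": algorithm = "sha1"; break;
    case "SHA256": algorithm = "sha256"; break;
    default: throw new Error("Unsupported hash method: " + method);
  }
  // Encode to UTF-8 bytes explicitly, then hash the bytes.
  const bytes = Buffer.from(str, "utf8");
  return crypto.createHash(algorithm).update(bytes).digest("hex");
}

// Prints a 32-character hex MD5 digest of the UTF-8 bytes of the sample input.
console.log(hashUtf8("您好", "MD5"));
```

If the digest printed here matches the one from the XPCOM code for the same input, both are very likely hashing the same UTF-8 byte sequence.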
Re: UTF-8 Hashing
Nelson B Bolyard wrote:
> Is that python code? I thought it was JavaScript.

Yes, you're right, I had far too quick a look at it :-)
Re: UTF-8 Hashing
If it helps, here is the code I currently utilize:

    hash: function(str, method) {
        var hash_engine = Components.classes["@mozilla.org/security/hash;1"]
            .createInstance().QueryInterface(Components.interfaces.nsICryptoHash);
        switch (method) {
            case 'MD5': hash_engine.init(hash_engine.MD5); break;
            case 'SHA1': hash_engine.init(hash_engine.SHA1); break;
            case 'SHA256': hash_engine.init(hash_engine.SHA256); break;
        }
        var charcodes = [];
        for (var i = 0; i < str.length; i++) {
            charcodes.push(str.charCodeAt(i));
        }
        hash_engine.update(charcodes, str.length);
        return TOOLS.convert('bin2hex', hash_engine.finish(false));
    },
Re: UTF-8 Hashing
starryrendezv...@gmail.com wrote, On 2009-04-22 07:40:
> If it helps, here is the code I currently utilize;
> [snip]

I suspect (that is, guess) that your problem is at one of these two places:

1. Perhaps the following code does not pass the UTF8 string you expect it to pass to the hash algorithm. Maybe it's passing UTF16. Maybe it's including a trailing NUL character, or maybe it's not. Any of these differences may explain why the results you get from this javascript do not match the results you get from other programs.

    var charcodes = [];
    for (var i = 0; i < str.length; i++) {
        charcodes.push(str.charCodeAt(i));
    }
    hash_engine.update(charcodes, str.length);

2. Perhaps the type returned by hash_engine.finish is not the type expected by TOOLS.convert('bin2hex', ...).

    return TOOLS.convert('bin2hex', hash_engine.finish(false));
    },

If you want further help with this, I suggest that you do these things:

A. Create a small html page containing a minimal javascript script that is complete enough to reproduce the problem. It must include sample input. This should be no more than twice as big as the sample code you sent us before.

B. Send us that page (paste it as text inline in a plain-text posting) along with a description of the expected output and the actual output. It would be good for you to also include the hexadecimal values for the UTF8 string that you believe is equivalent to your sample input.
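One possible way to produce the hexadecimal values asked for in step B is sketched below. It relies on TextEncoder, a standard Web API that is not mentioned anywhere in this thread, and utf8Hex is a hypothetical helper name.

```javascript
// Sketch only: TextEncoder is a standard API, but it is not used in this thread.
// utf8Hex is a hypothetical helper name.
function utf8Hex(str) {
  // TextEncoder always encodes as UTF-8 and adds no trailing NUL byte.
  const bytes = new TextEncoder().encode(str);
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

console.log(utf8Hex("您好")); // prints "e6 82 a8 e5 a5 bd"
```

Posting this byte dump next to the expected digest makes it easy to see whether two programs hash the same input.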
Re: UTF-8 Hashing
starryrendezv...@gmail.com wrote:
> hash: function(str,method) {
> [...]
> str.charCodeAt(i)

Python quite probably outputs the value of str.charCodeAt(i) as some variant of a UTF-16 value. Or UCS-2 with no handling of surrogates.

In which encoding is the string stored in the file that md5sum hashes?

Rather than that, you should probably use the Python equivalent of Java's String.getBytes(charset)
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(java.lang.String),
after determining the proper value of charset for your use.
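To illustrate why the choice of charset matters, here is a rough sketch of a getBytes-style helper. It assumes Node.js and its Buffer class, which this thread does not use, and the getBytes name here is hypothetical, borrowed only for the analogy with Java.

```javascript
// Sketch only: assumes Node.js Buffer; getBytes is a hypothetical helper name.
function getBytes(str, charset) {
  // Buffer.from encodes the string in the requested charset.
  return Array.from(Buffer.from(str, charset));
}

const sample = "\u00e9"; // the single character U+00E9 (e with acute accent)
console.log(getBytes(sample, "utf8"));    // [ 195, 169 ]
console.log(getBytes(sample, "utf16le")); // [ 233, 0 ]
console.log(getBytes(sample, "latin1"));  // [ 233 ]
```

The same one-character string yields three different byte sequences, so a hash computed over each of them would differ as well.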
Re: UTF-8 Hashing
Jean-Marc Desperrier wrote, On 2009-04-22 12:17 PDT:
> starryrendezv...@gmail.com wrote:
>> hash: function(str,method) {
>> [...]
>> str.charCodeAt(i)
>
> Python quite probably outputs the value of str.charCodeAt(i) as some variant of a UTF-16 value. Or UCS-2 with no handling of surrogates.
>
> In which encoding is the string stored in the file that md5sum hashes?
>
> Rather than that, you should probably use the Python equivalent of Java's String.getBytes(charset)
> http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(java.lang.String),
> after determining the proper value of charset for your use.

Is that python code? I thought it was JavaScript.
Re: UTF-8 Hashing
starryrendezv...@gmail.com wrote:
> If it helps, here is the code I currently utilize;
> hash: function(str,method) {
>     var hash_engine = Components.classes["@mozilla.org/security/hash;1"]
>         .createInstance().QueryInterface(Components.interfaces.nsICryptoHash);
>     var charcodes = [];
>     for (var i = 0; i < str.length; i++) {
>         charcodes.push(str.charCodeAt(i));
>     }

At this point, charcodes is an array of UCS2 (*) code points, [24744,22909].

>     hash_engine.update(charcodes, str.length);

https://developer.mozilla.org/en/nsICryptoHash#update
That takes octets, so your values get truncated into [168,125].

>     return TOOLS.convert('bin2hex',hash_engine.finish(false));
> },

Try, at the very top:

    str = unescape(encodeURIComponent("您好"))

(This does a conversion to UTF8, because *URIComponent is defined to be UTF8, but unescape is platform encoding / ASCII / something along those lines.)

(*) IIRC; might be UTF16 instead, but that doesn't make a difference here.
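The numbers quoted above can be reproduced in plain JavaScript. The following is only an illustrative sketch: the variable names are made up for this example, the `& 0xff` mask is used to model the truncation to octets, and unescape is a legacy (deprecated) global that is still available in browsers and Node.js.

```javascript
// Sketch only: reproduces the values from the message above in plain JavaScript.
const str = "您好";

// charCodeAt returns 16-bit UTF-16 code units, one per string index.
const codeUnits = Array.from({ length: str.length }, (_, i) => str.charCodeAt(i));
console.log(codeUnits); // [ 24744, 22909 ]

// Keeping only the low 8 bits models what happens when these are read as octets.
const truncated = codeUnits.map((unit) => unit & 0xff);
console.log(truncated); // [ 168, 125 ]

// encodeURIComponent percent-encodes the UTF-8 bytes of the string; unescape
// turns each %XX back into one character, so each char code is one byte value.
const byteString = unescape(encodeURIComponent(str));
const utf8Bytes = Array.from(byteString, (ch) => ch.charCodeAt(0));
console.log(utf8Bytes); // [ 230, 130, 168, 229, 165, 189 ]
```

After the conversion every char code fits in one octet, so nothing is lost when the values are handed to a function that expects bytes.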