Re: UTF-8 Hashing

2009-04-24 Thread starryrendezv...@gmail.com
On Apr 23, 3:39 pm, Jean-Marc Desperrier <jmd...@alussinan.org> wrote:
> Nelson B Bolyard wrote:
>> Is that python code?  I thought it was JavaScript.
>
> Yes, you're right, I had a really too quick look at it :-)

On second thought, I just had a look at this page:
https://developer.mozilla.org/En/NsICryptoHash
It states that one must first run the string through the scriptable
Unicode converter to get its UTF-8 encoded bytes:

var converter =
    Components.classes["@mozilla.org/intl/scriptableunicodeconverter"]
        .createInstance(Components.interfaces.nsIScriptableUnicodeConverter);

// we use UTF-8 here, you can choose other encodings.
converter.charset = "UTF-8";

I guess that's what I missed?

/rvdh
-- 
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto


Re: UTF-8 Hashing

2009-04-24 Thread starryrendezv...@gmail.com
Right, this seems to work very well. I consider it solved.


Here is the corrected code:

hash: function(str, method) {

    // Convert the JavaScript string into an array of UTF-8 bytes.
    var converter = Components.classes["@mozilla.org/intl/scriptableunicodeconverter"]
        .createInstance(Components.interfaces.nsIScriptableUnicodeConverter);
    converter.charset = "UTF-8";
    var result = {};  // result.value receives the number of bytes
    var data = converter.convertToByteArray(str, result);

    var hash_engine = Components.classes["@mozilla.org/security/hash;1"]
        .createInstance().QueryInterface(Components.interfaces.nsICryptoHash);

    switch (method) {
        case 'MD5':
            hash_engine.init(hash_engine.MD5);
            break;
        case 'SHA1':
            hash_engine.init(hash_engine.SHA1);
            break;
        case 'SHA256':
            hash_engine.init(hash_engine.SHA256);
            break;
    }

    // Hash the UTF-8 bytes rather than the string's character codes.
    hash_engine.update(data, result.value);
    // finish(false) returns the raw digest rather than base64 text.
    return TOOLS.convert('bin2hex', hash_engine.finish(false));
},
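
To cross-check the digest outside Firefox, the same hash can be computed over the UTF-8 bytes in a standalone script. This is only a sketch under stated assumptions: it targets Node.js, hashUtf8 is a hypothetical helper, and Node's built-in crypto module stands in for nsICryptoHash, so it shows the expected result rather than the XPCOM API.

```javascript
// Sketch for Node.js (an assumption; the code in the thread runs inside Firefox).
const crypto = require("crypto");

// Hypothetical helper: hash the UTF-8 encoding of str and return lowercase hex.
// method is one of 'MD5', 'SHA1', 'SHA256', as in the thread's hash() function.
function hashUtf8(str, method) {
  const algorithms = { MD5: "md5", SHA1: "sha1", SHA256: "sha256" };
  if (!Object.prototype.hasOwnProperty.call(algorithms, method)) {
    throw new Error("Unsupported hash method: " + method);
  }
  // Buffer.from(str, "utf8") yields the UTF-8 bytes of the string.
  const bytes = Buffer.from(str, "utf8");
  return crypto.createHash(algorithms[method]).update(bytes).digest("hex");
}

console.log(hashUtf8("abc", "MD5")); // prints "900150983cd24fb0d6963f7d28e17f72"
console.log(hashUtf8("您好", "MD5"));
```

If the add-on and this script disagree for non-ASCII input but agree for plain ASCII such as "abc", the difference is most likely in how the string was turned into bytes.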


Re: UTF-8 Hashing

2009-04-23 Thread Jean-Marc Desperrier

Nelson B Bolyard wrote:

Is that python code?  I thought it was JavaScript.


Yes, you're right, I had a really too quick look at it :-)



Re: UTF-8 Hashing

2009-04-22 Thread starryrendezv...@gmail.com
If it helps, here is the code I currently utilize;

hash: function(str, method) {

    var hash_engine = Components.classes["@mozilla.org/security/hash;1"]
        .createInstance().QueryInterface(Components.interfaces.nsICryptoHash);

    switch (method) {
        case 'MD5':
            hash_engine.init(hash_engine.MD5);
            break;
        case 'SHA1':
            hash_engine.init(hash_engine.SHA1);
            break;
        case 'SHA256':
            hash_engine.init(hash_engine.SHA256);
            break;
    }

    var charcodes = [];
    for (var i = 0; i < str.length; i++) {
        charcodes.push(str.charCodeAt(i));
    }
    hash_engine.update(charcodes, str.length);
    return TOOLS.convert('bin2hex', hash_engine.finish(false));
},


Re: UTF-8 Hashing

2009-04-22 Thread Nelson B Bolyard
starryrendezv...@gmail.com wrote, On 2009-04-22 07:40:
> If it helps, here is the code I currently utilize;

[snip]

I suspect (that is, guess) that your problem is at one of these two places:

1. Perhaps the following code does not pass the UTF8 string you expect it
to pass to the hash algorithm.  Maybe it's passing UTF16.  Maybe it's
including a trailing NUL character, or maybe it's not.  Any of these
differences may explain why the results you get from this javascript do not
match results you get from other programs.

  var charcodes = [];
  for (var i = 0; i < str.length; i++) {
      charcodes.push(str.charCodeAt(i));
  }
  hash_engine.update(charcodes, str.length);


2. Perhaps the type returned by hash_engine.finish is not the type expected
by TOOLS.convert('bin2hex', ...)

  return TOOLS.convert('bin2hex',hash_engine.finish(false));
   },
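
TOOLS.convert is not shown in the thread, so the input it expects is unknown. As a sketch of point 2, assuming finish(false) hands back a string in which each character holds one digest byte, a hypothetical bin2hex could look like this:

```javascript
// Hypothetical stand-in for TOOLS.convert('bin2hex', ...), which the thread
// does not show. Assumes the input is a "binary string": one byte per character.
function bin2hex(binary) {
  var hex = "";
  for (var i = 0; i < binary.length; i++) {
    // Mask to 8 bits and zero-pad so every byte becomes exactly two hex digits.
    var byte = binary.charCodeAt(i) & 0xff;
    hex += (byte < 16 ? "0" : "") + byte.toString(16);
  }
  return hex;
}

console.log(bin2hex("\x00\x0f\xa8\xff")); // prints "000fa8ff"
```

A converter that skips the zero padding, or that expects an array instead of a string, would produce a hex digest of the wrong length, which is one quick thing to check.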

If you want further help with this, I suggest that you do these things:

A. Create a small HTML page containing a minimal JavaScript script that
is complete enough to reproduce the problem.  It must include sample input.
This should be no more than twice as big as the sample code you sent us before.

B. Send us that page (paste it as text inline in a plain-text posting)
along with a description of the expected output and the actual output.
It would be good for you to also include the hexadecimal values for
the UTF8 string that you believe is equivalent to your sample input.
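
The hexadecimal values asked for in B can be produced with a few lines of script. This is a minimal sketch, assuming a runtime that provides TextEncoder (current browsers and Node.js); utf8Hex is a hypothetical helper name and the sample string is the one used elsewhere in this thread.

```javascript
// Hypothetical helper: list the UTF-8 bytes of a string as two-digit hex values.
// TextEncoder always encodes to UTF-8.
function utf8Hex(str) {
  var bytes = new TextEncoder().encode(str);
  return Array.from(bytes, function (b) {
    return (b < 16 ? "0" : "") + b.toString(16);
  }).join(" ");
}

console.log(utf8Hex("您好")); // prints "e6 82 a8 e5 a5 bd"
```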



Re: UTF-8 Hashing

2009-04-22 Thread Jean-Marc Desperrier

starryrendezv...@gmail.com wrote:

> hash: function(str,method) {
> [...] str.charCodeAt(i)


python quite probably outputs the value of str.charCodeAt(i) as some
variant of a UTF-16 value, or as UCS-2 with no handling of surrogates.

In which encoding is the string stored in the file that md5sum hashes?

Rather than that, you should probably use the python equivalent of
Java's String.getBytes(charset)
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(java.lang.String)
and determine the proper value of charset for your use.
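
Since the code in question turned out to be JavaScript, here is a rough JavaScript analog of String.getBytes(charset). It is a sketch only: getBytes is a hypothetical helper that handles just two charsets, and it assumes a runtime with TextEncoder. It shows that the same string yields different bytes, and therefore different hashes, depending on the charset chosen.

```javascript
// Hypothetical analog of Java's String.getBytes(charset), limited to two charsets.
function getBytes(str, charset) {
  if (charset === "UTF-8") {
    return Array.from(new TextEncoder().encode(str));
  }
  if (charset === "UTF-16LE") {
    var bytes = [];
    for (var i = 0; i < str.length; i++) {
      var unit = str.charCodeAt(i);       // one UTF-16 code unit
      bytes.push(unit & 0xff, unit >> 8); // low byte first, then high byte
    }
    return bytes;
  }
  throw new Error("Unsupported charset: " + charset);
}

console.log(getBytes("您好", "UTF-8"));    // [230, 130, 168, 229, 165, 189]
console.log(getBytes("您好", "UTF-16LE")); // [168, 96, 125, 89]
```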




Re: UTF-8 Hashing

2009-04-22 Thread Nelson B Bolyard
Jean-Marc Desperrier wrote, On 2009-04-22 12:17 PDT:
> starryrendezv...@gmail.com wrote:
>> hash: function(str,method) {
>> [...] str.charCodeAt(i)
>
> python quite probably outputs the value of str.charCodeAt(i) as some
> variant of a UTF-16 value, or as UCS-2 with no handling of surrogates.
>
> In which encoding is the string stored in the file that md5sum hashes?
>
> Rather than that, you should probably use the python equivalent of
> Java's String.getBytes(charset)
> http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(java.lang.String)
> and determine the proper value of charset for your use.

Is that python code?  I thought it was JavaScript.


Re: UTF-8 Hashing

2009-04-22 Thread Mook

starryrendezv...@gmail.com wrote:

> If it helps, here is the code I currently utilize;
>
> hash: function(str,method) {
>
>    var hash_engine = Components.classes["@mozilla.org/security/hash;1"]
>        .createInstance().QueryInterface(Components.interfaces.nsICryptoHash);

>    var charcodes = [];
>    for (var i = 0; i < str.length; i++) {
>        charcodes.push(str.charCodeAt(i));
>    }

at this point, charcodes is an array of UCS2 (*) code points, [24744,22909]

>    hash_engine.update(charcodes, str.length);

https://developer.mozilla.org/en/nsICryptoHash#update
that takes octets, so your values get truncated into [168,125]

>    return TOOLS.convert('bin2hex',hash_engine.finish(false));
> },
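
The two sets of numbers quoted above can be reproduced in plain JavaScript. This sketch assumes, as those values suggest, that truncation to an octet keeps only the low 8 bits of each character code; the masking step is an illustration of that assumption, not a call into nsICryptoHash.

```javascript
var str = "您好";

// Same loop as in the quoted code: collect one character code per UTF-16 unit.
var charcodes = [];
for (var i = 0; i < str.length; i++) {
  charcodes.push(str.charCodeAt(i));
}
console.log(charcodes); // [24744, 22909]

// Assumption: values passed where octets are expected keep only their low 8 bits.
var truncated = charcodes.map(function (c) { return c & 0xff; });
console.log(truncated); // [168, 125]
```

Two bytes are hashed instead of the six that the UTF-8 encoding of this string contains, so the digest cannot match one computed over a UTF-8 file.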


try, at the very top:
str = unescape(encodeURIComponent("您好"))
(this does a conversion to UTF-8, because encodeURIComponent is defined to
produce UTF-8 escapes, while unescape turns each %XX escape back into a
single character, so every character of the result holds one byte)



(*) IIRC; might be UTF16 instead, but that doesn't make a difference here
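
The effect of the suggested conversion can be checked in any JavaScript console. The sketch below only inspects the converted string; the variable names are illustrative, and unescape is a legacy function that is still widely available.

```javascript
// Convert to a "byte string": each character of utf8 holds one UTF-8 byte.
var utf8 = unescape(encodeURIComponent("您好"));

var octets = [];
for (var i = 0; i < utf8.length; i++) {
  octets.push(utf8.charCodeAt(i));
}

console.log(utf8.length); // 6
console.log(octets);      // [230, 130, 168, 229, 165, 189]
```

Every value now fits in an octet, so collecting character codes with charCodeAt no longer loses information.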