Hi Jason,

Sorry for the delay. See my comments inline.
On 2/23/07, jason marshall <[EMAIL PROTECTED]> wrote:
> Raul,
>
> I'm not sure I can be as helpful as Yvan, having a more modest and polite
> test suite, but I have a bit of Unicode and specifically UTF-8 en/decoding
> experience, and I might be able to make a few observations.
>
> I'm curious about your comments about how some Unicode characters are not
> being handled properly. Which ones are you having trouble with? The new
> 32-bit characters, 0, something else?
Great, your help is really appreciated. I have just created a test that
checks my encoder against the String.getBytes("UTF-8") implementation for
the first 2**16 chars, and they are all equal except for character 0xd8ff.
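(For illustration, a rough sketch of such a comparison test. UTFHelpper.writeCharToUtf8 is the helper quoted later in this thread; the test harness around it is assumed rather than taken from the actual test:)

    import java.io.ByteArrayOutputStream;
    import java.util.Arrays;

    public class Utf8ComparisonSketch {
        public static void main(String[] args) throws Exception {
            for (int i = 0; i < 0x10000; i++) {
                char c = (char) i;
                // Reference bytes from the JDK encoder
                byte[] expected = String.valueOf(c).getBytes("UTF-8");

                // Bytes from the encoder under test
                // (assumes the writeCharToUtf8 helper is accessible here)
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                UTFHelpper.writeCharToUtf8(c, bos);
                byte[] actual = bos.toByteArray();

                if (!Arrays.equals(expected, actual)) {
                    System.out.println("Mismatch at 0x" + Integer.toHexString(i));
                }
            }
        }
    }

(Note that 0xd8ff falls in the surrogate range 0xD800-0xDFFF, where a lone char has no valid UTF-8 encoding, so the two encoders can legitimately disagree there.)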
> You say in your comments that the problem is fixed in HEAD, but I'm looking
> at HEAD
> http://svn.apache.org/viewvc/xml/security/trunk/src/org/apache/xml/security/c14n/implementations/CanonicalizerBase.java?view=markup
> and the code still seems to be using 8th-bit checks throughout.

Can you point me to where you think it is incorrect, or give me a test case?
I would really appreciate it.

> I think you would be much better off removing the special casing you added
> to speed up this class. Now maybe it's because I'm not encoding too many
> really big documents, or maybe it's because I'm fixated on MessageDigest
> issues, but I'm not seeing this as a critical performance problem to begin
> with. However, even if it were, this is not a great way to achieve your
> goal.
Well, I have profiled it with Juice (the OpenSSL JNI wrapper for encrypting
and digesting), and encoding is the slowest part; I'm even thinking about an
SSE assembler implementation. Anyway, the slowdown in the Java implementation
is not the conversion itself but the array sizing: the JDK just grows and
shrinks the output byte array several times, whereas my implementation does
it once for ASCII-7 chars, or three times for other characters. The 8-bit
check was a stupid trick on my part, sorry; it should be OK now. Anyway, if
you can contribute any code, please feel free, I will be more than glad.
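(For illustration, a minimal sketch of one reading of that sizing idea: allocate for the all-ASCII case and grow a single time to the three-bytes-per-char worst case when a non-ASCII character appears. This is not the library's actual code; the method name is illustrative:)

    // Illustrative sketch only: size the output for the all-ASCII case up
    // front, and grow once to the 3-bytes-per-char worst case if a non-ASCII
    // char shows up. Surrogate pairs are ignored here for brevity.
    static byte[] encodeUtf8(String text) {
        int length = text.length();
        byte[] out = new byte[length];      // optimistic: one byte per char
        int pos = 0;
        for (int i = 0; i < length; i++) {
            char c = text.charAt(i);
            if (c < 0x80) {                 // ASCII fast path
                out[pos++] = (byte) c;
                continue;
            }
            if (out.length == length) {     // first non-ASCII char: grow once
                byte[] bigger = new byte[length * 3];
                System.arraycopy(out, 0, bigger, 0, pos);
                out = bigger;
            }
            if (c < 0x800) {                // two-byte sequence
                out[pos++] = (byte) (0xC0 | (c >> 6));
                out[pos++] = (byte) (0x80 | (c & 0x3F));
            } else {                        // three-byte sequence
                out[pos++] = (byte) (0xE0 | (c >> 12));
                out[pos++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                out[pos++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        if (pos == out.length) {
            return out;
        }
        byte[] result = new byte[pos];      // trim once at the end
        System.arraycopy(out, 0, result, 0, pos);
        return result;
    }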
> If you want to make this code go faster, your better bet is to split up the
> methods in UTFHelpper so that Hotspot can inline the fast-path into the
> callers. That'll get you the same effect with saner code. For example:
>
>     final static void writeCharToUtf8(final char c, final OutputStream out)
>             throws IOException {
>         if (c < 0x80) {
>             out.write(c);
>         } else {
>             writeMultiByteCharToUtf8(c, out);
>         }
>     }
>
>     final static protected void writeMultiByteCharToUtf8(final char c,
>             final OutputStream out) throws IOException {
>         if ((c >= 0xD800 && c <= 0xDBFF) || (c >= 0xDC00 && c <= 0xDFFF)) {
>             // No surrogates in Sun Java ...
Great idea, I will try to do it.

> I'm pretty sure that even the 1.3 Hotspot will be happy with this code, but
> I haven't tested it (I'm having some trouble building the code from the
> source release, and work doesn't allow svn access through the firewall, for
> various reasons, a couple of which are understandable).
We should create some nightly build & publish mechanism. I will try to see
how other projects handle this.

> Good luck, and keep us posted on your ETA for a 1.4.1 release.

Thanks. I will try to see how many bug reports we have (the MS-Office bug
looks promising).

Regards,
Raul

> Thanks,
> Jason
>
> On 2/13/07, Hess Yvan <[EMAIL PROTECTED]> wrote:
> >
> > Hi Raul,
> >
> > Let me know when you have a pre-release of version 1.4.1, or send it to
> > me by email; I will then run all my JUnit test cases and give you
> > feedback. We are using a lot of the functionality of the XML encryption
> > and signature syntax, and for this reason we have interesting test cases
> > that can help you in the release process of the XML Security library. I
> > don't have too much time to follow what happens with the project, but as
> > I said in a previous email, I can try to run my test cases before you
> > plan to release a new version, to give you a second round of feedback on
> > the robustness of the library: 4 eyes are better than 2 :-)
> >
> > Regards. Yvan
> >
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Raul Benito
> > Sent: Tuesday, 13 February 2007 12:18
> > To: security-dev@xml.apache.org
> > Subject: Re: Signed document can be corrupted in certain circumstances
> >
> > Hi Hess,
> >
> > It is my fault; we have a "critical" bug,
> > http://issues.apache.org/bugzilla/show_bug.cgi?id=41462. The problem is
> > that I was thinking in 8 bits instead of 32 bits. It is now mostly fixed
> > in HEAD, but we are still having a problem with some parts of Unicode. I
> > think I will do a 1.4.1 with this bug and several others.
> > And we have to reconsider my release strategy, as it seems that not too
> > many people test the release candidates :(.
> >
> >
> > On 2/13/07, Hess Yvan <[EMAIL PROTECTED]> wrote:
> > >
> > > Hi everybody,
> > >
> > > I think I found a critical bug in XML Security v1.4.0 (Java). An XML
> > > document signed with Apache XML Security can be corrupted in certain
> > > circumstances.
> > >
> > > Here are the starting conditions and the results I have:
> > >
> > > 1. XML document encoded in "UTF-8" containing the Unicode character "\u263A"
> > > 2. The document is signed with Apache XML Security ---> OK
> > > 3. The document is verified with Apache XML Security ---> OK
> > > 4. The document is verified with the IBM toolkit (XSS4J) ---> NOT OK
> > >
> > > Doing some investigation, I think I have isolated the problem. It seems
> > > that the error is due to the Canonicalizer class. This class doesn't
> > > handle correctly UTF-8 characters encoded on three bytes.
> > > Here is a test I did to confirm the problem:
> > >
> > >     // XML character \u263A => ☺ => smiley
> > >     String xmlString = "<document>Humour document (héhé \u263A)</document>";
> > >     byte[] xml = xmlString.getBytes("UTF-8");
> > >     String xmlHex = HexadecimalConvertor.toHex(xml);
> > >
> > >     System.out.println(xmlString);
> > >     System.out.println("Hexadecimal value: " + xmlHex);
> > >
> > >     // Get the DOM document
> > >     Document document = new XMLParser().parseXMLDocument(
> > >             new ByteArrayInputStream(xml));
> > >
> > >     // Canonicalize
> > >     byte[] canonicalXML = Canonicalizer.getInstance(
> > >             Canonicalizer.ALGO_ID_C14N_WITH_COMMENTS).canonicalizeSubtree(document);
> > >     String canonicalXMLHex = HexadecimalConvertor.toHex(canonicalXML);
> > >     String canonicalXMLString = new String(canonicalXML, "UTF-8");
> > >
> > >     System.out.println("Hexadecimal value: " + canonicalXMLHex);
> > >     System.out.println(canonicalXMLString);
> > >
> > > and here is the result:
> > >
> > > <document>Humour document (héhé ☺)</document>
> > > Hexadecimal value:
> > > 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920e298ba293c2f646f63756d656e743e
> > > Hexadecimal value:
> > > 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a9203a293c2f646f63756d656e743e
> > > <document>Humour document (héhé :)</document>
> > >
> > > The Canonicalizer class correctly handles the character "é" (E9),
> > > converted in UTF-8 to "c3a9". BUT the Unicode character "☺" (263A) is
> > > converted to ":" (3a) when it should be (e298ba); this is wrong. It
> > > seems that the Canonicalizer class doesn't manage correctly "UTF-8"
> > > characters encoded on three bytes!
> > >
> > > Does anybody have an idea? Can somebody help me? This occurs in the
> > > context of our application and we now have a lot of problems due to this
> > > situation.
> > >
> > > Thanks in advance.
> > >
> > > Regards. Yvan Hess
> > >
> > > Chief Software Architect
> > > Informatique-MTF SA
> >
> > --
> > http://r-bg.com
>
> --
> - Jason
--
http://r-bg.com