Hi Hess,

It is my fault. We have a critical bug (http://issues.apache.org/bugzilla/show_bug.cgi?id=41462). The problem is that I was thinking in 8 bits instead of 32 bits. It is now mostly fixed in HEAD, but we are still having a problem with some parts of Unicode. I think I will do a 1.4.1 release with fixes for this bug and several others. We also have to reconsider my release strategy, as it seems that not many people test the release candidates :(.
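For anyone following the thread, the 8-bit mix-up can be illustrated with the JDK alone. The sketch below is not the Canonicalizer code: the class name Utf8TruncationDemo and its helper methods are hypothetical, and the (byte) cast is only an assumed stand-in for the faulty code path, chosen because it yields the same 3a byte seen in the report.

```java
import java.nio.charset.StandardCharsets;

final class Utf8TruncationDemo {

    /** Hex-encodes bytes in lowercase, two digits per byte. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /**
     * Encodes one code point in the range U+0800..U+FFFF (surrogates excluded)
     * as three UTF-8 bytes: 1110xxxx 10xxxxxx 10xxxxxx.
     */
    public static byte[] encodeThreeByte(int codePoint) {
        return new byte[] {
            (byte) (0xE0 | (codePoint >> 12)),
            (byte) (0x80 | ((codePoint >> 6) & 0x3F)),
            (byte) (0x80 | (codePoint & 0x3F))
        };
    }

    /** Assumed failure mode: keep only the low 8 bits of the character. */
    public static byte[] truncateToOneByte(char c) {
        return new byte[] { (byte) c };
    }

    public static void main(String[] args) {
        char smiley = '\u263A';

        // Reference encoding from the JDK.
        System.out.println(toHex(String.valueOf(smiley).getBytes(StandardCharsets.UTF_8))); // e298ba

        // Hand-rolled three-byte encoding gives the same bytes.
        System.out.println(toHex(encodeThreeByte(smiley))); // e298ba

        // Dropping everything above the low 8 bits leaves 0x3a, which is ':'.
        System.out.println(toHex(truncateToOneByte(smiley))); // 3a
    }
}
```

The character é (U+00E9) fits in two UTF-8 bytes, which would explain why it survives while the smiley does not.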
On 2/13/07, Hess Yvan <[EMAIL PROTECTED]> wrote:
Hi everybody,

I think I found a critical bug in XML Security V1.4.0 (Java). An XML document signed with Apache XML Security can be corrupted in certain circumstances. Here are the starting conditions and the results I get:

1. XML document encoded in "UTF-8" containing the Unicode character "\u263A"
2. The document is signed with Apache XML Security ---> OK
3. The document is verified with Apache XML Security ---> OK
4. The document is verified with the IBM toolkit (XSS4J) ---> NOT OK

After some investigation, I think I have isolated the problem. The error seems to come from the Canonicalizer class, which does not correctly handle UTF-8 characters encoded on three bytes. Here is a test I ran to confirm the problem:

    // XML character \u263A => ☺ => smiley
    String xmlString = "<document>Humour document (héhé \u263A)</document>";
    byte[] xml = xmlString.getBytes("UTF-8");
    String xmlHex = HexadecimalConvertor.toHex(xml);
    System.out.println(xmlString);
    System.out.println("Hexadecimal value: " + xmlHex);

    // Get the DOM document
    Document document = new XMLParser().parseXMLDocument(new ByteArrayInputStream(xml));

    // Canonical
    byte[] canonicalXML = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_WITH_COMMENTS).canonicalizeSubtree(document);
    String canonicalXMLHex = HexadecimalConvertor.toHex(canonicalXML);
    String canonicalXMLString = new String(canonicalXML, "UTF-8");
    System.out.println("Hexadecimal value: " + canonicalXMLHex);
    System.out.println(canonicalXMLString);

and here is the result:

    <document>Humour document (héhé ☺)</document>
    Hexadecimal value: 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 e298ba 293c2f646f63756d656e743e
    Hexadecimal value: 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 3a 293c2f646f63756d656e743e
    <document>Humour document (héhé :)</document>

The Canonicalizer class correctly handles the character "é" (U+00E9), which is encoded in UTF-8 as "c3a9". However, the Unicode character "☺" (U+263A) is written out as ":" (3a) when it should be e298ba.
It seems that the Canonicalizer class does not correctly handle UTF-8 characters encoded on three bytes!

Does anybody have an idea? Can somebody help me? This occurs in our application and is now causing us a lot of problems.

Thanks in advance.

Regards,
Yvan Hess
Chief Software Architect
e-mail: [EMAIL PROTECTED]
phone : +41 (0)26 460 66 66
fax : +41 (0)26 460 66 60
Informatique-MTF SA
Route du Bleuet 1
CH-1762 Givisiez
Excellence in Compliance and Document Management
http://www.imtf.com
-- http://r-bg.com