Hi everybody,

I think I found a critical bug in XML Security V1.4.0 (Java). An XML document signed with Apache XML Security can be corrupted in certain circumstances.

Here are the starting conditions and the results I get:

1. The XML document is encoded in "UTF-8" and contains the Unicode character "\u263A"
2. The document is signed with Apache XML Security ---> OK
3. The document is verified with Apache XML Security ---> OK
4. The document is verified with the IBM toolkit (XSS4J) ---> NOT OK

After some investigation, I think I have isolated the problem. The error seems to come from the Canonicalizer class, which does not correctly handle UTF-8 characters encoded as three bytes.

Here is a test I ran to confirm the problem:

    // XML character \u263A => ☺ => smiley
    String xmlString = "<document>Humour document (héhé \u263A)</document>";
    byte[] xml = xmlString.getBytes("UTF-8");
    String xmlHex = HexadecimalConvertor.toHex(xml);
    System.out.println(xmlString);
    System.out.println("Hexadecimal value: " + xmlHex);

    // Get the DOM document
    Document document = new XMLParser().parseXMLDocument(new ByteArrayInputStream(xml));

    // Canonical
    byte[] canonicalXML = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_WITH_COMMENTS).canonicalizeSubtree(document);
    String canonicalXMLHex = HexadecimalConvertor.toHex(canonicalXML);
    String canonicalXMLString = new String(canonicalXML, "UTF-8");
    System.out.println("Hexadecimal value: " + canonicalXMLHex);
    System.out.println(canonicalXMLString);

And here is the result:

    <document>Humour document (héhé ☺)</document>
    Hexadecimal value: 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 e298ba 293c2f646f63756d656e743e
    Hexadecimal value: 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 3a 293c2f646f63756d656e743e
    <document>Humour document (héhé :)</document>

The Canonicalizer class correctly handles the character "é" (U+00E9), which is encoded in UTF-8 as "c3a9". BUT the Unicode character "☺" (U+263A) comes out as ":" (3a) when it should be "e298ba". This is wrong.
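The expected bytes can be checked without any XML library, using only the JDK. The sketch below is an illustration, not a statement about what the Canonicalizer does internally: `Utf8Check`, `toHex`, `utf8Hex` and `lowByteHex` are hypothetical names made up for this example. It shows that the correct UTF-8 encoding of U+263A is e2 98 ba, and that keeping only the low 8 bits of the char gives 3a, the byte seen in the canonical output. Note that this truncation would turn "é" into e9 rather than c3a9, so it only matches the three-byte case.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Hypothetical stand-in for a hex helper: lower-case hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Reference encoding: let the JDK produce the UTF-8 bytes.
    static String utf8Hex(String s) {
        return toHex(s.getBytes(StandardCharsets.UTF_8));
    }

    // Assumed failure mode, for comparison only: each char is cast to a byte,
    // which silently drops the upper 8 bits of the UTF-16 code unit.
    static String lowByteHex(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return toHex(out);
    }

    public static void main(String[] args) {
        String smiley = "\u263A";
        System.out.println("UTF-8 bytes:   " + utf8Hex(smiley));    // e298ba
        System.out.println("Low byte only: " + lowByteHex(smiley)); // 3a
        System.out.println("UTF-8 of é:    " + utf8Hex("\u00E9"));  // c3a9
    }
}
```

Comparing the canonical bytes against `utf8Hex` for a few characters above U+07FF would show whether every three-byte character is affected or only some of them.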
It seems that the Canonicalizer class does not correctly handle UTF-8 characters encoded as three bytes!

Does anybody have an idea? Can somebody help me? This occurs in our application, and it is now causing us a lot of problems.

Thanks in advance.

Regards,
Yvan Hess
Chief Software Architect
Informatique-MTF SA
http://www.imtf.com