Hi everybody,
I think I found a critical bug in XML Security 1.4.0 (Java). An XML document
signed with Apache XML Security can be corrupted in certain circumstances.
Here are the starting conditions and the results I get:
1. XML document encoded in "UTF-8" containing the Unicode character "\u263A"
2. The document is signed with Apache XML security ---> OK
3. The document is verified with Apache XML security ---> OK
4. The document is verified with IBM toolkit (XSS4J) ---> NOT OK
After some investigation, I think I have isolated the problem. It seems that the
error is due to the Canonicalizer class, which does not correctly handle
characters that are encoded on three bytes in UTF-8. Here is a test I did to
confirm the problem:
// XML character \u263A => ☺ => smiley
String xmlString = "<document>Humour document (héhé \u263A)</document>";
byte[] xml = xmlString.getBytes("UTF-8");
String xmlHex = HexadecimalConvertor.toHex(xml);
System.out.println(xmlString);
System.out.println("Hexadecimal value: " + xmlHex);
// Get the DOM document (XMLParser is an application helper, not shown here)
Document document = new XMLParser().parseXMLDocument(new ByteArrayInputStream(xml));
// Canonicalize with inclusive C14N, keeping comments
byte[] canonicalXML = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_WITH_COMMENTS).canonicalizeSubtree(document);
String canonicalXMLHex = HexadecimalConvertor.toHex(canonicalXML);
String canonicalXMLString = new String(canonicalXML, "UTF-8");
System.out.println("Hexadecimal value: " + canonicalXMLHex);
System.out.println(canonicalXMLString);
And here is the result:
<document>Humour document (héhé ☺)</document>
Hexadecimal value: 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 e298ba 293c2f646f63756d656e743e
Hexadecimal value: 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 3a 293c2f646f63756d656e743e
<document>Humour document (héhé :)</document>
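The HexadecimalConvertor helper used in the test is not shown in the post. As a sketch under that assumption, the dump of the input bytes can be reproduced with the JDK alone; the class name HexDump is hypothetical. It should print the same bytes as the first hexadecimal dump above, including e298ba for the smiley.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the application's HexadecimalConvertor helper (not shown in the post).
public class HexDump {

    // Renders each byte as two lowercase hex digits, with no separators.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file encoding out of the picture: e-acute and the smiley.
        String xmlString = "<document>Humour document (h\u00e9h\u00e9 \u263A)</document>";
        byte[] xml = xmlString.getBytes(StandardCharsets.UTF_8);
        System.out.println("Hexadecimal value: " + toHex(xml));
    }
}
```

Only the hex string is printed, so the console encoding does not affect the result.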
The Canonicalizer class correctly converts the character "é" (U+00E9) to the
UTF-8 sequence "c3a9". BUT the Unicode character "☺" (U+263A) is converted to
":" (3a) instead of "e298ba"; this is wrong. It seems that the Canonicalizer
class does not correctly handle characters encoded on three bytes in UTF-8!
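The ":" is not arbitrary: 0x3A is exactly the low byte of 0x263A. One plausible explanation, an assumption not checked against the XML Security source, is that characters in the three-byte range end up cast to a single byte instead of being encoded. The sketch below contrasts the standard UTF-8 bit layouts for a Basic Multilingual Plane character with such a truncating cast; the names Utf8Demo, encodeBmp and truncate are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: contrasts real UTF-8 encoding with a plain byte cast.
public class Utf8Demo {

    // Encodes one BMP character (U+0000..U+FFFF, surrogates excluded) using the UTF-8 bit layouts.
    public static byte[] encodeBmp(char c) {
        if (Character.isSurrogate(c)) {
            throw new IllegalArgumentException("surrogate halves need the 4-byte form");
        }
        if (c < 0x80) {                            // 1 byte:  0xxxxxxx
            return new byte[] { (byte) c };
        } else if (c < 0x800) {                    // 2 bytes: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (c >> 6)),
                (byte) (0x80 | (c & 0x3F)) };
        } else {                                   // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (c >> 12)),
                (byte) (0x80 | ((c >> 6) & 0x3F)),
                (byte) (0x80 | (c & 0x3F)) };
        }
    }

    // Assumed faulty behaviour: casting the char to a byte keeps only its low 8 bits.
    public static byte[] truncate(char c) {
        return new byte[] { (byte) c };
    }

    // Lowercase hex, two digits per byte.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (char c : new char[] { '\u00e9', '\u263A' }) {
            byte[] encoded = encodeBmp(c);
            byte[] jdk = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X encoded=%s matchesJdk=%b truncated=%s%n",
                    (int) c, hex(encoded), Arrays.equals(encoded, jdk), hex(truncate(c)));
        }
    }
}
```

Running it prints "U+00E9 encoded=c3a9 matchesJdk=true truncated=e9" and "U+263A encoded=e298ba matchesJdk=true truncated=3a": the truncated smiley is the same 3a byte seen in the canonical output.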
Does anybody have an idea? Can somebody help me? This occurs in our
application, and we now have a lot of problems because of it.
Thanks in advance.
Regards,
Yvan Hess
Chief Software Architect
e-mail: [EMAIL PROTECTED]
phone : +41 (0)26 460 66 66
fax : +41 (0)26 460 66 60
Informatique-MTF SA
Route du Bleuet 1
CH-1762 Givisiez
Excellence in Compliance and Document Management
http://www.imtf.com