Hi everybody,
 
I think I have found a critical bug in XML Security V1.4.0 (Java). An XML 
document signed with Apache XML Security can be corrupted in certain 
circumstances.
 
Here are the starting conditions and the results I get:
 
1. An XML document encoded in "UTF-8" contains the Unicode character "\u263A"
2. The document is signed with Apache XML security --->  OK
3. The document is verified with Apache XML security --->  OK
4. The document is verified with IBM toolkit (XSS4J) ---> NOT OK
 
After some investigation, I think I have isolated the problem. The error 
seems to come from the Canonicalizer class, which does not correctly handle 
characters that are encoded on three bytes in UTF-8. Here is a test I wrote 
to confirm the problem:
 
      // XML character \u263A => ☺ => smiley
      String xmlString = "<document>Humour document (héhé \u263A)</document>";
      byte[] xml = xmlString.getBytes("UTF-8");
      String xmlHex = HexadecimalConvertor.toHex(xml);

      System.out.println(xmlString);
      System.out.println("Hexadecimal value: " + xmlHex);

      // Get the DOM document
      Document document = new XMLParser().parseXMLDocument(new ByteArrayInputStream(xml));

      // Canonicalize the document
      byte[] canonicalXML = Canonicalizer
            .getInstance(Canonicalizer.ALGO_ID_C14N_WITH_COMMENTS)
            .canonicalizeSubtree(document);
      String canonicalXMLHex = HexadecimalConvertor.toHex(canonicalXML);
      String canonicalXMLString = new String(canonicalXML, "UTF-8");

      System.out.println("Hexadecimal value: " + canonicalXMLHex);
      System.out.println(canonicalXMLString);
 
and here is the result:
 
<document>Humour document (héhé ☺)</document>
Hexadecimal value: 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 e298ba 293c2f646f63756d656e743e
Hexadecimal value: 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 3a 293c2f646f63756d656e743e
<document>Humour document (héhé :)</document>
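
The expected bytes (the first hexadecimal line above) can be cross-checked 
with the JDK alone, without any XML library. The following is only a minimal 
sketch: the class Utf8HexDump and its toHex method are hypothetical stand-ins 
for the HexadecimalConvertor helper used in the test, which is not shown here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the HexadecimalConvertor helper (an assumption,
// not the class used in the original test).
final class Utf8HexDump {

    /** Renders each byte as two lowercase hexadecimal digits. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same test string as above; the accented letters are written as
        // Unicode escapes so the source file encoding does not matter.
        String xmlString = "<document>Humour document (h\u00e9h\u00e9 \u263A)</document>";
        System.out.println(toHex(xmlString.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Any canonicalizer output for this document should contain the same byte 
sequence, since canonical XML is always UTF-8.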
 
The Canonicalizer class correctly handles the character "é" (U+00E9), which 
is encoded in UTF-8 as "c3a9". BUT the Unicode character "☺" (U+263A) is 
written as ":" (3a) when it should be written as "e298ba"; this is wrong. It 
seems that the Canonicalizer class does not correctly handle characters that 
need three bytes in UTF-8!
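
To make the symptom concrete: 0x3a is exactly what remains of U+263A when 
only the low bits of the character are kept. The sketch below contrasts a 
correct three-byte UTF-8 encoding with such a truncation. It is only an 
illustration of one way the observed output could arise; I have not confirmed 
that the library does this, and the class and method names 
(LowByteTruncation, encodeThreeByte, lowByteOnly) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LowByteTruncation {

    /** Encodes one non-surrogate char in U+0800..U+FFFF as three UTF-8 bytes. */
    static byte[] encodeThreeByte(char c) {
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),          // 1110xxxx: top 4 bits
            (byte) (0x80 | ((c >> 6) & 0x3F)),  // 10xxxxxx: middle 6 bits
            (byte) (0x80 | (c & 0x3F))          // 10xxxxxx: low 6 bits
        };
    }

    /** Keeps only the low 8 bits of the char: the suspected faulty behaviour (an assumption). */
    static byte lowByteOnly(char c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        char smiley = '\u263A';

        // Correct encoding, compared against the JDK's own encoder.
        byte[] manual = encodeThreeByte(smiley);
        byte[] jdk = String.valueOf(smiley).getBytes(StandardCharsets.UTF_8);
        System.out.println("manual == JDK: " + Arrays.equals(manual, jdk));
        for (byte b : manual) {
            System.out.print(Integer.toHexString(b & 0xFF));
        }
        System.out.println();

        // Truncated value, which matches the byte seen in the canonical output.
        System.out.println(Integer.toHexString(lowByteOnly(smiley) & 0xFF));
    }
}
```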
 
Does anybody have an idea? Can somebody help me? This occurs in our 
application and is now causing us a lot of problems.
 
Thanks in advance.
 
Regards,

Yvan Hess
Chief Software Architect

e-mail: [EMAIL PROTECTED]
phone : +41 (0)26 460 66 66
fax   : +41 (0)26 460 66 60

Informatique-MTF SA
Route du Bleuet 1
CH-1762 Givisiez

Excellence in Compliance and Document Management

http://www.imtf.com