Signed document can be corrupted in certain circumstances

Hess Yvan Tue, 13 Feb 2007 02:39:37 -0800

Hi everybody,
 
I think I found a critical bug in XML Security V1.4.0 (Java). An XML document 
signed with Apache XML Security can be corrupted in certain circumstances.
 
Here are the start conditions and the results I have:
 
1. XML document encoded in UTF-8 and containing the Unicode character "\u263A"
2. The document is signed with Apache XML security --->  OK
3. The document is verified with Apache XML security --->  OK
4. The document is verified with IBM toolkit (XSS4J) ---> NOT OK
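
To find out which documents meet condition 1, the text can be scanned for code points that need three or more bytes in UTF-8 (U+0800 and above). This is only a minimal sketch using the plain JDK; the class and method names are hypothetical and are not part of Apache XML Security.

```java
final class WideCharScanner {

    /**
     * Returns the index of the first code point whose UTF-8 encoding
     * needs three or more bytes (U+0800 and above), or -1 if there is none.
     */
    static int firstWideCodePoint(String text) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint >= 0x800) {
                return i;
            }
            i += Character.charCount(codePoint);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Same text as in the report: e-acute needs two bytes, the smiley needs three.
        String sample = "Humour document (h\u00E9h\u00E9 \u263A)";
        System.out.println(firstWideCodePoint(sample));           // prints 22
        System.out.println(firstWideCodePoint("h\u00E9h\u00E9")); // prints -1
    }
}
```

A result of -1 means the text contains only characters encoded on one or two bytes, such as "é", which are handled correctly in the results reported here.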
 
After some investigation, I think I have isolated the problem. The error seems 
to come from the Canonicalizer class, which does not correctly handle 
characters encoded on three bytes in UTF-8. Here is a test I ran to confirm 
the problem:
 
      // XML character \u263A => &#x263A; => smiley
      String xmlString = "<document>Humour document (héhé \u263A)</document>";
      byte[] xml = xmlString.getBytes("UTF-8");
      String xmlHex = HexadecimalConvertor.toHex(xml);

      System.out.println(xmlString);
      System.out.println("Hexadecimal value: " + xmlHex);

      // Get the DOM document
      Document document =
          new XMLParser().parseXMLDocument(new ByteArrayInputStream(xml));

      // Canonicalize the document
      byte[] canonicalXML =
          Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_WITH_COMMENTS)
              .canonicalizeSubtree(document);
      String canonicalXMLHex = HexadecimalConvertor.toHex(canonicalXML);
      String canonicalXMLString = new String(canonicalXML, "UTF-8");

      System.out.println("Hexadecimal value: " + canonicalXMLHex);
      System.out.println(canonicalXMLString);
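
The test above uses two classes whose source is not shown, HexadecimalConvertor and XMLParser. To reproduce the input side with only the JDK, something like the following stand-ins could be used. This is a sketch under assumptions: that toHex produces lowercase hexadecimal without separators (as the printed result suggests) and that parseXMLDocument wraps a namespace-aware DOM parser. The class name ReproHelpers is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class ReproHelpers {

    /** Assumed behaviour of HexadecimalConvertor.toHex: lowercase hex, no separators. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    /** Assumed behaviour of XMLParser.parseXMLDocument: a namespace-aware DOM parse. */
    static Document parse(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<document>Humour document (h\u00E9h\u00E9 \u263A)</document>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("Hexadecimal value: " + toHex(xml));

        // The parsed DOM should still contain the smiley before canonicalization.
        Document document = parse(xml);
        String text = document.getDocumentElement().getTextContent();
        System.out.println("Smiley present after parsing: " + (text.indexOf('\u263A') >= 0));
    }
}
```

If the smiley is still present in the DOM text at this point, the parsing step can be ruled out and the character must be lost later, during canonicalization.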
 
And here is the result:
 
<document>Humour document (héhé ☺)</document>
Hexadecimal value: 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 e298ba 293c2f646f63756d656e743e
Hexadecimal value: 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 3a 293c2f646f63756d656e743e
<document>Humour document (héhé :)</document>
 
The Canonicalizer class correctly handles the character "é" (E9), which is 
encoded in UTF-8 as "c3a9". BUT the Unicode character "☺" (263A) is output as 
":" (3a) when it should be "e298ba"; this is wrong. It seems that the 
Canonicalizer class does not correctly handle UTF-8 characters encoded on 
three bytes!
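
For reference, the expected three-byte form can be computed by hand. The wrong byte 3a is also exactly the low-order byte of 263A, which would be consistent with the character being truncated to 8 bits somewhere on the output path. That is only a hypothesis and has not been checked against the library source. The sketch below uses the plain JDK, and the class name ThreeByteUtf8 is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ThreeByteUtf8 {

    /** Encodes one character in U+0800..U+FFFF (surrogates excluded) as three UTF-8 bytes. */
    static byte[] encodeThreeBytes(char c) {
        if (c < 0x800 || Character.isSurrogate(c)) {
            throw new IllegalArgumentException("not a three-byte character");
        }
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),         // 1110xxxx: top 4 bits
            (byte) (0x80 | ((c >> 6) & 0x3F)), // 10xxxxxx: middle 6 bits
            (byte) (0x80 | (c & 0x3F))         // 10xxxxxx: low 6 bits
        };
    }

    /** Keeps only the low 8 bits of the character (the suspected faulty behaviour). */
    static byte lowByte(char c) {
        return (byte) c;
    }

    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char smiley = '\u263A';
        byte[] expected = encodeThreeBytes(smiley);
        System.out.println(toHex(expected));                         // prints e298ba
        System.out.println(toHex(new byte[] { lowByte(smiley) }));   // prints 3a
        // The hand-made encoding agrees with the JDK encoder.
        System.out.println(Arrays.equals(expected,
                String.valueOf(smiley).getBytes(StandardCharsets.UTF_8))); // prints true
    }
}
```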
 
Does anybody have an idea? Can somebody help me? This occurs in our 
application, and it is now causing us a lot of problems.
 
Thanks in advance.
 
Regards. Yvan Hess
 
 
Yvan Hess
Chief Software Architect

e-mail: [EMAIL PROTECTED]
phone : +41 (0)26 460 66 66 
fax   : +41 (0)26 460 66 60 

Informatique-MTF SA
Route du Bleuet 1 
CH-1762 Givisiez 

Excellence in Compliance and Document Management
http://www.imtf.com

