Hi Hess,

It is my fault; we have a critical bug:
http://issues.apache.org/bugzilla/show_bug.cgi?id=41462 . The problem
is that I was thinking in 8 bits instead of 32 bits. It is now mostly
fixed in head, but we still have a problem with some parts of Unicode. I
think I will do a 1.4.1 with a fix for this bug and several others.
We also have to reconsider my release strategy, as it seems that very
few people test the release candidates :(.
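
To illustrate the "8 bits instead of 32 bits" point, here is a minimal sketch of a UTF-8 encoder that works on full code points rather than on single bytes. It is not the code committed to head; the class name CodePointUtf8Encoder and its methods are hypothetical, and the choice of a supplementary character (U+1F600) as a test case is an assumption, since the message does not say which part of Unicode is still failing.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not the Canonicalizer code from the library.
class CodePointUtf8Encoder {

    // Writes one Unicode code point as 1 to 4 UTF-8 bytes.
    static void writeCodePoint(int cp, ByteArrayOutputStream out) {
        if (cp < 0x80) {                       // 7 bits: 1 byte
            out.write(cp);
        } else if (cp < 0x800) {               // 11 bits: 2 bytes
            out.write(0xC0 | (cp >> 6));
            out.write(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 16 bits: 3 bytes
            out.write(0xE0 | (cp >> 12));
            out.write(0x80 | ((cp >> 6) & 0x3F));
            out.write(0x80 | (cp & 0x3F));
        } else {                               // up to 21 bits: 4 bytes
            out.write(0xF0 | (cp >> 18));
            out.write(0x80 | ((cp >> 12) & 0x3F));
            out.write(0x80 | ((cp >> 6) & 0x3F));
            out.write(0x80 | (cp & 0x3F));
        }
    }

    // Encodes a well-formed string, combining surrogate pairs into one code point.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            writeCodePoint(cp, out);
            i += Character.charCount(cp);      // 2 chars for supplementary code points
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // One sample per sequence length: 1, 2, 3 and 4 bytes.
        String sample = "h\u00E9 \u263A \uD83D\uDE00";
        byte[] expected = sample.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(encode(sample), expected));
    }
}
```

Comparing against the JDK's own encoder, as main does, is a cheap way to check each sequence length without depending on a second toolkit.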


On 2/13/07, Hess Yvan <[EMAIL PROTECTED]> wrote:


Hi everybody,

I think I found a critical bug in XML Security V1.4.0 (Java). An XML
document signed with Apache XML Security can be corrupted in certain
circumstances.

Here are the starting conditions and the results I get:

1. XML document encoded in "UTF-8" containing the Unicode character "\u263A"
2. The document is signed with Apache XML security --->  OK
3. The document is verified with Apache XML security --->  OK
4. The document is verified with IBM toolkit (XSS4J) ---> NOT OK

After some investigation, I think I have isolated the problem. The error
seems to come from the Canonicalizer class, which does not correctly handle
UTF-8 characters encoded on three bytes. Here is a test I ran to confirm the
problem:

      // XML character \u263A => &#x0263A; => smiley
      String xmlString = "<document>Humour document (héhé \u263A)</document>";
      byte[] xml = xmlString.getBytes("UTF-8");
      String xmlHex = HexadecimalConvertor.toHex(xml);

      System.out.println(xmlString);
      System.out.println("Hexadecimal value: " + xmlHex);

      // Get the DOM document
      Document document = new XMLParser().parseXMLDocument(
              new ByteArrayInputStream(xml));

      // Canonical
      byte[] canonicalXML = Canonicalizer
              .getInstance(Canonicalizer.ALGO_ID_C14N_WITH_COMMENTS)
              .canonicalizeSubtree(document);
      String canonicalXMLHex = HexadecimalConvertor.toHex(canonicalXML);
      String canonicalXMLString = new String(canonicalXML, "UTF-8");

      System.out.println("Hexadecimal value: " + canonicalXMLHex);
      System.out.println(canonicalXMLString);

and here is the result

<document>Humour document (héhé ☺)</document>
value:
3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920
e298ba 293c2f646f63756d656e743e
value:
3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920
3a     293c2f646f63756d656e743e
<document>Humour document (héhé :)</document>

The Canonicalizer class correctly handles the character "é" (E9), which is
encoded in UTF-8 as "c3a9". BUT the Unicode character "☺" (263A) comes out
as ":" (3a) when it should be (e298ba); this is wrong. Note that 3a is the
low byte of 263A. It seems that the Canonicalizer class does not correctly
handle UTF-8 characters encoded on three bytes!
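
The byte values above can be checked without any XML toolkit. The following is a minimal sketch, assuming a modern JDK; the class Utf8TruncationDemo is hypothetical, its toHex method is only a stand-in for the HexadecimalConvertor helper (which is not shown in this message), and truncateToLowByte is a guess at the kind of 8-bit truncation that would produce 3a, not the actual Canonicalizer code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from Apache XML Security.
class Utf8TruncationDemo {

    // Stand-in for the HexadecimalConvertor.toHex helper used in the test.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Correct UTF-8 encoding, delegated to the JDK.
    static byte[] encodeCorrectly(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Assumed faulty behaviour: keep only the low 8 bits of the character.
    static byte truncateToLowByte(char c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        // Three-byte character, encoded correctly: e298ba
        System.out.println(toHex(encodeCorrectly("\u263A")));
        // Same character reduced to its low byte: 3a, i.e. ':'
        System.out.println(toHex(new byte[] { truncateToLowByte('\u263A') }));
        // Two-byte character, which the canonicalizer handled correctly: c3a9
        System.out.println(toHex(encodeCorrectly("\u00E9")));
    }
}
```

The second line of output matches the corrupted byte in the canonicalized document, which is consistent with the character being narrowed to 8 bits somewhere before it is written.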

Does anybody have an idea? Can somebody help me? This occurs in the context
of our application, and we now have a lot of problems because of it.

Thanks in advance.

Regards. Yvan Hess




Yvan Hess
Chief Software Architect

e-mail: [EMAIL PROTECTED]
phone : +41 (0)26 460 66 66
fax   : +41 (0)26 460 66 60

Informatique-MTF SA
Route du Bleuet 1
CH-1762 Givisiez

Excellence in Compliance and Document Management

http://www.imtf.com






--
http://r-bg.com
