Raul,

I'm not sure I can be as helpful as Yvan, having a more modest and
polite test suite, but I have a bit of Unicode and specifically UTF-8
en/decoding experience, and I might be able to make a few
observations.  I'm curious about your comments about how some Unicode
characters are not being handled properly.  Which ones are you having
trouble with?  The new 32 bit characters, 0, something else?
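Assuming "0" means U+0000 (NUL) and "the new 32 bit characters" means supplementary characters above U+FFFF, here is a small sketch of why both are classic trouble spots in Java. The JDK's standard UTF-8 encoder and the "modified UTF-8" written by DataOutputStream.writeUTF encode them differently. The class and method names are made up for illustration, and the sketch assumes Java 8 or later (StandardCharsets, UncheckedIOException).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8Variants {
    // Lowercase hex dump, e.g. {(byte) 0xC0, (byte) 0x80} -> "c080"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Standard UTF-8, as produced by the JDK's charset encoder
    static String standard(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    // Java's "modified UTF-8" from DataOutputStream.writeUTF, with the
    // 2-byte length prefix that writeUTF writes first stripped off
    static String modified(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.flush();
            byte[] all = buf.toByteArray();
            return hex(Arrays.copyOfRange(all, 2, all.length));
        } catch (IOException e) {
            // writeUTF only fails here if the encoding exceeds 65535 bytes
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String nul = "\0";
        // U+1F600 lives in a Java String as the surrogate pair D83D DE00
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println("U+0000  standard=" + standard(nul) + " modified=" + modified(nul));
        System.out.println("U+1F600 standard=" + standard(emoji) + " modified=" + modified(emoji));
    }
}
```

For U+0000 this prints 00 versus c080, and for U+1F600 it prints f09f9880 (four bytes) versus eda0bdedb880 (two separately encoded three-byte surrogates). Canonical XML output must be standard UTF-8, so any code path that borrows the modified form gets both cases wrong.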

You say in your comments that the problem is fixed in HEAD, but I'm
looking at HEAD

http://svn.apache.org/viewvc/xml/security/trunk/src/org/apache/xml/security/c14n/implementations/CanonicalizerBase.java?view=markup

And the code still seems to be using 8th bit checks throughout.
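Here is why an 8-bit assumption can turn U+263A into ':'. OutputStream.write(int) writes only the low-order eight bits of its argument, so a char such as 0x263A that reaches it unencoded comes out as the single byte 0x3A. That is consistent with the 3a in the canonicalized output of Yvan's report further down this thread. The minimal sketch below uses hypothetical class and method names and does not reproduce the actual CanonicalizerBase code path. It only shows the truncation next to the correct encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class LowByteTruncation {
    // Lowercase hex dump of a byte array
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // A char handed straight to OutputStream.write(int): only the
    // low-order 8 bits are written, the high bits are silently dropped
    static byte[] writeAsSingleByte(char c) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(c);
        return out.toByteArray();
    }

    // Correct UTF-8 bytes for the same char, via the JDK encoder
    static byte[] writeAsUtf8(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        char smiley = (char) 0x263A; // WHITE SMILING FACE
        System.out.println("single byte: " + hex(writeAsSingleByte(smiley))); // 3a
        System.out.println("UTF-8:       " + hex(writeAsUtf8(smiley)));      // e298ba
    }
}
```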

I think you would be much better off removing the special casing you
added to speed up this class.  Now, maybe it's because I'm not encoding
too many really big documents, or maybe it's because I'm fixated on
MessageDigest issues, but I'm not seeing this as a critical
performance problem to begin with.  However, even if it were, this is
not a great way to achieve your goal.

If you want to make this code go faster, your better bet is to split
up the methods in UTFHelpper so that HotSpot can inline the fast path
into the callers.  That'll get you the same effect with saner
code.  For example:

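        static final void writeCharToUtf8(final char c, final OutputStream out)
                throws IOException {
            if (c < 0x80) {
                // Fast path: plain ASCII is one byte, and this method stays
                // small enough for HotSpot to inline into its callers
                out.write(c);
            } else {
                // Rare case: kept in a separate method so it doesn't bloat
                // the inlined fast path
                writeMultiByteCharToUtf8(c, out);
            }
        }

        protected static final void writeMultiByteCharToUtf8(final char c,
                final OutputStream out) throws IOException {
            if ((c >= 0xD800 && c <= 0xDBFF) || (c >= 0xDC00 && c <= 0xDFFF)) {
                // No surrogates in Sun Java
                ...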


I'm pretty sure that even the 1.3 HotSpot VM will be happy with this
code, but I haven't tested it (I'm having some trouble building the
code from the source release, and work doesn't allow svn access
through the firewall, for various reasons, a couple of which are
understandable).
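Here is a fuller sketch of the fast-path/slow-path split, assuming standard UTF-8 output. One wrinkle is that a per-char method like writeCharToUtf8 can't see the low surrogate that follows a high one. This version therefore moves the split into a string-level loop, which is a substitution and not what the message above proposes. The class name Utf8Writer and the methods writeStringToUtf8, writeMultiByte and encode are hypothetical and not taken from UTFHelpper. It needs Java 8+ for UncheckedIOException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8Writer {
    // Hot loop: the ASCII case is handled inline and everything else is
    // delegated, which keeps this method small and cheap for HotSpot to inline
    static void writeStringToUtf8(final String s, final OutputStream out) throws IOException {
        final int len = s.length();
        for (int i = 0; i < len; i++) {
            final char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c); // one byte, the common case
            } else {
                i = writeMultiByte(s, i, out); // may consume a surrogate pair
            }
        }
    }

    // Out-of-line slow path. Encodes the char at index i (plus its low
    // surrogate, if it starts a valid pair) and returns the last index consumed
    private static int writeMultiByte(final String s, final int i, final OutputStream out)
            throws IOException {
        final char c = s.charAt(i);
        if (c < 0x800) {                       // U+0080..U+07FF: 2 bytes
            out.write(0xC0 | (c >> 6));
            out.write(0x80 | (c & 0x3F));
            return i;
        }
        if (Character.isHighSurrogate(c) && i + 1 < s.length()
                && Character.isLowSurrogate(s.charAt(i + 1))) {
            // Supplementary character (above U+FFFF): 4 bytes
            final int cp = Character.toCodePoint(c, s.charAt(i + 1));
            out.write(0xF0 | (cp >> 18));
            out.write(0x80 | ((cp >> 12) & 0x3F));
            out.write(0x80 | ((cp >> 6) & 0x3F));
            out.write(0x80 | (cp & 0x3F));
            return i + 1;
        }
        if (Character.isSurrogate(c)) {
            // Unpaired surrogate: write '?', the replacement the JDK's
            // String.getBytes(UTF_8) uses for malformed input
            out.write('?');
            return i;
        }
        out.write(0xE0 | (c >> 12));           // rest of the BMP: 3 bytes
        out.write(0x80 | ((c >> 6) & 0x3F));
        out.write(0x80 | (c & 0x3F));
        return i;
    }

    // Convenience wrapper: encode into a byte array
    static byte[] encode(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            writeStringToUtf8(s, buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return buf.toByteArray();
    }

    // Exhaustive check of every non-surrogate BMP char, plus a mixed string
    // with a supplementary char, against the JDK's own UTF-8 encoder
    public static void main(String[] args) {
        int mismatches = 0;
        for (int c = 0; c <= 0xFFFF; c++) {
            if (Character.isSurrogate((char) c)) {
                continue;
            }
            String s = String.valueOf((char) c);
            if (!Arrays.equals(encode(s), s.getBytes(StandardCharsets.UTF_8))) {
                mismatches++;
            }
        }
        String mixed = "h\u00e9h\u00e9 \u263A " + new String(Character.toChars(0x1F600));
        if (!Arrays.equals(encode(mixed), mixed.getBytes(StandardCharsets.UTF_8))) {
            mismatches++;
        }
        System.out.println("mismatches: " + mismatches);
    }
}
```

The main method compares the output against String.getBytes(StandardCharsets.UTF_8) for every non-surrogate char in the BMP and should print "mismatches: 0". An exhaustive comparison like this is cheap to run as a unit test and would have flagged the U+263A case.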


Good luck, and keep us posted on your ETA for a 1.4.1 release.

Thanks,
Jason


On 2/13/07, Hess Yvan <[EMAIL PROTECTED]> wrote:

Hi Raul,

Let me know when you have a pre-release of version 1.4.1, or send it to me by
email; I will then run all my JUnit test cases and give you feedback. We use a
lot of the functionality of the XML Encryption and XML Signature syntax, and
for this reason we have interesting test cases that can help you in the release
process of the XML Security library. I don't have much time to follow what
happens in the project, but as I said in a previous email, I can try to run my
test cases before you release a new version, to give you a second opinion on
the robustness of the library: 4 eyes are better than 2 :-)

Regards. Yvan


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Raul Benito
Sent: Tuesday, 13 February 2007 12:18
To: security-dev@xml.apache.org
Subject: Re: Signed document can be corrupted in certain circumstances

Hi Hess,

It is my fault; we have a "critical" bug,
http://issues.apache.org/bugzilla/show_bug.cgi?id=41462 . The problem is that I
was thinking in 8 bits instead of 32 bits. It is now mostly fixed in HEAD, but we
are still having a problem with some parts of Unicode. I think I will do a 1.4.1
with the fix for this bug and several others.
And I have to reconsider my release strategy, as it seems that not many people
test the release candidates :(.


On 2/13/07, Hess Yvan <[EMAIL PROTECTED]> wrote:
>
>
> Hi everybody,
>
> I think I found a critical bug in XML security V1.4.0 (Java). An XML
> document signed with Apache XML security can be corrupted in certain
> circumstances.
>
> Here are the starting conditions and the results I get:
>
> 1. XML document encoded in "UTF-8" containing the Unicode character "\u263A"
> 2. The document is signed with Apache XML security ---> OK
> 3. The document is verified with Apache XML security ---> OK
> 4. The document is verified with the IBM toolkit (XSS4J) ---> NOT OK
>
> Doing some investigation, I think I have isolated the problem. It seems
> that the error is due to the Canonicalizer class. This class doesn't
> correctly handle UTF-8 characters encoded on three bytes. Here is a test
> I did to confirm the problem:
>
>       // XML character \u263A => &#x0263A; => smiley
>       String xmlString = "<document>Humour document (héhé \u263A)</document>";
>       byte[] xml = xmlString.getBytes("UTF-8");
>       String xmlHex = HexadecimalConvertor.toHex(xml);
>
>       System.out.println(xmlString);
>       System.out.println("Hexadecimal value: " + xmlHex);
>
>       // Get the DOM document
>       Document document =
>           new XMLParser().parseXMLDocument(new ByteArrayInputStream(xml));
>
>       // Canonical
>       byte[] canonicalXML = Canonicalizer
>           .getInstance(Canonicalizer.ALGO_ID_C14N_WITH_COMMENTS)
>           .canonicalizeSubtree(document);
>       String canonicalXMLHex = HexadecimalConvertor.toHex(canonicalXML);
>       String canonicalXMLString = new String(canonicalXML, "UTF-8");
>
>       System.out.println("Hexadecimal value: " + canonicalXMLHex);
>       System.out.println(canonicalXMLString);
>
> and here is the result:
>
> <document>Humour document (héhé ☺)</document>
> Hexadecimal value:
> 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 e298ba 293c2f646f63756d656e743e
> Hexadecimal value:
> 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 3a     293c2f646f63756d656e743e
> <document>Humour document (héhé :)</document>
>
> The Canonicalizer class correctly converts the character "é" (U+00E9) to
> UTF-8 as "c3a9". BUT the Unicode character "☺" (U+263A) is converted to
> ":" (3a) when it should be (e298ba); this is wrong. It seems that the
> Canonicalizer class doesn't correctly handle characters that take three
> bytes in UTF-8!
>
> Does anybody have an idea? Can somebody help me? This occurs in the
> context of our application, and we now have a lot of problems because of
> it.
>
> Thanks in advance.
>
> Regards. Yvan Hess
>
> Yvan Hess
> Chief Software Architect
>
> e-mail: [EMAIL PROTECTED]
> phone : +41 (0)26 460 66 66
> fax   : +41 (0)26 460 66 60
>
> Informatique-MTF SA
> Route du Bleuet 1
> CH-1762 Givisiez
>
> Excellence in Compliance and Document Management
>
> http://www.imtf.com


--
http://r-bg.com



--
- Jason
