Hi Jason,

Sorry for the delay.
See my comments inline.

On 2/23/07, jason marshall <[EMAIL PROTECTED]> wrote:

Raul,

I'm not sure I can be as helpful as Yvan, having a more modest and
polite test suite, but I have a bit of Unicode and specifically UTF-8
en/decoding experience, and I might be able to make a few
observations.  I'm curious about your comments about how some Unicode
characters are not being handled properly.  Which ones are you having
trouble with?  The new 32 bit characters, 0, something else?


Great, your help is really appreciated. I have just created a test that checks
my encoding against the String.getBytes("UTF-8") implementation for the
first 2**16 chars, and they are all equal except for character 0xd8ff.
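
A comparison of this kind can be sketched as follows. This is not the test
mentioned above, only a minimal harness under stated assumptions: the class
Utf8CompareSketch and its naiveEncode method are hypothetical, and naiveEncode
deliberately treats every 16-bit char as a standalone character. With that
encoder the only disagreements with the JDK fall in the surrogate range
0xD800-0xDFFF, because String.getBytes substitutes '?' for an unpaired
surrogate.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8CompareSketch {

    // Hypothetical per-char encoder: every UTF-16 code unit is encoded on its
    // own, surrogates included (this is an assumption, not the library code).
    static byte[] naiveEncode(char c) {
        if (c < 0x80) {
            return new byte[] { (byte) c };
        }
        if (c < 0x800) {
            return new byte[] { (byte) (0xC0 | (c >> 6)), (byte) (0x80 | (c & 0x3F)) };
        }
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),
            (byte) (0x80 | ((c >> 6) & 0x3F)),
            (byte) (0x80 | (c & 0x3F))
        };
    }

    // True when the sketch and the JDK produce the same bytes for c.
    static boolean matchesJdk(char c) {
        byte[] expected = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(expected, naiveEncode(c));
    }

    // Number of chars in [0, 0xFFFF] on which the two encoders disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int i = 0; i <= 0xFFFF; i++) {
            if (!matchesJdk((char) i)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // First char on which the two encoders disagree, or -1 if there is none.
    static int firstMismatch() {
        for (int i = 0; i <= 0xFFFF; i++) {
            if (!matchesJdk((char) i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
        System.out.println("first mismatch: 0x" + Integer.toHexString(firstMismatch()));
    }
}
```

Swapping naiveEncode for the encoder under test turns this into a quick
regression check over the whole Basic Multilingual Plane.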

You say in your comments that the problem is fixed in HEAD, but I'm
looking at HEAD


http://svn.apache.org/viewvc/xml/security/trunk/src/org/apache/xml/security/c14n/implementations/CanonicalizerBase.java?view=markup

And the code still seems to be using 8th bit checks throughout.


Can you point me to where you think it is incorrect, or give me a test case? I
would really appreciate it.

I think you would be much better off removing the special casing you
added to speed up this class.  Now maybe it's because I'm not encoding
too many really big documents, or maybe it's because I'm fixated on
MessageDigest issues, but I'm not seeing this as a critical
performance problem to begin with.  However even if it were, this is
not a great way to achieve your goal.


Well, I have profiled it with Juice (the OpenSSL JNI wrapper for encrypting
and digesting), and encoding is the slowest part. I'm even thinking of creating
an SSE assembler implementation. Anyway, the slowdown in the Java implementation
is not the conversion but the array sizing (it grows and shrinks the
output byte array several times, whereas my implementation does it once with
7-bit ASCII chars or 3 times with other characters).
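
One way to avoid repeated resizing, sketched here under assumptions and not
necessarily what either implementation does, is to count the output bytes in
a first pass and then encode into an array allocated exactly once. The class
Utf8PresizeSketch and its methods are hypothetical names; unpaired surrogates
are written as '?' to match String.getBytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8PresizeSketch {

    // True when s holds a valid high/low surrogate pair starting at index i.
    static boolean isPairAt(String s, int i) {
        return Character.isHighSurrogate(s.charAt(i))
            && i + 1 < s.length()
            && Character.isLowSurrogate(s.charAt(i + 1));
    }

    // First pass: number of UTF-8 bytes needed for s.
    static int utf8Length(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                n += 1;
            } else if (c < 0x800) {
                n += 2;
            } else if (isPairAt(s, i)) {
                n += 4;
                i++; // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                n += 1; // unpaired surrogate becomes '?'
            } else {
                n += 3;
            }
        }
        return n;
    }

    // Second pass: encode into an array that is allocated exactly once.
    static byte[] encode(String s) {
        byte[] out = new byte[utf8Length(s)];
        int p = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out[p++] = (byte) c;
            } else if (c < 0x800) {
                out[p++] = (byte) (0xC0 | (c >> 6));
                out[p++] = (byte) (0x80 | (c & 0x3F));
            } else if (isPairAt(s, i)) {
                int cp = Character.toCodePoint(c, s.charAt(++i));
                out[p++] = (byte) (0xF0 | (cp >> 18));
                out[p++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                out[p++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[p++] = (byte) (0x80 | (cp & 0x3F));
            } else if (Character.isSurrogate(c)) {
                out[p++] = (byte) '?';
            } else {
                out[p++] = (byte) (0xE0 | (c >> 12));
                out[p++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                out[p++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "h\u00e9h\u00e9 \u263A";
        System.out.println(Arrays.equals(encode(s), s.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The trade-off is a second walk over the input in exchange for a single
allocation; whether that wins in practice would have to be measured.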

The 8-bit checks were a stupid trick on my part. Sorry about that; it should be
OK now. Anyway, if you can contribute any code please feel free, I will be more
than glad.
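
Why an 8th-bit check goes wrong on a 16-bit char can be illustrated with a
small sketch. It is a plausible reconstruction, not the actual library code:
the class EighthBitCheckSketch and its methods are hypothetical. Testing only
bit 7 classifies U+263A as ASCII (its low byte is 0x3A), and
OutputStream.write(int) then keeps just the low-order eight bits, which is
consistent with the ":" seen in the bug report quoted further down.

```java
import java.io.ByteArrayOutputStream;

final class EighthBitCheckSketch {

    // Hypothetical buggy test: looks only at bit 7 of the value.
    static boolean eighthBitClear(char c) {
        return (c & 0x80) == 0;
    }

    // Range test on the whole 16-bit value.
    static boolean isAscii(char c) {
        return c < 0x80;
    }

    // The single byte that out.write(c) emits: the low-order eight bits of c.
    static int byteWritten(char c) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(c);
        return out.toByteArray()[0] & 0xFF;
    }

    public static void main(String[] args) {
        char smiley = '\u263A';
        System.out.println("8th bit clear: " + eighthBitClear(smiley));
        System.out.println("c < 0x80:      " + isAscii(smiley));
        System.out.println("byte written:  0x" + Integer.toHexString(byteWritten(smiley)));
    }
}
```

Note that 'é' (U+00E9) has bit 7 set, so a check of this kind would still
send it down the multi-byte path.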


If you want to make this code go faster, your better bet is to split
up the methods in UTFHelpper so that Hotspot can inline the fast path
into the callers.  That'll get you the same effect with saner
code.  For example:

        final static void writeCharToUtf8(final char c, final OutputStream out)
            throws IOException
        {
            if (c < 0x80) {
                out.write(c);
            }
            else
            {
                writeMultiByteCharToUtf8(c, out);
            }
        }

        final static protected void writeMultiByteCharToUtf8(final char c, final OutputStream out)
            throws IOException
        {
            if ((c >= 0xD800 && c <= 0xDBFF) || (c >= 0xDC00 && c <= 0xDFFF)) {
                //No Surrogates in sun java
            ...


Great idea, I will try to do it.
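
A self-contained sketch of that split is below. Only the two method names
come from the example above; the class name Utf8SplitSketch, the encode
helper, the body of the slow path, and the choice to write '?' for surrogates
are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class Utf8SplitSketch {

    // Fast path: small enough to be a good inlining candidate.
    static void writeCharToUtf8(char c, OutputStream out) throws IOException {
        if (c < 0x80) {
            out.write(c);
        } else {
            writeMultiByteCharToUtf8(c, out);
        }
    }

    // Slow path, kept out of line so the fast path stays small.
    static void writeMultiByteCharToUtf8(char c, OutputStream out) throws IOException {
        if (c >= 0xD800 && c <= 0xDFFF) {
            out.write('?'); // assumption: surrogates are not encoded here
            return;
        }
        if (c < 0x800) {
            out.write(0xC0 | (c >> 6));
        } else {
            out.write(0xE0 | (c >> 12));
            out.write(0x80 | ((c >> 6) & 0x3F));
        }
        out.write(0x80 | (c & 0x3F));
    }

    // Hypothetical convenience wrapper used to exercise the two methods.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            for (int i = 0; i < s.length(); i++) {
                writeCharToUtf8(s.charAt(i), out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encode("h\u00e9 \u263A")) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        System.out.println(hex);
    }
}
```

Whether a given Hotspot version actually inlines the fast path depends on its
inlining heuristics, so this would need to be confirmed with a profiler.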

I'm pretty sure that even the 1.3 Hotspot will be happy with this
code, but I haven't tested it (I'm having some trouble building the
code from the source release, and work doesn't allow svn access
through the firewall, for various reasons, a couple of which are
understandable).


We should create some nightly build & publish mechanism. I will try to see
how other projects handle this.


Good luck, and keep us posted on your ETA for a 1.4.1 release.


Thanks,
I will try to see how many bug reports we get (the MS-Office bug looks
promising).

Regards,

Raul

Thanks,
Jason


On 2/13/07, Hess Yvan <[EMAIL PROTECTED]> wrote:
>
> Hi Raul,
>
> Let me know when you have a pre-release of version 1.4.1, or send it to
> me by email; I will then run all my JUnit test cases and give you
> feedback. We use a lot of the functionality of the XML encryption and
> signature syntax, and for this reason we have interesting test cases that
> can help you in the release process of the XML security library. I don't
> have much time to follow what happens with the project but, as I said in a
> previous email, I can try to run my test cases before you plan to release a
> new version, to get a second opinion on the robustness of the
> library: 4 eyes are better than 2 eyes :-)
>
> Regards. Yvan
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Raul Benito
> Sent: Tuesday, 13 February 2007 12:18
> To: security-dev@xml.apache.org
> Subject: Re: Signed document can be corrupted in certain circumstances
>
> Hi Hess,
>
> It is my fault, we have a "critical" bug
> http://issues.apache.org/bugzilla/show_bug.cgi?id=41462 ; the problem is
> that I was thinking in 8 bits instead of 32 bits. It is now pretty much fixed
> in HEAD, but we are having a problem with some part of Unicode. I think I
> will do a 1.4.1 with a fix for this bug and several others.
> And we have to reconsider my release strategy, as it seems that nobody, or
> not too many people, test the release candidates :(.
>
>
> On 2/13/07, Hess Yvan <[EMAIL PROTECTED]> wrote:
> >
> >
> > Hi everybody,
> >
> > I think I found a critical bug in XML security V1.4.0 (Java). An XML
> > document signed with Apache XML security can be corrupted in certain
> > circumstances.
> >
> > Here are the start conditions and the results I have:
> >
> > 1. XML document encoded in "UTF-8" containing the Unicode character "\u263A"
> > 2. The document is signed with Apache XML security --->  OK
> > 3. The document is verified with Apache XML security --->  OK
> > 4. The document is verified with IBM toolkit (XSS4J) ---> NOT OK
> >
> > After some investigation, I think I have isolated the problem. It seems
> > that the error is due to the Canonicalizer class. This class doesn't
> > correctly handle UTF-8 characters coded on three bytes. Here is a test I
> > did to confirm the problem:
> >
> >       // XML character \u263A => &#x0263A; => smiley
> >       String xmlString = "<document>Humour document (héhé \u263A)</document>";
> >       byte[] xml = xmlString.getBytes("UTF-8");
> >       String xmlHex = HexadecimalConvertor.toHex(xml);
> >
> >       System.out.println(xmlString);
> >       System.out.println("Hexadecimal value: " + xmlHex);
> >
> >       // Get the DOM document
> >       Document document = new XMLParser().parseXMLDocument(
> >           new ByteArrayInputStream(xml));
> >
> >       // Canonical
> >       byte[] canonicalXML = Canonicalizer
> >           .getInstance(Canonicalizer.ALGO_ID_C14N_WITH_COMMENTS)
> >           .canonicalizeSubtree(document);
> >       String canonicalXMLHex = HexadecimalConvertor.toHex(canonicalXML);
> >       String canonicalXMLString = new String(canonicalXML, "UTF-8");
> >
> >       System.out.println("Hexadecimal value: " + canonicalXMLHex);
> >       System.out.println(canonicalXMLString);
> >
> > and here is the result
> >
> > <document>Humour document (héhé ☺)</document>
> > Hexadecimal value:
> > 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920
> > e298ba 293c2f646f63756d656e743e
> > Hexadecimal value:
> > 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920
> > 3a     293c2f646f63756d656e743e
> > <document>Humour document (héhé :)</document>
> >
> > The Canonicalizer class correctly converts the character "é" (E9) to
> > UTF-8 as "c3a9". BUT the Unicode character "☺" (263A) is converted to ":"
> > (3a) when it should be (e298ba); this is wrong. It seems that the
> > Canonicalizer class doesn't correctly handle UTF-8 characters coded
> > on three bytes!
> >
> > Does anybody have an idea? Can somebody help me? This occurs in the
> > context of our application, and we now have a lot of problems because of
> > this situation.
> >
> > Thanks in advance.
> >
> > Regards. Yvan Hess
> >
>
>
> --
> http://r-bg.com
>


--
- Jason




--
http://r-bg.com
