I encountered a problem before(version 1.4) caused by apache java code which
uses == for namespace comparison. In my own code, when adding DOM node to a
document, I have to create namespace using string from Apache classes. That
is, I cannot directly use "http://www.w3.org/2000/09/xmldsig#"; as namespace
String, instead I need to use APACHE.BlashClass.DSIG_URI. The bug is not
only hard to find, but unnecessarily tie unrelated DOM code to XML security.

Eric

On Fri, Aug 13, 2010 at 6:02 AM, Colm O hEigeartaigh <cohei...@apache.org>wrote:

> I would prefer if we stuck to the original plan of making sure "=="
> comparisons are only done for namespaces in a single piece of
> pluggable code. However, I think we should now revert to making the
> ".equals" comparison as the default for the next release, given that
> there is no compelling reason to do otherwise. Anyone who wants to
> experiment with getting a performance increase, can just plug the
> other piece of code in.
>
> Thoughts?
>
> Colm.
>
>
> On Mon, Aug 9, 2010 at 11:07 PM, Chad La Joie <laj...@itumi.biz> wrote:
> > I guess I didn't explicitly say this, but if, after a few days, people
> can't
> > suggest an issue with this testing methodology or provide testing inputs
> > that show different results, I'll rip out the helper class I added and
> just
> > use equals() everywhere.  That'll make the code a lot nicer to read.
> >
> > On 8/9/10 10:19 AM, Chad La Joie wrote:
> >>
> >> So, I have some unexpected results from this work.
> >>
> >> I implemented a helper class that checked the equality of element local
> >> names, attribute local names, namespace URIs, and namespace prefixes
> >> (i.e. everything that Xerces always interns). Then I made sure to
> >> replace all == != and equals() that I could find with the appropriate
> >> call.
> >>
> >> To test, I picked the Canonicalizer20010315ExclusiveTest test case and
> >> made two alterations to the test22*excl methods:
> >> - do one c14n operation out the timing loop just to make sure all the
> >> classes are in memory, constants are loaded, etc.
> >> - in a 100 iteration loop, create a new canonicalizer, canonicalize a
> >> DOM tree, and time it using nanosecond time
> >>
> >> I did this for the example2_2_1.xml[1], example2_2_2.xml[2], example
> >> 2_2_3.xml[3] input files (test221excl, test221excl, test223excl
> >> respectively).
> >>
> >> Here are the results, measured in nanosecond timing. "total" indicates
> >> the total time spent in all 100 runs, i.e. the summation of each of the
> >> 100 results.
> >>
> >> test221excl:
> >> equals() ==
> >> min 101000 99000
> >> max 123000 191000
> >> median 103000 105000
> >> avg 103760 106540
> >> total 10376000 10654000
> >>
> >> test222excl:
> >> equals() ==
> >> min 99000 101000
> >> max 192000 128000
> >> median 100000 108000
> >> avg 102110 108480
> >> total 10211000 10848000
> >>
> >> test223excl (an XPath nodeset canonicalization)
> >> equals() ==
> >> min 254000 248000
> >> max 290000 353000
> >> median 266000 265000
> >> avg 266820 265800
> >> total 26682000 26580000
> >>
> >> So, what these numbers appear to suggest is that, in fact, equals() is
> >> more often faster than ==. This seems counter-intuitive unless the JVM
> >> has specialized optimization for the String.equals() method.
> >>
> >> Can anyone see where my testing is likely to be flawed?
> >>
> >> [1]
> >>
> >>
> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup
> >>
> >> [2]
> >>
> >>
> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup
> >>
> >> [3]
> >>
> >>
> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup
> >>
> >>
> >> On 8/2/10 10:11 AM, Chad La Joie wrote:
> >>>
> >>> So, while I don't have my access yet, Colm asked me if I'd take a look
> >>> at the == vs equals() issue (relevant bugs: 40897[1], 45637[2],
> 46681[3])
> >>>
> >>> My executive summary is that clearly, as things stand, the current code
> >>> favors optimization over correctness. Rarely is this a good thing.
> >>>
> >>> Colm notes[4] that the reliance on intern'ed strings (and thus the
> >>> ability to use ==) occurs sporadically throughout the code and not just
> >>> within the ElementChecker implementations. He specifically mentioned
> >>> that the various C14N implementations, and indeed the == is used about
> 6
> >>> times there for string comparison.
> >>>
> >>> My recommendation then is two fold:
> >>> - Ensure that nothing other than namespace bits are compared via ==. I
> >>> don't know that this occurs but the code should definitely be reviewed
> >>> to ensure that.
> >>>
> >>> - Create a new "NamespaceEqualityChecker" that provides methods for
> >>> checking the various bits of a namespace (URIs, prefixes) and use it
> >>> anywhere that either == or equals() is used today. Implementations
> based
> >>> on == and equals() would be provided with the default implementation
> >>> being equals()-based. A configuration option should then be made
> >>> available to control which impl gets used. Additionally, it might even
> >>> be possible to add some smarts that could detect known "good" parsers
> >>> that use interning and automatically use the == based implementation.
> >>>
> >>> I do not recommend changing any part of the code without addressing the
> >>> whole codebase (i.e. all the =='s need to be fixed or no change should
> >>> be made) because of the possibility of creating new, unwanted, effects.
> >>> The current functionality is undesirable but better the devil you know.
> >>>
> >>> I think that this should be addressed in the upcoming 1.4.4 release. If
> >>> quick consensus can be reached I'm willing to do the work with a window
> >>> of time I have available over the next 2-3 weeks.
> >>>
> >>> [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=40897
> >>> [2] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637
> >>> [3] https://issues.apache.org/bugzilla/show_bug.cgi?id=46681
> >>> [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637#c1
> >>
> >
> > --
> > Chad La Joie
> > http://itumi.biz
> > trusted identities, delivered
> >
>

Reply via email to