I encountered a problem before(version 1.4) caused by apache java code which uses == for namespace comparison. In my own code, when adding DOM node to a document, I have to create namespace using string from Apache classes. That is, I cannot directly use "http://www.w3.org/2000/09/xmldsig#" as namespace String, instead I need to use APACHE.BlashClass.DSIG_URI. The bug is not only hard to find, but unnecessarily tie unrelated DOM code to XML security.
Eric On Fri, Aug 13, 2010 at 6:02 AM, Colm O hEigeartaigh <cohei...@apache.org>wrote: > I would prefer if we stuck to the original plan of making sure "==" > comparisons are only done for namespaces in a single piece of > pluggable code. However, I think we should now revert to making the > ".equals" comparison as the default for the next release, given that > there is no compelling reason to do otherwise. Anyone who wants to > experiment with getting a performance increase, can just plug the > other piece of code in. > > Thoughts? > > Colm. > > > On Mon, Aug 9, 2010 at 11:07 PM, Chad La Joie <laj...@itumi.biz> wrote: > > I guess I didn't explicitly say this, but if, after a few days, people > can't > > suggest an issue with this testing methodology or provide testing inputs > > that show different results, I'll rip out the helper class I added and > just > > use equals() everywhere. That'll make the code a lot nicer to read. > > > > On 8/9/10 10:19 AM, Chad La Joie wrote: > >> > >> So, I have some unexpected results from this work. > >> > >> I implemented a helper class that checked the equality of element local > >> names, attribute local names, namespace URIs, and namespace prefixes > >> (i.e. everything that Xerces always interns). Then I made sure to > >> replace all == != and equals() that I could find with the appropriate > >> call. > >> > >> To test, I picked the Canonicalizer20010315ExclusiveTest test case and > >> made two alterations to the test22*excl methods: > >> - do one c14n operation out the timing loop just to make sure all the > >> classes are in memory, constants are loaded, etc. > >> - in a 100 iteration loop, create a new canonicalizer, canonicalize a > >> DOM tree, and time it using nanosecond time > >> > >> I did this for the example2_2_1.xml[1], example2_2_2.xml[2], example > >> 2_2_3.xml[3] input files (test221excl, test221excl, test223excl > >> respectively). > >> > >> Here are the results, measured in nanosecond timing. "total" indicates > >> the total time spent in all 100 runs, i.e. the summation of each of the > >> 100 results. > >> > >> test221excl: > >> equals() == > >> min 101000 99000 > >> max 123000 191000 > >> median 103000 105000 > >> avg 103760 106540 > >> total 10376000 10654000 > >> > >> test222excl: > >> equals() == > >> min 99000 101000 > >> max 192000 128000 > >> median 100000 108000 > >> avg 102110 108480 > >> total 10211000 10848000 > >> > >> test223excl (an XPath nodeset canonicalization) > >> equals() == > >> min 254000 248000 > >> max 290000 353000 > >> median 266000 265000 > >> avg 266820 265800 > >> total 26682000 26580000 > >> > >> So, what these numbers appear to suggest is that, in fact, equals() is > >> more often faster than ==. This seems counter-intuitive unless the JVM > >> has specialized optimization for the String.equals() method. > >> > >> Can anyone see where my testing is likely to be flawed? > >> > >> [1] > >> > >> > http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup > >> > >> [2] > >> > >> > http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup > >> > >> [3] > >> > >> > http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup > >> > >> > >> On 8/2/10 10:11 AM, Chad La Joie wrote: > >>> > >>> So, while I don't have my access yet, Colm asked me if I'd take a look > >>> at the == vs equals() issue (relevant bugs: 40897[1], 45637[2], > 46681[3]) > >>> > >>> My executive summary is that clearly, as things stand, the current code > >>> favors optimization over correctness. Rarely is this a good thing. > >>> > >>> Colm notes[4] that the reliance on intern'ed strings (and thus the > >>> ability to use ==) occurs sporadically throughout the code and not just > >>> within the ElementChecker implementations. He specifically mentioned > >>> that the various C14N implementations, and indeed the == is used about > 6 > >>> times there for string comparison. > >>> > >>> My recommendation then is two fold: > >>> - Ensure that nothing other than namespace bits are compared via ==. I > >>> don't know that this occurs but the code should definitely be reviewed > >>> to ensure that. > >>> > >>> - Create a new "NamespaceEqualityChecker" that provides methods for > >>> checking the various bits of a namespace (URIs, prefixes) and use it > >>> anywhere that either == or equals() is used today. Implementations > based > >>> on == and equals() would be provided with the default implementation > >>> being equals()-based. A configuration option should then be made > >>> available to control which impl gets used. Additionally, it might even > >>> be possible to add some smarts that could detect known "good" parsers > >>> that use interning and automatically use the == based implementation. > >>> > >>> I do not recommend changing any part of the code without addressing the > >>> whole codebase (i.e. all the =='s need to be fixed or no change should > >>> be made) because of the possibility of creating new, unwanted, effects. > >>> The current functionality is undesirable but better the devil you know. > >>> > >>> I think that this should be addressed in the upcoming 1.4.4 release. If > >>> quick consensus can be reached I'm willing to do the work with a window > >>> of time I have available over the next 2-3 weeks. > >>> > >>> [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=40897 > >>> [2] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637 > >>> [3] https://issues.apache.org/bugzilla/show_bug.cgi?id=46681 > >>> [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637#c1 > >> > > > > -- > > Chad La Joie > > http://itumi.biz > > trusted identities, delivered > > >