Xerces C DOM parser wrapped as Java DOM. What I mean conventional equals() should be preferred though == might have small performance gain.
Eric On Fri, Aug 13, 2010 at 10:50 AM, Chad La Joie <laj...@itumi.biz> wrote: > Which parser/DOM impl were you using? > > > On 8/13/10 1:33 PM, eric fu wrote: > >> I encountered a problem before(version 1.4) caused by apache java code >> which uses == for namespace comparison. In my own code, when adding DOM >> node to a document, I have to create namespace using string from Apache >> classes. That is, I cannot directly use >> "http://www.w3.org/2000/09/xmldsig#" as namespace String, instead I need >> to use APACHE.BlashClass.DSIG_URI. The bug is not only hard to find, but >> unnecessarily tie unrelated DOM code to XML security. >> >> Eric >> >> On Fri, Aug 13, 2010 at 6:02 AM, Colm O hEigeartaigh >> <cohei...@apache.org <mailto:cohei...@apache.org>> wrote: >> >> I would prefer if we stuck to the original plan of making sure "==" >> comparisons are only done for namespaces in a single piece of >> pluggable code. However, I think we should now revert to making the >> ".equals" comparison as the default for the next release, given that >> there is no compelling reason to do otherwise. Anyone who wants to >> experiment with getting a performance increase, can just plug the >> other piece of code in. >> >> Thoughts? >> >> Colm. >> >> >> On Mon, Aug 9, 2010 at 11:07 PM, Chad La Joie <laj...@itumi.biz >> <mailto:laj...@itumi.biz>> wrote: >> > I guess I didn't explicitly say this, but if, after a few days, >> people can't >> > suggest an issue with this testing methodology or provide testing >> inputs >> > that show different results, I'll rip out the helper class I >> added and just >> > use equals() everywhere. That'll make the code a lot nicer to read. >> > >> > On 8/9/10 10:19 AM, Chad La Joie wrote: >> >> >> >> So, I have some unexpected results from this work. >> >> >> >> I implemented a helper class that checked the equality of >> element local >> >> names, attribute local names, namespace URIs, and namespace >> prefixes >> >> (i.e. everything that Xerces always interns). Then I made sure to >> >> replace all == != and equals() that I could find with the >> appropriate >> >> call. >> >> >> >> To test, I picked the Canonicalizer20010315ExclusiveTest test >> case and >> >> made two alterations to the test22*excl methods: >> >> - do one c14n operation out the timing loop just to make sure >> all the >> >> classes are in memory, constants are loaded, etc. >> >> - in a 100 iteration loop, create a new canonicalizer, >> canonicalize a >> >> DOM tree, and time it using nanosecond time >> >> >> >> I did this for the example2_2_1.xml[1], example2_2_2.xml[2], >> example >> >> 2_2_3.xml[3] input files (test221excl, test221excl, test223excl >> >> respectively). >> >> >> >> Here are the results, measured in nanosecond timing. "total" >> indicates >> >> the total time spent in all 100 runs, i.e. the summation of each >> of the >> >> 100 results. >> >> >> >> test221excl: >> >> equals() == >> >> min 101000 99000 >> >> max 123000 191000 >> >> median 103000 105000 >> >> avg 103760 106540 >> >> total 10376000 10654000 >> >> >> >> test222excl: >> >> equals() == >> >> min 99000 101000 >> >> max 192000 128000 >> >> median 100000 108000 >> >> avg 102110 108480 >> >> total 10211000 10848000 >> >> >> >> test223excl (an XPath nodeset canonicalization) >> >> equals() == >> >> min 254000 248000 >> >> max 290000 353000 >> >> median 266000 265000 >> >> avg 266820 265800 >> >> total 26682000 26580000 >> >> >> >> So, what these numbers appear to suggest is that, in fact, >> equals() is >> >> more often faster than ==. This seems counter-intuitive unless >> the JVM >> >> has specialized optimization for the String.equals() method. >> >> >> >> Can anyone see where my testing is likely to be flawed? >> >> >> >> [1] >> >> >> >> >> >> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup >> < >> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup >> > >> >> >> >> [2] >> >> >> >> >> >> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup >> < >> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup >> > >> >> >> >> [3] >> >> >> >> >> >> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup >> < >> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup >> > >> >> >> >> >> >> On 8/2/10 10:11 AM, Chad La Joie wrote: >> >>> >> >>> So, while I don't have my access yet, Colm asked me if I'd take >> a look >> >>> at the == vs equals() issue (relevant bugs: 40897[1], 45637[2], >> 46681[3]) >> >>> >> >>> My executive summary is that clearly, as things stand, the >> current code >> >>> favors optimization over correctness. Rarely is this a good thing. >> >>> >> >>> Colm notes[4] that the reliance on intern'ed strings (and thus the >> >>> ability to use ==) occurs sporadically throughout the code and >> not just >> >>> within the ElementChecker implementations. He specifically >> mentioned >> >>> that the various C14N implementations, and indeed the == is >> used about 6 >> >>> times there for string comparison. >> >>> >> >>> My recommendation then is two fold: >> >>> - Ensure that nothing other than namespace bits are compared >> via ==. I >> >>> don't know that this occurs but the code should definitely be >> reviewed >> >>> to ensure that. >> >>> >> >>> - Create a new "NamespaceEqualityChecker" that provides methods >> for >> >>> checking the various bits of a namespace (URIs, prefixes) and >> use it >> >>> anywhere that either == or equals() is used today. >> Implementations based >> >>> on == and equals() would be provided with the default >> implementation >> >>> being equals()-based. A configuration option should then be made >> >>> available to control which impl gets used. Additionally, it >> might even >> >>> be possible to add some smarts that could detect known "good" >> parsers >> >>> that use interning and automatically use the == based >> implementation. >> >>> >> >>> I do not recommend changing any part of the code without >> addressing the >> >>> whole codebase (i.e. all the =='s need to be fixed or no change >> should >> >>> be made) because of the possibility of creating new, unwanted, >> effects. >> >>> The current functionality is undesirable but better the devil >> you know. >> >>> >> >>> I think that this should be addressed in the upcoming 1.4.4 >> release. If >> >>> quick consensus can be reached I'm willing to do the work with >> a window >> >>> of time I have available over the next 2-3 weeks. >> >>> >> >>> [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=40897 >> >>> [2] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637 >> >>> [3] https://issues.apache.org/bugzilla/show_bug.cgi?id=46681 >> >>> [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637#c1 >> >> >> > >> > -- >> > Chad La Joie >> > http://itumi.biz >> > trusted identities, delivered >> > >> >> >> > -- > Chad La Joie > http://itumi.biz > trusted identities, delivered >