So, I have some unexpected results from this work.
I implemented a helper class that checked the equality of element local
names, attribute local names, namespace URIs, and namespace prefixes
(i.e. everything that Xerces always interns). Then I made sure to
replace all == != and equals() that I could find with the appropriate call.
To test, I picked the Canonicalizer20010315ExclusiveTest test case and
made two alterations to the test22*excl methods:
- do one c14n operation out the timing loop just to make sure all the
classes are in memory, constants are loaded, etc.
- in a 100 iteration loop, create a new canonicalizer, canonicalize a
DOM tree, and time it using nanosecond time
I did this for the example2_2_1.xml[1], example2_2_2.xml[2], example
2_2_3.xml[3] input files (test221excl, test221excl, test223excl
respectively).
Here are the results, measured in nanosecond timing. "total" indicates
the total time spent in all 100 runs, i.e. the summation of each of the
100 results.
test221excl:
equals() ==
min 101000 99000
max 123000 191000
median 103000 105000
avg 103760 106540
total 10376000 10654000
test222excl:
equals() ==
min 99000 101000
max 192000 128000
median 100000 108000
avg 102110 108480
total 10211000 10848000
test223excl (an XPath nodeset canonicalization)
equals() ==
min 254000 248000
max 290000 353000
median 266000 265000
avg 266820 265800
total 26682000 26580000
So, what these numbers appear to suggest is that, in fact, equals() is
more often faster than ==. This seems counter-intuitive unless the JVM
has specialized optimization for the String.equals() method.
Can anyone see where my testing is likely to be flawed?
[1]
http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup
[2]
http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup
[3]
http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup
On 8/2/10 10:11 AM, Chad La Joie wrote:
So, while I don't have my access yet, Colm asked me if I'd take a look
at the == vs equals() issue (relevant bugs: 40897[1], 45637[2], 46681[3])
My executive summary is that clearly, as things stand, the current code
favors optimization over correctness. Rarely is this a good thing.
Colm notes[4] that the reliance on intern'ed strings (and thus the
ability to use ==) occurs sporadically throughout the code and not just
within the ElementChecker implementations. He specifically mentioned
that the various C14N implementations, and indeed the == is used about 6
times there for string comparison.
My recommendation then is two fold:
- Ensure that nothing other than namespace bits are compared via ==. I
don't know that this occurs but the code should definitely be reviewed
to ensure that.
- Create a new "NamespaceEqualityChecker" that provides methods for
checking the various bits of a namespace (URIs, prefixes) and use it
anywhere that either == or equals() is used today. Implementations based
on == and equals() would be provided with the default implementation
being equals()-based. A configuration option should then be made
available to control which impl gets used. Additionally, it might even
be possible to add some smarts that could detect known "good" parsers
that use interning and automatically use the == based implementation.
I do not recommend changing any part of the code without addressing the
whole codebase (i.e. all the =='s need to be fixed or no change should
be made) because of the possibility of creating new, unwanted, effects.
The current functionality is undesirable but better the devil you know.
I think that this should be addressed in the upcoming 1.4.4 release. If
quick consensus can be reached I'm willing to do the work with a window
of time I have available over the next 2-3 weeks.
[1] https://issues.apache.org/bugzilla/show_bug.cgi?id=40897
[2] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637
[3] https://issues.apache.org/bugzilla/show_bug.cgi?id=46681
[4] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637#c1
--
Chad La Joie
http://itumi.biz
trusted identities, delivered