So, I have some unexpected results from this work.

I implemented a helper class that checked the equality of element local names, attribute local names, namespace URIs, and namespace prefixes (i.e. everything that Xerces always interns). Then I made sure to replace all == != and equals() that I could find with the appropriate call.

To test, I picked the Canonicalizer20010315ExclusiveTest test case and made two alterations to the test22*excl methods: - do one c14n operation out the timing loop just to make sure all the classes are in memory, constants are loaded, etc. - in a 100 iteration loop, create a new canonicalizer, canonicalize a DOM tree, and time it using nanosecond time

I did this for the example2_2_1.xml[1], example2_2_2.xml[2], example 2_2_3.xml[3] input files (test221excl, test221excl, test223excl respectively).

Here are the results, measured in nanosecond timing. "total" indicates the total time spent in all 100 runs, i.e. the summation of each of the 100 results.

test221excl:
        equals()    ==
min     101000     99000
max     123000     191000
median  103000     105000
avg     103760     106540
total   10376000   10654000

test222excl:
        equals()    ==
min     99000      101000
max     192000     128000
median  100000     108000
avg     102110     108480
total   10211000   10848000

test223excl (an XPath nodeset canonicalization)
        equals()    ==
min     254000     248000
max     290000     353000
median  266000     265000
avg     266820     265800
total   26682000   26580000

So, what these numbers appear to suggest is that, in fact, equals() is more often faster than ==. This seems counter-intuitive unless the JVM has specialized optimization for the String.equals() method.

Can anyone see where my testing is likely to be flawed?

[1] http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup [2] http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup [3] http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup

On 8/2/10 10:11 AM, Chad La Joie wrote:
So, while I don't have my access yet, Colm asked me if I'd take a look
at the == vs equals() issue (relevant bugs: 40897[1], 45637[2], 46681[3])

My executive summary is that clearly, as things stand, the current code
favors optimization over correctness. Rarely is this a good thing.

Colm notes[4] that the reliance on intern'ed strings (and thus the
ability to use ==) occurs sporadically throughout the code and not just
within the ElementChecker implementations. He specifically mentioned
that the various C14N implementations, and indeed the == is used about 6
times there for string comparison.

My recommendation then is two fold:
- Ensure that nothing other than namespace bits are compared via ==. I
don't know that this occurs but the code should definitely be reviewed
to ensure that.

- Create a new "NamespaceEqualityChecker" that provides methods for
checking the various bits of a namespace (URIs, prefixes) and use it
anywhere that either == or equals() is used today. Implementations based
on == and equals() would be provided with the default implementation
being equals()-based. A configuration option should then be made
available to control which impl gets used. Additionally, it might even
be possible to add some smarts that could detect known "good" parsers
that use interning and automatically use the == based implementation.

I do not recommend changing any part of the code without addressing the
whole codebase (i.e. all the =='s need to be fixed or no change should
be made) because of the possibility of creating new, unwanted, effects.
The current functionality is undesirable but better the devil you know.

I think that this should be addressed in the upcoming 1.4.4 release. If
quick consensus can be reached I'm willing to do the work with a window
of time I have available over the next 2-3 weeks.

[1] https://issues.apache.org/bugzilla/show_bug.cgi?id=40897
[2] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637
[3] https://issues.apache.org/bugzilla/show_bug.cgi?id=46681
[4] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637#c1

--
Chad La Joie
http://itumi.biz
trusted identities, delivered

Reply via email to