I guess I didn't explicitly say this, but if, after a few days, people can't suggest an issue with this testing methodology or provide testing inputs that show different results, I'll rip out the helper class I added and just use equals() everywhere. That'll make the code a lot nicer to read.

On 8/9/10 10:19 AM, Chad La Joie wrote:
So, I have some unexpected results from this work.

I implemented a helper class that checked the equality of element local
names, attribute local names, namespace URIs, and namespace prefixes
(i.e. everything that Xerces always interns). Then I made sure to
replace all == != and equals() that I could find with the appropriate call.

To test, I picked the Canonicalizer20010315ExclusiveTest test case and
made two alterations to the test22*excl methods:
- do one c14n operation out the timing loop just to make sure all the
classes are in memory, constants are loaded, etc.
- in a 100 iteration loop, create a new canonicalizer, canonicalize a
DOM tree, and time it using nanosecond time

I did this for the example2_2_1.xml[1], example2_2_2.xml[2], example
2_2_3.xml[3] input files (test221excl, test221excl, test223excl
respectively).

Here are the results, measured in nanosecond timing. "total" indicates
the total time spent in all 100 runs, i.e. the summation of each of the
100 results.

test221excl:
equals() ==
min 101000 99000
max 123000 191000
median 103000 105000
avg 103760 106540
total 10376000 10654000

test222excl:
equals() ==
min 99000 101000
max 192000 128000
median 100000 108000
avg 102110 108480
total 10211000 10848000

test223excl (an XPath nodeset canonicalization)
equals() ==
min 254000 248000
max 290000 353000
median 266000 265000
avg 266820 265800
total 26682000 26580000

So, what these numbers appear to suggest is that, in fact, equals() is
more often faster than ==. This seems counter-intuitive unless the JVM
has specialized optimization for the String.equals() method.

Can anyone see where my testing is likely to be flawed?

[1]
http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup

[2]
http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup

[3]
http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup


On 8/2/10 10:11 AM, Chad La Joie wrote:
So, while I don't have my access yet, Colm asked me if I'd take a look
at the == vs equals() issue (relevant bugs: 40897[1], 45637[2], 46681[3])

My executive summary is that clearly, as things stand, the current code
favors optimization over correctness. Rarely is this a good thing.

Colm notes[4] that the reliance on intern'ed strings (and thus the
ability to use ==) occurs sporadically throughout the code and not just
within the ElementChecker implementations. He specifically mentioned
that the various C14N implementations, and indeed the == is used about 6
times there for string comparison.

My recommendation then is two fold:
- Ensure that nothing other than namespace bits are compared via ==. I
don't know that this occurs but the code should definitely be reviewed
to ensure that.

- Create a new "NamespaceEqualityChecker" that provides methods for
checking the various bits of a namespace (URIs, prefixes) and use it
anywhere that either == or equals() is used today. Implementations based
on == and equals() would be provided with the default implementation
being equals()-based. A configuration option should then be made
available to control which impl gets used. Additionally, it might even
be possible to add some smarts that could detect known "good" parsers
that use interning and automatically use the == based implementation.

I do not recommend changing any part of the code without addressing the
whole codebase (i.e. all the =='s need to be fixed or no change should
be made) because of the possibility of creating new, unwanted, effects.
The current functionality is undesirable but better the devil you know.

I think that this should be addressed in the upcoming 1.4.4 release. If
quick consensus can be reached I'm willing to do the work with a window
of time I have available over the next 2-3 weeks.

[1] https://issues.apache.org/bugzilla/show_bug.cgi?id=40897
[2] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637
[3] https://issues.apache.org/bugzilla/show_bug.cgi?id=46681
[4] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637#c1


--
Chad La Joie
http://itumi.biz
trusted identities, delivered

Reply via email to