RE: Status of == vs equals() RESULTS

Pellerin, Clement Mon, 09 Aug 2010 08:13:58 -0700

In JDK 1.5, String.equals() begins with:

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    ...


Since String is a final class, the JIT compiler is free to inline
String.equals().
This is such a common case that I bet the JIT compiler team made it a special
case, inlining at least the beginning of String.equals() at every invocation site.

If your test bed only uses interned Strings, this check returns early for equal
strings, with the same behavior as ==.
Is it possible that your test bed calls String.equals() with an overwhelming
percentage of equal strings?
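
To illustrate the point about interned strings, here is a minimal sketch. The class name InternedEqualsDemo and the helper sameObject are made up for this example and are not part of the xml-security code. It only shows which comparisons can be satisfied by the identity check at the top of String.equals(); it says nothing about what the JIT actually inlines.

```java
public class InternedEqualsDemo {
    // Hypothetical helper: true when both references point to the same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "Signature";           // string literals are interned
        String copy = new String("Signature");  // distinct object, same characters
        String interned = copy.intern();        // canonical instance from the pool

        System.out.println(sameObject(literal, interned)); // true
        System.out.println(literal.equals(interned));      // true, via the identity check
        System.out.println(sameObject(literal, copy));     // false
        System.out.println(literal.equals(copy));          // true, after comparing characters
    }
}
```

When both operands are interned and equal, equals() never gets past its first comparison, so the only extra cost relative to == is the call itself, which inlining could remove.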

-----Original Message-----
From: Chad La Joie [mailto:laj...@itumi.biz] 
Sent: Monday, August 09, 2010 10:20 AM
To: security-dev@xml.apache.org
Subject: Re: Status of == vs equals() RESULTS

So, I have some unexpected results from this work.

I implemented a helper class that checked the equality of element local 
names, attribute local names, namespace URIs, and namespace prefixes 
(i.e. everything that Xerces always interns).  Then I made sure to 
replace every ==, !=, and equals() that I could find with the appropriate call.
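
For readers who want a picture of the kind of helper described above, here is a rough sketch. It is not the class from the actual change; the names NameEquality, Checker, IDENTITY, and VALUE are all assumptions made for illustration. It shows the two comparison strategies behind one call site so that they can be swapped for testing.

```java
public final class NameEquality {
    /** Hypothetical strategy for comparing local names, namespace URIs, or prefixes. */
    public interface Checker {
        boolean same(String a, String b);
    }

    /** Identity comparison: only correct if the parser interns every name. */
    public static final Checker IDENTITY = new Checker() {
        public boolean same(String a, String b) {
            return a == b;
        }
    };

    /** Value comparison: correct whether or not the names are interned; null-safe. */
    public static final Checker VALUE = new Checker() {
        public boolean same(String a, String b) {
            return a == null ? b == null : a.equals(b);
        }
    };

    private NameEquality() {
        // static holder, not meant to be instantiated
    }
}
```

A call site would then read something like NameEquality.VALUE.same(prefix, otherPrefix), and switching between the two strategies is a one-word change.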

To test, I picked the Canonicalizer20010315ExclusiveTest test case and 
made two alterations to the test22*excl methods:
   - do one c14n operation outside the timing loop just to make sure all the 
classes are in memory, constants are loaded, etc.
   - in a 100-iteration loop, create a new canonicalizer, canonicalize a 
DOM tree, and time it using nanosecond time

I did this for the example2_2_1.xml[1], example2_2_2.xml[2], and 
example2_2_3.xml[3] input files (test221excl, test222excl, and test223excl, 
respectively).
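
The measurement procedure above can be sketched roughly as follows. This is not the actual test code: TimingSketch, Stats, and measure are hypothetical names, the workload in main is a placeholder for the real canonicalization, and the median rule for an even sample count (mean of the two middle values) is an assumption.

```java
import java.util.Arrays;

public class TimingSketch {
    /** Summary of per-iteration timings, all in nanoseconds. */
    static final class Stats {
        final long min, max, median, avg, total;

        Stats(long[] samples) {
            long[] sorted = samples.clone();
            Arrays.sort(sorted);
            long sum = 0;
            for (long s : sorted) {
                sum += s;
            }
            int n = sorted.length;
            min = sorted[0];
            max = sorted[n - 1];
            // Assumption: for an even count, use the mean of the two middle samples.
            median = (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
            avg = sum / n;
            total = sum;
        }
    }

    /** Runs the task once untimed, then times the given number of further runs. */
    static Stats measure(Runnable task, int iterations) {
        task.run(); // one pass outside the timing loop so classes and constants are loaded
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return new Stats(samples);
    }

    public static void main(String[] args) {
        // Placeholder workload; the real test created a canonicalizer and canonicalized a DOM tree.
        Runnable work = new Runnable() {
            public void run() {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 1000; i++) {
                    sb.append(i);
                }
                if (sb.length() == 0) {
                    throw new IllegalStateException("unexpected empty result");
                }
            }
        };
        Stats s = measure(work, 100);
        System.out.println("min=" + s.min + " max=" + s.max + " median=" + s.median
                + " avg=" + s.avg + " total=" + s.total);
    }
}
```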

Here are the results, in nanoseconds.  "total" is the total time spent 
in all 100 runs, i.e. the sum of the 100 individual results.

test221excl:
         equals()   ==
min      101000     99000
max      123000     191000
median   103000     105000
avg      103760     106540
total    10376000   10654000

test222excl:
         equals()   ==
min      99000      101000
max      192000     128000
median   100000     108000
avg      102110     108480
total    10211000   10848000

test223excl (an XPath nodeset canonicalization):
         equals()   ==
min      254000     248000
max      290000     353000
median   266000     265000
avg      266820     265800
total    26682000   26580000

So, what these numbers appear to suggest is that, in fact, equals() is 
more often faster than ==.  This seems counter-intuitive unless the JVM 
has specialized optimization for the String.equals() method.

Can anyone see where my testing is likely to be flawed?

[1] 
http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup
[2] 
http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup
[3] 
http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup

On 8/2/10 10:11 AM, Chad La Joie wrote:
> So, while I don't have my access yet, Colm asked me if I'd take a look
> at the == vs equals() issue (relevant bugs: 40897[1], 45637[2], 46681[3])
>
> My executive summary is that clearly, as things stand, the current code
> favors optimization over correctness. Rarely is this a good thing.
>
> Colm notes[4] that the reliance on intern'ed strings (and thus the
> ability to use ==) occurs sporadically throughout the code and not just
> within the ElementChecker implementations. He specifically mentioned
> the various C14N implementations, and indeed == is used about 6
> times there for string comparison.
>
> My recommendation then is twofold:
> - Ensure that nothing other than namespace bits are compared via ==. I
> don't know that this occurs but the code should definitely be reviewed
> to ensure that.
>
> - Create a new "NamespaceEqualityChecker" that provides methods for
> checking the various bits of a namespace (URIs, prefixes) and use it
> anywhere that either == or equals() is used today. Implementations based
> on == and equals() would be provided with the default implementation
> being equals()-based. A configuration option should then be made
> available to control which impl gets used. Additionally, it might even
> be possible to add some smarts that could detect known "good" parsers
> that use interning and automatically use the == based implementation.
>
> I do not recommend changing any part of the code without addressing the
> whole codebase (i.e. all the =='s need to be fixed or no change should
> be made) because of the possibility of creating new, unwanted, effects.
> The current functionality is undesirable but better the devil you know.
>
> I think that this should be addressed in the upcoming 1.4.4 release. If
> quick consensus can be reached I'm willing to do the work with a window
> of time I have available over the next 2-3 weeks.
>
> [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=40897
> [2] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637
> [3] https://issues.apache.org/bugzilla/show_bug.cgi?id=46681
> [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637#c1

-- 
Chad La Joie
http://itumi.biz
trusted identities, delivered
