Re: Status of == vs equals() RESULTS

Raul Benito Mon, 09 Aug 2010 07:41:30 -0700

Hello Chad,
What command line options did you use?
My testings were more reliable if use 100 warms-up let the jit run its
magic, and then go for the timed test.
Also are you running both tests in the same invocation if you do, the second
will be handicap, as the first one will be just inline the second will have
a switch to see if it is one interface or the other.


Regards,

Raul


On Mon, Aug 9, 2010 at 4:19 PM, Chad La Joie <[email protected]> wrote:

> So, I have some unexpected results from this work.
>
> I implemented a helper class that checked the equality of element local
> names, attribute local names, namespace URIs, and namespace prefixes (i.e.
> everything that Xerces always interns).  Then I made sure to replace all ==
> != and equals() that I could find with the appropriate call.
>
> To test, I picked the Canonicalizer20010315ExclusiveTest test case and made
> two alterations to the test22*excl methods:
>  - do one c14n operation out the timing loop just to make sure all the
> classes are in memory, constants are loaded, etc.
>  - in a 100 iteration loop, create a new canonicalizer, canonicalize a DOM
> tree, and time it using nanosecond time
>
> I did this for the example2_2_1.xml[1], example2_2_2.xml[2], example
> 2_2_3.xml[3] input files (test221excl, test221excl, test223excl
> respectively).
>
> Here are the results, measured in nanosecond timing.  "total" indicates the
> total time spent in all 100 runs, i.e. the summation of each of the 100
> results.
>
> test221excl:
>        equals()    ==
> min     101000     99000
> max     123000     191000
> median  103000     105000
> avg     103760     106540
> total   10376000   10654000
>
> test222excl:
>        equals()    ==
> min     99000      101000
> max     192000     128000
> median  100000     108000
> avg     102110     108480
> total   10211000   10848000
>
> test223excl (an XPath nodeset canonicalization)
>        equals()    ==
> min     254000     248000
> max     290000     353000
> median  266000     265000
> avg     266820     265800
> total   26682000   26580000
>
> So, what these numbers appear to suggest is that, in fact, equals() is more
> often faster than ==.  This seems counter-intuitive unless the JVM has
> specialized optimization for the String.equals() method.
>
> Can anyone see where my testing is likely to be flawed?
>
> [1]
> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup
> [2]
> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup
> [3]
> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup
>
> On 8/2/10 10:11 AM, Chad La Joie wrote:
>
>> So, while I don't have my access yet, Colm asked me if I'd take a look
>> at the == vs equals() issue (relevant bugs: 40897[1], 45637[2], 46681[3])
>>
>> My executive summary is that clearly, as things stand, the current code
>> favors optimization over correctness. Rarely is this a good thing.
>>
>> Colm notes[4] that the reliance on intern'ed strings (and thus the
>> ability to use ==) occurs sporadically throughout the code and not just
>> within the ElementChecker implementations. He specifically mentioned
>> that the various C14N implementations, and indeed the == is used about 6
>> times there for string comparison.
>>
>> My recommendation then is two fold:
>> - Ensure that nothing other than namespace bits are compared via ==. I
>> don't know that this occurs but the code should definitely be reviewed
>> to ensure that.
>>
>> - Create a new "NamespaceEqualityChecker" that provides methods for
>> checking the various bits of a namespace (URIs, prefixes) and use it
>> anywhere that either == or equals() is used today. Implementations based
>> on == and equals() would be provided with the default implementation
>> being equals()-based. A configuration option should then be made
>> available to control which impl gets used. Additionally, it might even
>> be possible to add some smarts that could detect known "good" parsers
>> that use interning and automatically use the == based implementation.
>>
>> I do not recommend changing any part of the code without addressing the
>> whole codebase (i.e. all the =='s need to be fixed or no change should
>> be made) because of the possibility of creating new, unwanted, effects.
>> The current functionality is undesirable but better the devil you know.
>>
>> I think that this should be addressed in the upcoming 1.4.4 release. If
>> quick consensus can be reached I'm willing to do the work with a window
>> of time I have available over the next 2-3 weeks.
>>
>> [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=40897
>> [2] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637
>> [3] https://issues.apache.org/bugzilla/show_bug.cgi?id=46681
>> [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637#c1
>>
>
> --
> Chad La Joie
> http://itumi.biz
> trusted identities, delivered
>

Re: Status of == vs equals() RESULTS

Reply via email to