Sounds fine to me.

Colm.

On Mon, Aug 23, 2010 at 8:55 PM, Chad La Joie <laj...@itumi.biz> wrote:
> Okay, getting back to this.
>
> I tried my tests again this time with:
>  - a 7.5MB SAML metadata document (so lots of comparisons)
>  - 100 warm up runs then 100 timed runs
>  - an explicit GC between each run to keep it from happening during the runs
> since the DOMs were so large
>
> No real difference in results. equals() was faster.
>
> So, at this point, I can't see any reason to do anything other than
> equals().  It's the actual correct way of doing the comparison in that it
> will always return the proper result and the JVM definitely seems to be
> optimizing its use.
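
A minimal sketch of the measurement procedure described above (warm-up runs, then timed runs, with a GC requested between runs). The class name, the pairwise-comparison workload, and the sample URIs are assumptions for illustration only; the actual test used a 7.5MB SAML metadata document and is not reproduced here.

```java
import java.util.function.ToIntFunction;

final class EqualsVsIdentityBench {
    // Hypothetical workload: compare every pair of strings with equals().
    static int countMatchesEquals(String[] uris) {
        int matches = 0;
        for (String a : uris) {
            for (String b : uris) {
                if (a.equals(b)) matches++;
            }
        }
        return matches;
    }

    // Same workload using reference identity; only correct if all inputs are interned.
    static int countMatchesIdentity(String[] uris) {
        int matches = 0;
        for (String a : uris) {
            for (String b : uris) {
                if (a == b) matches++;
            }
        }
        return matches;
    }

    // Runs untimed warm-up iterations, then returns the average nanoseconds per timed run.
    static long averageNanos(ToIntFunction<String[]> work, String[] uris, int warmup, int timed) {
        int sink = 0; // accumulate results so the JIT cannot discard the work
        for (int i = 0; i < warmup; i++) {
            sink += work.applyAsInt(uris);
        }
        long total = 0;
        for (int i = 0; i < timed; i++) {
            System.gc(); // a request, not a guarantee, made outside the timed region
            long start = System.nanoTime();
            sink += work.applyAsInt(uris);
            total += System.nanoTime() - start;
        }
        if (sink == Integer.MIN_VALUE) System.out.println(sink); // keeps sink observable
        return total / timed;
    }

    public static void main(String[] args) {
        // Literals are interned, so both variants count the same matches here.
        String[] uris = { "urn:example:a", "urn:example:b", "urn:example:a" };
        System.out.println("equals ns/run: " + averageNanos(EqualsVsIdentityBench::countMatchesEquals, uris, 100, 100));
        System.out.println("==     ns/run: " + averageNanos(EqualsVsIdentityBench::countMatchesIdentity, uris, 100, 100));
    }
}
```

A hand-rolled loop like this is only a rough indicator; the numbers depend heavily on the JVM, its flags, and the input data.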
>
> On 8/10/10 7:53 AM, Chad La Joie wrote:
>>
>> Okay, I certainly have a number of SAML documents lying around so I'll
>> try with those as well. And, of course, I'll report back the results I
>> get.
>>
>> On 8/10/10 4:46 AM, Raul Benito wrote:
>>>
>>> As the original author of the change from equals to == for interned
>>> namespaces, I can tell you that originally, on 1.4 and 1.5 and with my data
>>> (the verification of a SAML/Liberty AuthnReq in a multi-threaded
>>> test, with the old Juice JCE provider), the change was 10% to 20% faster.
>>> SAML is one of the real-world examples of signing, and it has some URLs
>>> with common prefixes and the same length.
>>> The Juice provider also helps to get rid of the signing/digest cost (a
>>> verification is two c14n runs, one of the signing part and one of the
>>> signature), but I think a c14n alone is a good way of measuring it.
>>> Also take into account that the == vs equals debate is more of a memory
>>> and cache workload problem: if we have to iterate over every char again
>>> and again just to see that two strings are not equal, we thrash the cache.
>>> (That's why I used multiple threads, to simulate a server decoding
>>> requests with more or less the same code, but at different times and
>>> under different "workloads".)
>>> Nevertheless, if you have tested with a more modern JRE and .equals is
>>> behaving better, just go ahead and kiss goodbye to the ==.
>>>
>>> Clive, using .hashCode for strings in this case is not a big
>>> speed-up, as it is going to go through all the chars of the string,
>>> thrashing the cache again, and multiplying and adding the result into an
>>> integer, instead of failing at the first differing char or simply
>>> reducing to a boolean.
>>>
>>> Regards,
>>>
>>>
>>> On Tue, Aug 10, 2010 at 2:37 AM, Clive Brettingham-Moore
>>> <xml...@brettingham-moore.net> wrote:
>>>
>>> Have to agree .equals is the way to go, since correctness of == is too
>>> reliant on what must be considered implementation optimisations in the
>>> parser.
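
A small sketch of why == depends on interning. The class and helper names are hypothetical, and the helper only simulates a parser that returns a fresh, non-interned String; the constant is the XML Signature namespace URI, used here purely as sample data.

```java
final class InternIdentityDemo {
    // A string literal, so the JVM interns it automatically.
    static final String DSIG_NS = "http://www.w3.org/2000/09/xmldsig#";

    // Simulates a parser handing back a new String object with the same content.
    static String freshCopy(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String parsed = freshCopy(DSIG_NS);
        System.out.println(parsed == DSIG_NS);          // false: different objects
        System.out.println(parsed.equals(DSIG_NS));     // true: same content
        System.out.println(parsed.intern() == DSIG_NS); // true: intern() returns the pooled instance
    }
}
```

The == check is only safe if every producer of namespace strings interns them, which is the parser-level assumption being questioned here.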
>>>
>>> Benchmarking in JVM is notoriously difficult, but it does look like
>>> there is no gross difference, which should kill any objections to doing
>>> it correctly.
>>>
>>> Since I recently spent far too long researching this for an unrelated
>>> problem, I'll add my 10c to the detailed discussion.
>>>
>>> On 10/08/10 01:23, Chad La Joie wrote:
>>>
>>> > Not necessarily, there are a number of not equal checks in there that
>>> > should, in theory, perform better if you use == only. In such a
>>> > case, the use of != will just be a single check while !equals() will
>>> > result in a char-by-char comparison.
>>>
>>> Actually, the next thing String.equals tests is length equality - so
>>> character comparison will only be reached if the strings are the same
>>> length.
>>>
>>> Since the char-by-char comparison returns on the first mismatch,
>>> only same-length strings with shared prefixes will show the expected
>>> slowness. (Namespace URIs are likely to share prefixes, but I think are
>>> not particularly likely to be the same length unless actually equal.)
>>> Thus String.equals is only likely to be slow when comparing long strings
>>> that are distinct objects but equal in content (so intern, or the
>>> alternative string pooling techniques needed for ==, also benefit .equals
>>> without all the nasty loopholes: even if .equals is occasionally slow, at
>>> least it is always right).
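
The order of checks described above can be sketched roughly as follows. This is an illustration, not the JDK source: the class, method, and counter names are hypothetical, and the counter exists only to show how many characters end up being compared.

```java
final class EqualsSketch {
    // Instrumentation for illustration only: chars examined by the last call.
    static int charsCompared;

    static boolean equalsLike(String a, String b) {
        charsCompared = 0;
        if (a == b) return true;                     // identity check first
        if (a == null || b == null) return false;
        if (a.length() != b.length()) return false;  // length check before any char is read
        for (int i = 0; i < a.length(); i++) {
            charsCompared++;
            if (a.charAt(i) != b.charAt(i)) return false; // stop at the first mismatch
        }
        return true;
    }

    public static void main(String[] args) {
        equalsLike("abcd", "abc");
        System.out.println(charsCompared); // 0: lengths differ
        equalsLike("abcX", "abcY");
        System.out.println(charsCompared); // 4: same length, shared prefix
        equalsLike(new String("abc"), "abc");
        System.out.println(charsCompared); // 3: equal content, distinct objects
    }
}
```

Only the last two cases pay for a character scan, which matches the point that the slow path needs same-length strings that either share a prefix or are equal but not pooled.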
>>>
>>> In circumstances where doing repeated tests with many length and prefix
>>> matches, adding a hash code inequality test ((s1.hashCode()==
>>> s2.hashCode())&&s1.equals(s2)) could prevent practically all
>>> char-by-char checks for !equal cases (but if the same strings are never
>>> repeatedly used, the hash code calculation could be an issue; nb intern
>>> results in hash calculation for all strings anyway)... pooling is still
>>> needed to speed up matches for equality though.
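
The hash code guard suggested above, written out as a small helper. The class and method names are hypothetical. It relies on String caching its hash code after the first call, so the guard is only cheap when the same String objects are compared repeatedly.

```java
final class HashGuard {
    // Rejects most unequal pairs on the int comparison; equals() still decides
    // the final answer, so hash collisions cannot produce a false positive.
    static boolean equalsWithHashGuard(String s1, String s2) {
        return s1.hashCode() == s2.hashCode() && s1.equals(s2);
    }

    public static void main(String[] args) {
        System.out.println(equalsWithHashGuard("urn:a", new String("urn:a"))); // true
        System.out.println(equalsWithHashGuard("urn:a", "urn:b"));             // false
        // "Aa" and "BB" share the hash code 2112, so equals() must break the tie.
        System.out.println(equalsWithHashGuard("Aa", "BB"));                   // false
    }
}
```

As noted, this only helps the not-equal case; equal strings still fall through to a full equals() call unless they are the same pooled object.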
>>>
>>> Re VM options I would feel -server is definitely the right test bed,
>>> both because of the more aggressive JIT, and also because the code is
>>> likely to see heaviest real world cases in -server VMs.
>>>
>>>
>>
>
> --
> Chad La Joie
> http://itumi.biz
> trusted identities, delivered
>
