Sounds fine to me.

Colm.
On Mon, Aug 23, 2010 at 8:55 PM, Chad La Joie <laj...@itumi.biz> wrote:
> Okay, getting back to this.
>
> I tried my tests again, this time with:
> - a 7.5MB SAML metadata document (so lots of comparisons)
> - 100 warm-up runs, then 100 timed runs
> - an explicit GC between each run, to keep it from happening during the
>   runs, since the DOMs were so large
>
> No real difference in results; equals() was faster.
>
> So, at this point, I can't see any reason to use anything other than
> equals(). It's the correct way of doing the comparison, in that it
> will always return the proper result, and the JVM definitely seems to be
> optimizing its use.
>
> On 8/10/10 7:53 AM, Chad La Joie wrote:
>>
>> Okay, I certainly have a number of SAML documents lying around, so I'll
>> try with those as well. And, of course, I'll report back the results I
>> get.
>>
>> On 8/10/10 4:46 AM, Raul Benito wrote:
>>>
>>> As the original author of the change from equals to == for interned
>>> namespaces, I can tell you that originally, on Java 1.4 and 1.5 and with
>>> my data (verification of a SAML/Liberty AuthnReq in multi-threaded
>>> tests, using the old Juice JCE provider), the change was 10% to 20%
>>> faster. SAML is one of the real-world examples of signing, and it has
>>> some URLs with common prefixes and the same length.
>>> The Juice provider also helps to get rid of the signing/digest cost (a
>>> verification is two c14n operations: one of the signed part and one of
>>> the signature), but I think just a c14n is a good way of measuring it.
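For anyone who wants to repeat the experiment, here is a minimal sketch of a harness in the shape Chad describes: untimed warm-up runs, then timed runs, with a GC request between runs. It is not Chad's actual test code. The class name CompareBenchmark, the namespace URIs, and the data sizes are assumptions for illustration, and a dedicated harness such as JMH, run on a -server VM, would give more trustworthy numbers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical harness: warm-up runs, timed runs, GC request between runs.
// The URIs are illustrative, not taken from the original SAML tests.
final class CompareBenchmark {

    // "metadata" and "protocol" share a 28-char prefix and have equal length.
    static final String[] URIS = {
        "urn:oasis:names:tc:SAML:2.0:metadata",
        "urn:oasis:names:tc:SAML:2.0:protocol",
        "http://www.w3.org/2000/09/xmldsig#",
        "http://www.w3.org/2001/04/xmlenc#",
    };

    // Builds distinct String objects, as a parser without pooling would;
    // with intern=true the canonical pooled instances are used instead.
    static String[] copies(int n, boolean intern) {
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            String s = new String(URIS[i % URIS.length]);
            out[i] = intern ? s.intern() : s;
        }
        return out;
    }

    static int countEquals(String[] data, String target) {
        int hits = 0;
        for (String s : data) {
            if (s.equals(target)) hits++;
        }
        return hits;
    }

    // Only correct when both sides come from the same pool.
    static int countIdentity(String[] data, String target) {
        int hits = 0;
        for (String s : data) {
            if (s == target) hits++;
        }
        return hits;
    }

    // Returns the elapsed nanoseconds of each timed run.
    static List<Long> run(Runnable task, int warmUp, int timed) {
        for (int i = 0; i < warmUp; i++) task.run(); // give the JIT a chance to compile
        List<Long> times = new ArrayList<>();
        for (int i = 0; i < timed; i++) {
            System.gc(); // only a request; the JVM may ignore it
            long start = System.nanoTime();
            task.run();
            times.add(System.nanoTime() - start);
        }
        return times;
    }

    static long median(List<Long> xs) {
        List<Long> sorted = new ArrayList<>(xs);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }

    public static void main(String[] args) {
        String[] plain = copies(100_000, false);
        String[] pooled = copies(100_000, true);
        String target = URIS[0]; // a literal, hence already interned
        long[] sink = new long[1]; // consume results so the loops are not dead code
        List<Long> eq = run(() -> sink[0] += countEquals(plain, target), 100, 100);
        List<Long> id = run(() -> sink[0] += countIdentity(pooled, target), 100, 100);
        System.out.println("equals() median ns: " + median(eq));
        System.out.println("==       median ns: " + median(id));
        System.out.println("hits: " + sink[0]);
    }
}
```

Note that the == loop runs over interned copies, since on the non-interned copies it would simply return the wrong count.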
>>> Also take into account that the == vs. equals debate is more of a
>>> memory workload/cache problem: if we have to iterate over every char
>>> again and again just to see whether the strings differ, we trash the
>>> cache. (That's why I used multiple threads, to simulate a server
>>> decoding requests with more or less the same code, but at different
>>> times and with different "workloads".)
>>> Nevertheless, if you have tests with a more modern JRE and the .equals
>>> code is behaving better, just go ahead and kiss goodbye to the ==.
>>>
>>> Clive, using .hashCode for strings in this case is not a big speed-up,
>>> as it has to go through all the chars of the string, trashing the cache
>>> again, and multiply and add the result into an integer, instead of
>>> failing at the first differing char or just reducing to a boolean.
>>>
>>> Regards,
>>>
>>>
>>> On Tue, Aug 10, 2010 at 2:37 AM, Clive Brettingham-Moore
>>> <xml...@brettingham-moore.net> wrote:
>>>
>>> Have to agree .equals is the way to go, since the correctness of == is
>>> too reliant on what must be considered implementation optimisations in
>>> the parser.
>>>
>>> Benchmarking in a JVM is notoriously difficult, but it does look like
>>> there is no gross difference, which should kill any objections to doing
>>> it correctly.
>>>
>>> Since I recently spent far too long researching this for an unrelated
>>> problem, I'll add my 10c to the detailed discussion.
>>>
>>> On 10/08/10 01:23, Chad La Joie wrote:
>>>
>>> > Not necessarily; there are a number of not-equal checks in there that
>>> > should, in theory, perform better if you use only ==. In such a
>>> > case, != will just be a single check, while !equals() will
>>> > result in a char-by-char comparison.
>>>
>>> Actually, the next thing String.equals tests is length equality, so
>>> character comparison will only be reached if the strings are the same
>>> length.
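Clive's two points can be checked directly. The sketch below assumes a parser that builds names from a character buffer without pooling them, and shows == failing where equals() succeeds. The method equalsSketch is a hypothetical re-implementation of the check order he describes (identity, then length, then characters up to the first mismatch); the real String.equals also checks type and null, and varies in detail across JDK versions.

```java
// Hypothetical illustration; class and method names are not from any library.
final class IdentityVsEquals {

    // Mirrors the order of checks discussed: identity, length, then chars.
    // Assumes both arguments are non-null.
    static boolean equalsSketch(String a, String b) {
        if (a == b) return true;                    // same object: no work at all
        if (a.length() != b.length()) return false; // cheap reject on length
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) return false; // stop at first mismatch
        }
        return true; // only strings with equal content are scanned fully
    }

    public static void main(String[] args) {
        String literal = "http://www.w3.org/2000/09/xmldsig#";
        // Building from a char buffer always yields a distinct String object.
        String parsed = new String(literal.toCharArray());

        System.out.println(literal == parsed);              // false: different objects
        System.out.println(literal.equals(parsed));         // true: same characters
        System.out.println(literal == parsed.intern());     // true: literals are interned
        System.out.println(equalsSketch(literal, parsed));  // true
    }
}
```

The third line is why == only works when every string on both sides has gone through the same pool, which is exactly the parser implementation detail Clive is wary of relying on.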
>>>
>>> Since the char-by-char comparison returns on the first mismatch, only
>>> same-length strings with shared prefixes will show the expected
>>> slowness. (Namespace URIs are likely to share prefixes, but I think
>>> they are not particularly likely to be the same length unless actually
>>> equal.) Thus String.equals is only likely to be slow when comparing
>>> long, distinct but equal strings, so the interning or alternative
>>> string-pooling techniques needed for == also benefit .equals, without
>>> all the nasty loopholes: even if .equals is occasionally slow, at least
>>> it is always right.
>>>
>>> In circumstances where you do repeated tests with many length and
>>> prefix matches, adding a hash code inequality test
>>> ((s1.hashCode() == s2.hashCode()) && s1.equals(s2)) could prevent
>>> practically all char-by-char checks for unequal cases (but if the same
>>> strings are never used repeatedly, the hash code calculation could be
>>> an issue; NB: intern results in a hash calculation for all strings
>>> anyway). Pooling is still needed to speed up matches for equality,
>>> though.
>>>
>>> Re VM options, I feel -server is definitely the right test bed, both
>>> because of the more aggressive JIT and because the code is likely to
>>> see its heaviest real-world use in -server VMs.
>>>
>>>
>>
>
> --
> Chad La Joie
> http://itumi.biz
> trusted identities, delivered
>
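A minimal sketch of the hash-code pre-check Clive suggests; the class and method names are hypothetical. Unequal hash codes prove two strings differ, so the character scan is skipped; equal hash codes still need equals() because distinct strings can collide. java.lang.String caches its hash code once computed, which is why the pre-check only pays off when the same String objects are compared repeatedly, as both Clive and Raul note.

```java
// Hypothetical illustration of (s1.hashCode() == s2.hashCode()) && s1.equals(s2).
final class HashPrecheck {

    // Different hash codes => definitely unequal, so equals() is skipped.
    // Same hash codes => must still confirm with equals() (collisions exist).
    static boolean hashThenEquals(String s1, String s2) {
        return s1.hashCode() == s2.hashCode() && s1.equals(s2);
    }

    public static void main(String[] args) {
        // Same length and a long shared prefix: the worst case for a plain
        // char-by-char comparison, as discussed in the thread.
        String metadata = "urn:oasis:names:tc:SAML:2.0:metadata";
        String protocol = "urn:oasis:names:tc:SAML:2.0:protocol";

        System.out.println(metadata.length() == protocol.length());         // true
        System.out.println(hashThenEquals(metadata, protocol));             // false
        System.out.println(hashThenEquals(metadata, new String(metadata))); // true

        // Equal hash codes do not imply equal strings: "Aa" and "BB" collide.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(hashThenEquals("Aa", "BB"));         // false
    }
}
```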