Xerces C DOM parser wrapped as Java DOM. What I mean conventional equals()
should be preferred though == might have small performance gain.

Eric

On Fri, Aug 13, 2010 at 10:50 AM, Chad La Joie <laj...@itumi.biz> wrote:

> Which parser/DOM impl were you using?
>
>
> On 8/13/10 1:33 PM, eric fu wrote:
>
>> I encountered a problem before(version 1.4) caused by apache java code
>> which uses == for namespace comparison. In my own code, when adding DOM
>> node to a document, I have to create namespace using string from Apache
>> classes. That is, I cannot directly use
>> "http://www.w3.org/2000/09/xmldsig#"; as namespace String, instead I need
>> to use APACHE.BlashClass.DSIG_URI. The bug is not only hard to find, but
>> unnecessarily tie unrelated DOM code to XML security.
>>
>> Eric
>>
>> On Fri, Aug 13, 2010 at 6:02 AM, Colm O hEigeartaigh
>> <cohei...@apache.org <mailto:cohei...@apache.org>> wrote:
>>
>>    I would prefer if we stuck to the original plan of making sure "=="
>>    comparisons are only done for namespaces in a single piece of
>>    pluggable code. However, I think we should now revert to making the
>>    ".equals" comparison as the default for the next release, given that
>>    there is no compelling reason to do otherwise. Anyone who wants to
>>    experiment with getting a performance increase, can just plug the
>>    other piece of code in.
>>
>>    Thoughts?
>>
>>    Colm.
>>
>>
>>    On Mon, Aug 9, 2010 at 11:07 PM, Chad La Joie <laj...@itumi.biz
>>    <mailto:laj...@itumi.biz>> wrote:
>>     > I guess I didn't explicitly say this, but if, after a few days,
>>    people can't
>>     > suggest an issue with this testing methodology or provide testing
>>    inputs
>>     > that show different results, I'll rip out the helper class I
>>    added and just
>>     > use equals() everywhere.  That'll make the code a lot nicer to read.
>>     >
>>     > On 8/9/10 10:19 AM, Chad La Joie wrote:
>>     >>
>>     >> So, I have some unexpected results from this work.
>>     >>
>>     >> I implemented a helper class that checked the equality of
>>    element local
>>     >> names, attribute local names, namespace URIs, and namespace
>> prefixes
>>     >> (i.e. everything that Xerces always interns). Then I made sure to
>>     >> replace all == != and equals() that I could find with the
>>    appropriate
>>     >> call.
>>     >>
>>     >> To test, I picked the Canonicalizer20010315ExclusiveTest test
>>    case and
>>     >> made two alterations to the test22*excl methods:
>>     >> - do one c14n operation out the timing loop just to make sure
>>    all the
>>     >> classes are in memory, constants are loaded, etc.
>>     >> - in a 100 iteration loop, create a new canonicalizer,
>>    canonicalize a
>>     >> DOM tree, and time it using nanosecond time
>>     >>
>>     >> I did this for the example2_2_1.xml[1], example2_2_2.xml[2],
>> example
>>     >> 2_2_3.xml[3] input files (test221excl, test221excl, test223excl
>>     >> respectively).
>>     >>
>>     >> Here are the results, measured in nanosecond timing. "total"
>>    indicates
>>     >> the total time spent in all 100 runs, i.e. the summation of each
>>    of the
>>     >> 100 results.
>>     >>
>>     >> test221excl:
>>     >> equals() ==
>>     >> min 101000 99000
>>     >> max 123000 191000
>>     >> median 103000 105000
>>     >> avg 103760 106540
>>     >> total 10376000 10654000
>>     >>
>>     >> test222excl:
>>     >> equals() ==
>>     >> min 99000 101000
>>     >> max 192000 128000
>>     >> median 100000 108000
>>     >> avg 102110 108480
>>     >> total 10211000 10848000
>>     >>
>>     >> test223excl (an XPath nodeset canonicalization)
>>     >> equals() ==
>>     >> min 254000 248000
>>     >> max 290000 353000
>>     >> median 266000 265000
>>     >> avg 266820 265800
>>     >> total 26682000 26580000
>>     >>
>>     >> So, what these numbers appear to suggest is that, in fact,
>>    equals() is
>>     >> more often faster than ==. This seems counter-intuitive unless
>>    the JVM
>>     >> has specialized optimization for the String.equals() method.
>>     >>
>>     >> Can anyone see where my testing is likely to be flawed?
>>     >>
>>     >> [1]
>>     >>
>>     >>
>>
>> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup
>>    <
>> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup
>> >
>>     >>
>>     >> [2]
>>     >>
>>     >>
>>
>> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup
>>    <
>> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup
>> >
>>     >>
>>     >> [3]
>>     >>
>>     >>
>>
>> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup
>>    <
>> http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup
>> >
>>     >>
>>     >>
>>     >> On 8/2/10 10:11 AM, Chad La Joie wrote:
>>     >>>
>>     >>> So, while I don't have my access yet, Colm asked me if I'd take
>>    a look
>>     >>> at the == vs equals() issue (relevant bugs: 40897[1], 45637[2],
>>    46681[3])
>>     >>>
>>     >>> My executive summary is that clearly, as things stand, the
>>    current code
>>     >>> favors optimization over correctness. Rarely is this a good thing.
>>     >>>
>>     >>> Colm notes[4] that the reliance on intern'ed strings (and thus the
>>     >>> ability to use ==) occurs sporadically throughout the code and
>>    not just
>>     >>> within the ElementChecker implementations. He specifically
>>    mentioned
>>     >>> that the various C14N implementations, and indeed the == is
>>    used about 6
>>     >>> times there for string comparison.
>>     >>>
>>     >>> My recommendation then is two fold:
>>     >>> - Ensure that nothing other than namespace bits are compared
>>    via ==. I
>>     >>> don't know that this occurs but the code should definitely be
>>    reviewed
>>     >>> to ensure that.
>>     >>>
>>     >>> - Create a new "NamespaceEqualityChecker" that provides methods
>> for
>>     >>> checking the various bits of a namespace (URIs, prefixes) and
>>    use it
>>     >>> anywhere that either == or equals() is used today.
>>    Implementations based
>>     >>> on == and equals() would be provided with the default
>>    implementation
>>     >>> being equals()-based. A configuration option should then be made
>>     >>> available to control which impl gets used. Additionally, it
>>    might even
>>     >>> be possible to add some smarts that could detect known "good"
>>    parsers
>>     >>> that use interning and automatically use the == based
>>    implementation.
>>     >>>
>>     >>> I do not recommend changing any part of the code without
>>    addressing the
>>     >>> whole codebase (i.e. all the =='s need to be fixed or no change
>>    should
>>     >>> be made) because of the possibility of creating new, unwanted,
>>    effects.
>>     >>> The current functionality is undesirable but better the devil
>>    you know.
>>     >>>
>>     >>> I think that this should be addressed in the upcoming 1.4.4
>>    release. If
>>     >>> quick consensus can be reached I'm willing to do the work with
>>    a window
>>     >>> of time I have available over the next 2-3 weeks.
>>     >>>
>>     >>> [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=40897
>>     >>> [2] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637
>>     >>> [3] https://issues.apache.org/bugzilla/show_bug.cgi?id=46681
>>     >>> [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637#c1
>>     >>
>>     >
>>     > --
>>     > Chad La Joie
>>     > http://itumi.biz
>>     > trusted identities, delivered
>>     >
>>
>>
>>
> --
> Chad La Joie
> http://itumi.biz
> trusted identities, delivered
>

Reply via email to