Which parser/DOM impl were you using?
On 8/13/10 1:33 PM, eric fu wrote:
I encountered a problem before(version 1.4) caused by apache java code
which uses == for namespace comparison. In my own code, when adding DOM
node to a document, I have to create namespace using string from Apache
classes. That is, I cannot directly use
"http://www.w3.org/2000/09/xmldsig#" as namespace String, instead I need
to use APACHE.BlashClass.DSIG_URI. The bug is not only hard to find, but
unnecessarily tie unrelated DOM code to XML security.
Eric
On Fri, Aug 13, 2010 at 6:02 AM, Colm O hEigeartaigh
<cohei...@apache.org <mailto:cohei...@apache.org>> wrote:
I would prefer if we stuck to the original plan of making sure "=="
comparisons are only done for namespaces in a single piece of
pluggable code. However, I think we should now revert to making the
".equals" comparison as the default for the next release, given that
there is no compelling reason to do otherwise. Anyone who wants to
experiment with getting a performance increase, can just plug the
other piece of code in.
Thoughts?
Colm.
On Mon, Aug 9, 2010 at 11:07 PM, Chad La Joie <laj...@itumi.biz
<mailto:laj...@itumi.biz>> wrote:
> I guess I didn't explicitly say this, but if, after a few days,
people can't
> suggest an issue with this testing methodology or provide testing
inputs
> that show different results, I'll rip out the helper class I
added and just
> use equals() everywhere. That'll make the code a lot nicer to read.
>
> On 8/9/10 10:19 AM, Chad La Joie wrote:
>>
>> So, I have some unexpected results from this work.
>>
>> I implemented a helper class that checked the equality of
element local
>> names, attribute local names, namespace URIs, and namespace prefixes
>> (i.e. everything that Xerces always interns). Then I made sure to
>> replace all == != and equals() that I could find with the
appropriate
>> call.
>>
>> To test, I picked the Canonicalizer20010315ExclusiveTest test
case and
>> made two alterations to the test22*excl methods:
>> - do one c14n operation out the timing loop just to make sure
all the
>> classes are in memory, constants are loaded, etc.
>> - in a 100 iteration loop, create a new canonicalizer,
canonicalize a
>> DOM tree, and time it using nanosecond time
>>
>> I did this for the example2_2_1.xml[1], example2_2_2.xml[2], example
>> 2_2_3.xml[3] input files (test221excl, test221excl, test223excl
>> respectively).
>>
>> Here are the results, measured in nanosecond timing. "total"
indicates
>> the total time spent in all 100 runs, i.e. the summation of each
of the
>> 100 results.
>>
>> test221excl:
>> equals() ==
>> min 101000 99000
>> max 123000 191000
>> median 103000 105000
>> avg 103760 106540
>> total 10376000 10654000
>>
>> test222excl:
>> equals() ==
>> min 99000 101000
>> max 192000 128000
>> median 100000 108000
>> avg 102110 108480
>> total 10211000 10848000
>>
>> test223excl (an XPath nodeset canonicalization)
>> equals() ==
>> min 254000 248000
>> max 290000 353000
>> median 266000 265000
>> avg 266820 265800
>> total 26682000 26580000
>>
>> So, what these numbers appear to suggest is that, in fact,
equals() is
>> more often faster than ==. This seems counter-intuitive unless
the JVM
>> has specialized optimization for the String.equals() method.
>>
>> Can anyone see where my testing is likely to be flawed?
>>
>> [1]
>>
>>
http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup
<http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup>
>>
>> [2]
>>
>>
http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup
<http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup>
>>
>> [3]
>>
>>
http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup
<http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup>
>>
>>
>> On 8/2/10 10:11 AM, Chad La Joie wrote:
>>>
>>> So, while I don't have my access yet, Colm asked me if I'd take
a look
>>> at the == vs equals() issue (relevant bugs: 40897[1], 45637[2],
46681[3])
>>>
>>> My executive summary is that clearly, as things stand, the
current code
>>> favors optimization over correctness. Rarely is this a good thing.
>>>
>>> Colm notes[4] that the reliance on intern'ed strings (and thus the
>>> ability to use ==) occurs sporadically throughout the code and
not just
>>> within the ElementChecker implementations. He specifically
mentioned
>>> that the various C14N implementations, and indeed the == is
used about 6
>>> times there for string comparison.
>>>
>>> My recommendation then is two fold:
>>> - Ensure that nothing other than namespace bits are compared
via ==. I
>>> don't know that this occurs but the code should definitely be
reviewed
>>> to ensure that.
>>>
>>> - Create a new "NamespaceEqualityChecker" that provides methods for
>>> checking the various bits of a namespace (URIs, prefixes) and
use it
>>> anywhere that either == or equals() is used today.
Implementations based
>>> on == and equals() would be provided with the default
implementation
>>> being equals()-based. A configuration option should then be made
>>> available to control which impl gets used. Additionally, it
might even
>>> be possible to add some smarts that could detect known "good"
parsers
>>> that use interning and automatically use the == based
implementation.
>>>
>>> I do not recommend changing any part of the code without
addressing the
>>> whole codebase (i.e. all the =='s need to be fixed or no change
should
>>> be made) because of the possibility of creating new, unwanted,
effects.
>>> The current functionality is undesirable but better the devil
you know.
>>>
>>> I think that this should be addressed in the upcoming 1.4.4
release. If
>>> quick consensus can be reached I'm willing to do the work with
a window
>>> of time I have available over the next 2-3 weeks.
>>>
>>> [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=40897
>>> [2] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637
>>> [3] https://issues.apache.org/bugzilla/show_bug.cgi?id=46681
>>> [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637#c1
>>
>
> --
> Chad La Joie
> http://itumi.biz
> trusted identities, delivered
>
--
Chad La Joie
http://itumi.biz
trusted identities, delivered