Hi Chad, > My recommendation then is two fold: > - Ensure that nothing other than namespace bits are compared via ==. I > don't know that this occurs but the code should definitely be reviewed to > ensure that.
Agreed. > - Create a new "NamespaceEqualityChecker" that provides methods for > checking the various bits of a namespace (URIs, prefixes) and use it > anywhere that either == or equals() is used today. Implementations based on > == and equals() would be provided with the default implementation being > equals()-based. A configuration option should then be made available to > control which impl gets used. Sounds good. Have you thought about backwards compatibility issues though, in relation to the ElementCheckerImpl? We don't want to break backwards compatibility for a minor release if at all possible. > I think that this should be addressed in the upcoming 1.4.4 release. If > quick consensus can be reached I'm willing to do the work with a window of > time I have available over the next 2-3 weeks. That would be great! It would be interesting to run some benchmarks when you're done to compare both options. Colm. On Mon, Aug 2, 2010 at 3:11 PM, Chad La Joie <laj...@itumi.biz> wrote: > So, while I don't have my access yet, Colm asked me if I'd take a look at > the == vs equals() issue (relevant bugs: 40897[1], 45637[2], 46681[3]) > > My executive summary is that clearly, as things stand, the current code > favors optimization over correctness. Rarely is this a good thing. > > Colm notes[4] that the reliance on intern'ed strings (and thus the ability > to use ==) occurs sporadically throughout the code and not just within the > ElementChecker implementations. He specifically mentioned that the various > C14N implementations, and indeed the == is used about 6 times there for > string comparison. > > My recommendation then is two fold: > - Ensure that nothing other than namespace bits are compared via ==. I > don't know that this occurs but the code should definitely be reviewed to > ensure that. > > - Create a new "NamespaceEqualityChecker" that provides methods for > checking the various bits of a namespace (URIs, prefixes) and use it > anywhere that either == or equals() is used today. Implementations based on > == and equals() would be provided with the default implementation being > equals()-based. A configuration option should then be made available to > control which impl gets used. Additionally, it might even be possible to > add some smarts that could detect known "good" parsers that use interning > and automatically use the == based implementation. > > I do not recommend changing any part of the code without addressing the > whole codebase (i.e. all the =='s need to be fixed or no change should be > made) because of the possibility of creating new, unwanted, effects. The > current functionality is undesirable but better the devil you know. > > I think that this should be addressed in the upcoming 1.4.4 release. If > quick consensus can be reached I'm willing to do the work with a window of > time I have available over the next 2-3 weeks. > > [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=40897 > [2] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637 > [3] https://issues.apache.org/bugzilla/show_bug.cgi?id=46681 > [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637#c1 > -- > Chad La Joie > http://itumi.biz > trusted identities, delivered >