> Hi Eric,
>
> >uncommonness of a feature's use and lack of usable
> >support for it is kind of a chicken-and-egg / self-fulfilling prophecy
> >kind of thing.
>
> I guess usefulness is in the eye of the user. :) I would argue that, when
> your XML docs start getting into the megabytes, you've basically got a
> small-time database on your hands. You should probably be using database
> techniques/software to determine things like uniqueness of values; a basic
> parser is inevitably not going to be as good at that.
Well, my own code using either SAX parsing or DOM treewalking to do similar uniqueness checking is reasonably good at that, so I don't see why not. A lot of XML features will never be as good as the equivalent in some other technology, but that doesn't make them valueless as part of XML.

As to this document being database-like: it is, in a certain way. It's a taxonomy, and as such highly hierarchical and rather well suited to an XML database rather than a relational one for many purposes, though a relational representation has advantages for other purposes. We do actually use a relational model in the end application for storing this and doing certain more complex types of querying, but XML is very useful for making this data persistable and easily portable--for management and editing via a standalone GUI tool, etc. So what I'm doing is not unreasonable. Enforcing uniqueness constraints at this stage of our process (rather than later, upon load into the relational representation) is very handy, and storing them in the data model as defined by the schema is very useful and more maintainable than keeping them in the applications.

I suppose the philosophical issue of whether schema validation is the proper place for enforcing those kinds of constraints, rather than a layered approach using something like Schematron, is an open one--if one concludes that it is not, then XML Schema's very support of these features is the problem. Admittedly, the large size of the document makes mine a less-than-typical usage. However, I'd argue that this merely serves to bring a problem to light by magnification, and is not itself the cause of the problem.

Sorry to keep running on in support of this feature, talking up my own use case and those that my own experience has told me are important regardless of common usage. But in my opinion/experience, the usage of XML Schema, and especially its more advanced features, has been slow to take off because tool support for them has been very poor.
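A minimal sketch of the kind of streaming uniqueness check described above (not the author's actual code; the "term" element and "id" attribute names are illustrative) might look like this as a SAX handler:

```java
// Sketch of a SAX-based uniqueness check: collect each key value into a
// hash set as it streams past, and record any value seen twice.
// Element name "term" and attribute name "id" are illustrative assumptions.
import java.util.HashSet;
import java.util.Set;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class UniquenessHandler extends DefaultHandler {
    public final Set<String> seen = new HashSet<>();
    public final Set<String> duplicates = new HashSet<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        if ("term".equals(qName)) {            // illustrative element name
            String key = atts.getValue("id");  // illustrative key attribute
            if (key != null && !seen.add(key)) {
                duplicates.add(key);           // second sighting of this key
            }
        }
    }
}
```

Fed to `javax.xml.parsers.SAXParser.parse(...)`, each key costs one expected-constant-time hash lookup, so a whole-document pass stays linear in the number of keys.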
I'm not indicting Xerces here--I wish the other tools I had to deal with had the same quality of schema support that Xerces does (really, this constraint performance is the only issue I have with it). We've had to dumb down our usage of all sorts of schema features because of the lame schema support XMetal and other tools give. So in my opinion, widespread usage of a lot of XML Schema features will always lag support in Xerces and other parsers. But tool support has to come before common usage, and I think parser support has to come before tool support.

I had further comments about the potential for turning your O(N^2) into O(N) with hashing, but your exchange with Joseph Kesselman has preempted that. : ^)

Thanks,
Eric

> >The almost
> >equal slowness of the SAX parsing of this makes me wonder if both parsing
> >methods are using the same xpath code, perhaps DOM-based xpath code, like
> >that provided by Xalan
>
> Xerces schema validation code is written entirely independently of what
> kind of parser happens to be using it. The xpath implementation is
> stream-based; the schema spec does limit it enough that we don't need to
> maintain any kind of tree structures. The code that does the xpath
> processing is in the Field, Selector, and XPathMatcher classes and their
> inner classes in the org/apache/xerces/impl/xs/identity package.
>
> >Your comment about "Xerces will take O(N^2) operations to prove it to
> >itself" puzzles me though. Surely it doesn't iterate through the entire
> >document again every time it finds a key node, to compare for dupes?
>
> No; it iterates through all the other key values it's seen so far (from the
> same constraint).
>
> >Surely
> >it simply stores the key value into a HashSet or something as it goes and
> >checks for previous key existence as it goes, giving O(N) operations?
>
> It stores the previous keys in a vector, marches through and looks at each
> one, adding the new key if it finds no dups.
> That's why I say there's
> certainly room for improvement... The ValueStore inner classes of
> XMLSchemaValidator are where the code lives for this.
>
> Catching all the edge cases in any reimplementation would be a challenge
> though. I'm sure it could be done, but not hitting mainline code's
> performance and maintaining spec-conformance (we're extremely conformant
> with ID constraints right now) might be tricky.
>
> Let me know if you plan to tackle this and I'll help where I can.
> Otherwise, I'll look into this some day, hopefully. :)
>
> Cheers,
> Neil
>
> Neil Graham
> XML Parser Development
> IBM Toronto Lab
> Phone: 905-413-3519, T/L 969-3519
> E-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
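The hash-based replacement for the vector scan discussed in the exchange above could be sketched roughly as follows. This is not Xerces code (the class and method names are illustrative); it only shows the core data-structure change, and as Neil notes, a real reimplementation would also have to handle the spec's edge cases:

```java
// Sketch of a hashed value store for identity-constraint keys (illustrative,
// not the actual Xerces ValueStore). A composite key from a multi-field
// constraint is joined into one List so it can be hashed and compared.
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashedValueStore {
    private final Set<List<String>> seen = new HashSet<>();

    // Returns true if the key tuple is new, false if it is a duplicate.
    // Each call is O(1) expected time, so N keys cost O(N) total, versus
    // O(N^2) for scanning a vector of all prior keys on every addition.
    public boolean addKey(String... fields) {
        return seen.add(Arrays.asList(fields));
    }
}
```

Because `Arrays.asList` wraps the fields in a `List` whose `hashCode` and `equals` are value-based, composite keys compare correctly without any custom tuple class.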
