Hi Kohsuke,

Thanks for your note; lots of interesting questions here.

> 1. Does SchemaHandler attempt to reuse the loaded schema documents?
> In other words, say a schema document A imports Z and B also imports
> Z, but A and B are unrelated each other.
>
> When you load A and B, is Z reused?

That's what I would envision. But this all depends on how expressive
our Grammar objects are, and just what functionality the Xerces2
GrammarPool will have; I think Sandy will have lots to say on this
shortly.

> If so, in what level will this
> reuse be done? It's just a DOM tree which will be reused, or is it
> parsed schema components that are reused?

Once we're finished parsing a schema document the DOM trees it used
will go away. So if we reuse it then it'll be parsed components that
get reused.

> 2. How about parsing .xsd files by using SAX? I guess there are
> problems in using SAX, but what are the difficulties in doing so?

Two that I can see: First, this approach would involve a total rewrite
of schema support; it's not at all obvious to me that much of the X1
code could be ported over at all. Secondly, because components can
reference each other with virtually no regard for order, there's just
no way of avoiding some kind of in-memory representation; so it's not
obvious that developing a custom representation for this single purpose
is worthwhile.

> 3. Do you have any plan to support other schema languages? I have RELAX
> NG in my mind.

No plans at the moment, but obviously this would change if people
started to request this feature. Clearly we need the design to be
sufficiently flexible to make this feasible though.

> In another post, you wrote that "the separation between grammar
> objects--the things which actually do validation--and their construction
> from some human readable representation is sufficiently complete to
> make this[support of other languages] feasible".
>
> I don't want to be pushy, but my experience suggests that things are
> not that easy, mainly because the difference in the validation model.
>
> For example, W3C XML Schema allows you to use xsi:type to "switch"
> the content model of particular element. Other schema language
> treats those attributes as regular attributes. RELAX NG allows you to
> write highly ambiguous content models, which makes it impossible to
> use string automaton based algorithms.

Now my knowledge of RELAX NG comes only from perusing the excellent
tutorial available from the OASIS site. But here's how I thought of
overcoming this ambiguity problem:

My understanding of the current approach is that we use an element name
to identify an appropriate content model. I would think that, for an
element with content models depending on the presence of given
attributes--or of attributes with given values--we could have an
additional layer, perhaps a Hashtable, to identify which content model
to use. Perhaps a different ContentModel class would work for this,
along the lines of the AllContentModel that we currently use for
elements containing schema <all> models. Since we're now validating
on-the-way-in, I can't see why this wouldn't work.

All this to say, I don't think it would be really all that difficult to
extend our grammatical structure to handle this particular complexity of
RELAX NG.

> And it also allows us to use
> arbitrary datatype library.

So long as the datatype library is compiled (i.e., Xerces knows about
it) I don't think this poses any significant problems. Whatever
library's in use will have to have a namespace and surely we can use
this namespace to key its specific types of validator.

Cheers,
Neil
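
For illustration only, here is a rough sketch of the "additional layer"
idea from the reply above: look up a content model by element name
first, and let a given attribute value override that choice. Every name
in it (ContentModelSelector, registerDefault, registerOverride, select)
is hypothetical and not part of Xerces, and content models are
stood in for by plain string labels. It also compares attribute values
as raw strings, whereas a real xsi:type value is a QName that would
need namespace resolution.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are Xerces classes.
final class ContentModelSelector {
    // Existing style of lookup: element name -> default content model.
    private final Map<String, String> byElement = new HashMap<>();
    // Extra layer: (element, attribute name, attribute value) -> content model.
    private final Map<List<String>, String> byAttribute = new HashMap<>();

    void registerDefault(String element, String model) {
        byElement.put(element, model);
    }

    void registerOverride(String element, String attrName,
                          String attrValue, String model) {
        byAttribute.put(Arrays.asList(element, attrName, attrValue), model);
    }

    // Returns an override if one of the element's attributes matches,
    // otherwise the default for the element name, or null if unknown.
    String select(String element, Map<String, String> attributes) {
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            String model = byAttribute.get(
                Arrays.asList(element, a.getKey(), a.getValue()));
            if (model != null) {
                return model;
            }
        }
        return byElement.get(element);
    }
}
```

Because the attributes are already available when the start tag is
seen, the selection can happen at that point, which is the property the
"validating on-the-way-in" remark relies on. The sketch does not address
RELAX NG content models that stay ambiguous even after the attributes
are known.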
