Re: [Xerces-2]: schema parsing design; discussion starter [long]

neilg Wed, 08 Aug 2001 07:29:59 -0700
Hi Kohsuke,

Thanks for your note; lots of interesting questions here.

> 1. Does SchemaHandler attempt to reuse the loaded schema documents?
>   In other words, say a schema document A imports Z and B also imports
>   Z, but A and B are unrelated to each other.
>
>   When you load A and B, is Z reused?

That's what I would envision.  But this all depends on how expressive our
Grammar objects are, and just what functionality the Xerces2 GrammarPool
will have; I think Sandy will have lots to say on this shortly.

> If so, at what level will this
>   reuse be done?  Is it just a DOM tree which will be reused, or is it
>   parsed schema components that are reused?

Once we're finished parsing a schema document the DOM trees it used will go
away.  So if we reuse it then it'll be parsed components that get reused.
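
To make the A/B/Z case concrete, here is a rough sketch of component-level
reuse.  Everything in it is an assumption for illustration: ParsedGrammar,
GrammarPoolSketch and getOrParse are made-up names, not the Xerces2
GrammarPool API, and keying on the target namespace is just one plausible
choice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the parsed components of one schema document.
final class ParsedGrammar {
    final String targetNamespace;
    ParsedGrammar(String targetNamespace) { this.targetNamespace = targetNamespace; }
}

// Hypothetical pool: keeps parsed components, keyed by target namespace.
final class GrammarPoolSketch {
    private final Map<String, ParsedGrammar> byNamespace = new HashMap<>();
    private int parseCount = 0;

    // Returns the cached grammar, or runs the parser once and caches the result.
    ParsedGrammar getOrParse(String namespace, Function<String, ParsedGrammar> parser) {
        ParsedGrammar grammar = byNamespace.get(namespace);
        if (grammar == null) {
            grammar = parser.apply(namespace);  // the DOM tree would be built and dropped here
            parseCount++;
            byNamespace.put(namespace, grammar);
        }
        return grammar;
    }

    int parseCount() { return parseCount; }
}
```

With something like this, loading A and then B would ask the pool for Z
twice but only parse it once.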

> 2. How about parsing .xsd files by using SAX?  I guess there are
>   problems in using SAX, but what are the difficulties in doing so?

Two that I can see:  First, this approach would involve a total rewrite of
schema support; it's not at all obvious to me that much of the X1 code
could be ported over at all.  Secondly, because components can reference
each other with virtually no regard for order, there's just no way of
avoiding some kind of in-memory representation; so it's not obvious that
developing a custom representation for this single purpose is worthwhile.
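
The ordering problem can be sketched as follows.  This is only an
illustration under my own assumptions (ElementDecl, TypeDef and
TwoPassResolver are hypothetical names, not Xerces classes): whatever
feeds the declarations in, SAX or DOM, every declaration has to be held
in memory until the last one has been seen.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical type definition.
final class TypeDef {
    final String name;
    TypeDef(String name) { this.name = name; }
}

// Hypothetical element declaration that refers to its type by name.
final class ElementDecl {
    final String name;
    final String typeName;  // may name a type declared later in the document
    TypeDef resolvedType;   // filled in by the second pass
    ElementDecl(String name, String typeName) {
        this.name = name;
        this.typeName = typeName;
    }
}

final class TwoPassResolver {
    private final Map<String, TypeDef> types = new HashMap<>();
    private final Map<String, ElementDecl> elements = new HashMap<>();

    // Pass 1: record declarations in whatever order they arrive.
    void declareType(String name) { types.put(name, new TypeDef(name)); }
    void declareElement(String name, String typeName) {
        elements.put(name, new ElementDecl(name, typeName));
    }

    // Pass 2: resolve references only after everything has been seen.
    void resolveAll() {
        for (ElementDecl element : elements.values()) {
            TypeDef type = types.get(element.typeName);
            if (type == null) {
                throw new IllegalStateException("undeclared type: " + element.typeName);
            }
            element.resolvedType = type;
        }
    }

    ElementDecl element(String name) { return elements.get(name); }
}
```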

> 3. Do you have any plan to support other schema languages? I have RELAX
>    NG in my mind.

No plans at the moment, but obviously this would change if people started
to request this feature.  Clearly we need the design to be sufficiently
flexible to make this feasible though.

>   In another post, you wrote that "the separation between grammar
>   objects--the things which actually do validation--and their
>   construction from some human readable representation is sufficiently
>   complete to make this [support of other languages] feasible".
>
>      I don't want to be pushy, but my experience suggests that things are
>   not that easy, mainly because of the difference in the validation model.
>
>   For example, W3C XML Schema allows you to use xsi:type to "switch"
>   the content model of a particular element. Other schema languages
>   treat those attributes as regular attributes. RELAX NG allows you to
>   write highly ambiguous content models, which makes it impossible to
>   use string automaton based algorithms.

Now my knowledge of RELAX NG comes only from perusing the excellent
tutorial available from the OASIS site.  But here's how I thought of
overcoming this ambiguity problem:

My understanding of the current approach is that we use an element name to
identify an appropriate content model. I would think that, for an element
with content models depending on the presence of given attributes--or of
attributes with given values--we could have an additional layer, perhaps a
Hashtable, to identify which content model to use.  Perhaps a different
ContentModel class would work for this, along the lines of the
AllContentModel that we use for elements containing schema <all> models
currently.  Since we're now validating on-the-way-in, I can't see why this
wouldn't work.
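
Roughly what I have in mind, as a sketch only.  ContentModelSketch and
AttributeKeyedContentModels are hypothetical names rather than existing
Xerces classes, and this covers only the case where one attribute's value
picks the content model; it does not address ambiguity in general.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, much-simplified content model: accepts or rejects a child sequence.
interface ContentModelSketch {
    boolean accepts(String[] children);
}

// Hypothetical extra layer: picks a content model from the value of one attribute.
final class AttributeKeyedContentModels {
    private final String attributeName;
    private final ContentModelSketch fallback;
    private final Map<String, ContentModelSketch> byValue = new HashMap<>();

    AttributeKeyedContentModels(String attributeName, ContentModelSketch fallback) {
        this.attributeName = attributeName;
        this.fallback = fallback;
    }

    void register(String attributeValue, ContentModelSketch model) {
        byValue.put(attributeValue, model);
    }

    // Called once the start tag's attributes are known, before any children arrive.
    ContentModelSketch select(Map<String, String> attributes) {
        String value = attributes.get(attributeName);
        ContentModelSketch model = (value == null) ? null : byValue.get(value);
        return (model != null) ? model : fallback;
    }
}
```

The point is that the attributes are all available at the start tag, so
the choice can be made before the first child is validated.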

All this to say, I don't think it would be all that difficult to
extend our grammatical structure to handle this particular complexity of
RELAX NG.

> And it also allows us to use
>   an arbitrary datatype library.

So long as the datatype library is compiled in (i.e., Xerces knows about
it) I don't think this poses any significant problems.  Whatever library is
in use will have to have a namespace, and surely we can use this namespace
to key its specific validators.
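
As a sketch of the keying idea, with all names assumed for illustration
(DatatypeValidatorSketch, DatatypeLibrarySketch and
DatatypeLibraryRegistry are not existing Xerces interfaces):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical validator for one datatype.
interface DatatypeValidatorSketch {
    boolean isValid(String lexicalValue);
}

// Hypothetical datatype library: maps type names to validators, or null if unknown.
interface DatatypeLibrarySketch {
    DatatypeValidatorSketch lookup(String typeName);
}

// Hypothetical registry of the libraries Xerces has been told about, keyed by namespace.
final class DatatypeLibraryRegistry {
    private final Map<String, DatatypeLibrarySketch> byNamespace = new HashMap<>();

    void register(String namespaceUri, DatatypeLibrarySketch library) {
        byNamespace.put(namespaceUri, library);
    }

    DatatypeValidatorSketch validatorFor(String namespaceUri, String typeName) {
        DatatypeLibrarySketch library = byNamespace.get(namespaceUri);
        if (library == null) {
            throw new IllegalArgumentException("unknown datatype library: " + namespaceUri);
        }
        DatatypeValidatorSketch validator = library.lookup(typeName);
        if (validator == null) {
            throw new IllegalArgumentException("unknown type: " + typeName);
        }
        return validator;
    }
}
```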

Cheers,
Neil

