Re: [Xerces-2]: re:schema parsing design

neilg Thu, 16 Aug 2001 13:14:15 -0700
Hi Kohsuke,

Hope you saw Sandy's note in response to your posting; he's the real expert
on how we'll propose that X2 validation should work.

> I used the word "reject" in the sense that I would make my software
> refuse to validate any documents with that schema. The same thing it
> does when it sees a not-wellformed schema.

> But it sounds like Xerces does allow users to validate documents with
>that broken schema. Am I correct?  It's just that the assessment outcome
>will not become "full".

That's what we want it to do, and what we believe the schema specs intend
for it to do.  So that's what we'll be striving for in X2.

> I personally don't think such "late-binding" is necessary. It just makes
>things complicated, and serves no purpose.

Roger Costello posted a message entitled "a plea for functionality" in June
where he advocated strongly for Xerces to support this.  I tend to agree
with you that it's a big cost for a small return, but if it didn't turn out
to be that hard to implement...

> Would you tell me whether you are planning to use (1) the current
> content model or (2) the current state in the current content model?

Like Sandy said, it's 2.

> If you are living in the bay area or have a chance to visit there,
> please let me know and I am happy to explain everything in detail.

I'd love to visit--never have been there--but my sig doesn't lie:  I do
indeed live in Toronto.  So I guess we'll have to do as best we can over
e-mail...

> It's actually not that difficult.

Nothing seems difficult that you yourself have written.  :-)

> Or you want me to try now?

Please!

> Or I would say such a performance penalty is acceptable. If you want to
> drive a BMW and not a Honda Civic, you have to pay more.

And to what would you analogize XML Schema:  a Ferrari?  :-)

> I really want to see the revised interfaces. I think I can suggest a
> thing or two to make it RELAX NG compatible. I'd appreciate if you would
> send them to me.

We'll certainly post them once they're available, but we haven't quite got
everything in order yet.  Stay tuned.

> I've attached the javadoc document of the Acceptor interface, which I'm
> using as a surface interface of the grammar. You can think of an
> acceptor as a small validator that validates child elements of one
> element.

It does look like a very elegant way of keeping track of state.  We've
thought both of making ContentModels stateful and of getting the
Validator--the object which gets Document events and interacts with the
grammars--to keep track of states.  There are pros and cons to both
approaches.

> Is it possible to add this kind of interaction to the X2 validation
> engine?

So let me see if I have this right:  Under your scheme, there's an Acceptor
that corresponds to each type.  When you see a new element, you create (or
look up) the Acceptor for that type and send it away to do its work on that
element.  When it's done you plug it back in to the current Acceptor and
move on to the next child element at this level (or finish if there are no
more).  I think the Xerces custom has always been to have ContentModels be
independent of each other; I'm not sure how hard this would be to change,
or whether a change is even necessary so long as state is preserved
somewhere.
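
To make that reading of the scheme concrete, here is a minimal sketch in
Java.  Everything in it is an assumption made for illustration:
SketchAcceptor, SequenceAcceptor, SketchValidator, the "#document" entry and
the toy grammar (each element maps to a fixed sequence of required children)
are hypothetical names.  It is not the Acceptor interface from the attached
javadoc, and it is not any Xerces API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified acceptor: validates the children of ONE element. */
interface SketchAcceptor {
    /** Returns an acceptor for the child's content, or null if the child is not allowed here. */
    SketchAcceptor createChild(String elementName);

    /** Called once the child has ended; advances this acceptor's state. */
    boolean stepForward(SketchAcceptor finishedChild);

    /** True if the element is allowed to end in the current state. */
    boolean isAcceptState();
}

/** Toy content model: an element requires a fixed sequence of child elements. */
final class SequenceAcceptor implements SketchAcceptor {
    private final Map<String, List<String>> grammar;
    private final List<String> expected;
    private int position = 0; // the "current state in the current content model"

    SequenceAcceptor(Map<String, List<String>> grammar, String elementName) {
        this.grammar = grammar;
        // Elements without an entry are treated as having no children.
        this.expected = grammar.getOrDefault(elementName, List.of());
    }

    public SketchAcceptor createChild(String elementName) {
        if (position >= expected.size() || !expected.get(position).equals(elementName)) {
            return null;
        }
        return new SequenceAcceptor(grammar, elementName);
    }

    public boolean stepForward(SketchAcceptor finishedChild) {
        if (!finishedChild.isAcceptState()) {
            return false; // the child ended before its own content was complete
        }
        position++;
        return true;
    }

    public boolean isAcceptState() {
        return position == expected.size();
    }
}

/** Driver that receives start/end element events and keeps a stack of acceptors. */
final class SketchValidator {
    private final Deque<SketchAcceptor> stack = new ArrayDeque<>();
    private boolean valid = true;

    SketchValidator(Map<String, List<String>> grammar) {
        // The bottom of the stack validates the document's single root element.
        stack.push(new SequenceAcceptor(grammar, "#document"));
    }

    void startElement(String name) {
        if (!valid) return; // ignore everything after the first error
        SketchAcceptor child = stack.peek().createChild(name);
        if (child == null) {
            valid = false;
            return;
        }
        stack.push(child);
    }

    void endElement() {
        if (!valid) return;
        if (stack.size() < 2) { // unbalanced end tag
            valid = false;
            return;
        }
        SketchAcceptor finished = stack.pop();
        // "Plug it back in" to the parent acceptor.
        valid = stack.peek().stepForward(finished);
    }

    boolean isValid() {
        return valid && stack.size() == 1 && stack.peek().isAcceptState();
    }
}
```

In this sketch all validation state lives in the acceptors on the stack, so
the grammar map itself stays immutable and could be shared between documents.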

> But I wonder why Grammar class uses integer as an index. You can easily
> have an ElementDeclaration object or something and get rid of integer
> index and scope. That would make things much easier.

As I've mentioned, scope is definitely on our hit-list, because it's
impossible to deduce the scope of an element when validating an instance
doc (as scope is defined in our grammar), so looking up a declaration
can be a very tortuous procedure.

The use of indexes brings up some interesting questions though:  We're
currently debating whether to have more information in objects like X1's
XMLElementDecl, or to continue the practice of maintaining lots of parallel
arrays in the grammar, indexed with an ElementIndex, each corresponding to
a particular type of element information.  The former approach is more
object-oriented and elegant, but there's some feeling that the latter
should be more efficient--if a schema has 500 element decls that's 500
smallish objects, whereas perhaps you could get by with 10 or so large
objects in the current approach.
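
The two layouts under debate can be sketched side by side in Java.  This is
only an illustration under assumed names: ElementDeclSketch,
ObjectGrammarSketch and ArrayGrammarSketch are hypothetical, and the three
fields (name, type name, nillable) are stand-ins chosen for the example
rather than the actual contents of XMLElementDecl or of the grammar arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Layout 1: one small object per element declaration (hypothetical fields). */
final class ElementDeclSketch {
    final String name;
    final String typeName;
    final boolean nillable;

    ElementDeclSketch(String name, String typeName, boolean nillable) {
        this.name = name;
        this.typeName = typeName;
        this.nillable = nillable;
    }
}

final class ObjectGrammarSketch {
    private final List<ElementDeclSketch> decls = new ArrayList<>();

    /** Callers hold on to the declaration object itself; no index is needed. */
    ElementDeclSketch add(String name, String typeName, boolean nillable) {
        ElementDeclSketch decl = new ElementDeclSketch(name, typeName, nillable);
        decls.add(decl);
        return decl;
    }

    int size() {
        return decls.size();
    }
}

/** Layout 2: parallel arrays, one per kind of information, indexed by element index. */
final class ArrayGrammarSketch {
    private String[] names = new String[4];
    private String[] typeNames = new String[4];
    private boolean[] nillable = new boolean[4];
    private int count = 0;

    /** Returns the element index that callers use for all later lookups. */
    int add(String name, String typeName, boolean isNillable) {
        if (count == names.length) {
            // Grow every parallel array together so the indexes stay aligned.
            int newSize = count * 2;
            names = Arrays.copyOf(names, newSize);
            typeNames = Arrays.copyOf(typeNames, newSize);
            nillable = Arrays.copyOf(nillable, newSize);
        }
        names[count] = name;
        typeNames[count] = typeName;
        nillable[count] = isNillable;
        return count++;
    }

    String getName(int elementIndex) {
        return names[elementIndex];
    }

    String getTypeName(int elementIndex) {
        return typeNames[elementIndex];
    }

    boolean isNillable(int elementIndex) {
        return nillable[elementIndex];
    }
}
```

With 500 declarations, the first layout allocates 500 declaration objects
while the second keeps three arrays however many declarations are added.
Whether that difference matters in practice would need measuring.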

Lots more things to think about!

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  416-448-3519, T/L 778-3519
E-mail:  [EMAIL PROTECTED]

