Re: [Xerces-2]: re:schema parsing design

Kohsuke KAWAGUCHI Mon, 13 Aug 2001 16:52:36 -0700


> > Since I am a bad implementor and I don't know how I can implement such
> > an "advanced" feature, I would reject such a "missing component" as
> > an error. But I think at least Henry's XSV allows such things.
> 
> Evidently we're bad implementors too because we'd also reject it today. :-)
> Actually, I think "reject" or "accept" is too coarse a distinction:
> this is where assessment outcomes of "unknown" and other such fun PSVI
> concepts come in.  We're definitely going to build enough info into our
> grammars so that we can account for the PSVI; but I'm not all that
> confident that once we determine something to be unknown we'll be able to
> fill it in later.  It would be an interesting feature though.

I used the word "reject" in the sense that I would make my software refuse
to validate any document against that schema, just as it does when it
sees a schema that is not well-formed.

But it sounds like Xerces does allow users to validate documents with
that broken schema. Am I correct?  It's just that the assessment outcome
will not become "full".

I personally don't think such "late-binding" is necessary. It just makes
things complicated, and serves no purpose.



> > When you see a start tag, you use
> 
> > 1. the current content model (or the state in the current content model?)
> > 2. tag name (uri,local)
> 
> > to decide the content model. Right?
> 
> That's what we're planning to do in X2.  In x1 we have this nasty concept
> of "scope" so things are a bit different.

Would you tell me whether you are planning to use (1) the current
content model or (2) the current state in the current content model?
This distinction is important for me.

My guess is that you have to use the current state, because sometimes
the content model by itself is not enough to decide the content model
for children.

<xs:complexType name="crap">
  <xs:sequence>
    <xs:element name="foo" type="aType"/>
    <xs:element name="foo" type="anotherType"/>
  </xs:sequence>
</xs:complexType>

I'm not confident, but I guess the above is a valid XML Schema. To
decide which content model applies to a "foo" child, you need to look at
the current state.
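
A minimal sketch of what "using the current state" could look like for
the sequence above. This is not Xerces code; the names Particle and
SequenceState are hypothetical, and types are represented by their names
only. The point is that the same (uri, local) pair resolves to a
different type depending on the position within the content model.

```java
import java.util.List;

// Hypothetical: one element particle of a sequence content model.
final class Particle {
    final String uri;
    final String localName;
    final String typeName;

    Particle(String uri, String localName, String typeName) {
        this.uri = uri;
        this.localName = localName;
        this.typeName = typeName;
    }
}

// Hypothetical: the validation state for one instance of the sequence.
final class SequenceState {
    private final List<Particle> particles;
    private int position = 0; // the "current state" in the content model

    SequenceState(List<Particle> particles) {
        this.particles = particles;
    }

    // Returns the type name for this start tag, or null if the tag is
    // not allowed in the current state. Advances the state on success.
    String startElement(String uri, String localName) {
        if (position >= particles.size()) {
            return null;
        }
        Particle p = particles.get(position);
        if (!p.uri.equals(uri) || !p.localName.equals(localName)) {
            return null;
        }
        position++;
        return p.typeName;
    }

    // True once every particle of the sequence has been matched.
    boolean isComplete() {
        return position == particles.size();
    }
}
```

With the "crap" type above, the first "foo" start tag would resolve to
aType and the second to anotherType, which a lookup keyed only on the
content model and the tag name could not distinguish.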



> > Now depending on the value of "foo", the parent has to change its
> > behavior.
> 
> Yeah that is wild!  So you've written a RELAX NG validator; how do you do
> this?  I'm also curious about how efficient validating with this kind of
> grammar is; it looks like it would be even less efficient than XML schema
> validation...

If you live in the Bay Area or have a chance to visit, please let me
know and I will be happy to explain everything in detail. It's actually
not that difficult.

Or do you want me to try now?


As for performance, I think I can say it's not as inefficient as you
might think. But I don't have any hard evidence to support my claim.

Or I would say such a performance penalty is acceptable. If you want to
drive a BMW and not a Honda Civic, you have to pay more.



> But I'm not sure how our current system would account for a content model
> like this.  I guess we could try and plug some sort of NFA-ish content
> model into our scheme; but I'd hate to think about the implementation or
> performance characteristics of an approach like that...

I really want to see the revised interfaces. I think I can suggest a
thing or two to make them RELAX NG compatible. I'd appreciate it if you
would send them to me.

I've attached the javadoc document of the Acceptor interface, which I'm
using as a surface interface of the grammar. You can think of an
acceptor as a small validator that validates child elements of one element.

You will see the stepForward method, which takes a child acceptor. Within
this method, the parent receives the outcome of the validation of the
child element.
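
The interaction described above might be sketched roughly as follows.
This is not the attached Acceptor interface: SimpleAcceptor and the
three implementing classes are hypothetical, and only the idea of a
stepForward method that takes the child acceptor comes from the text.
The toy grammar assumed here is "foo with text A followed by bar, or
foo with text B followed by baz".

```java
// Hypothetical, simplified acceptor: validates the children of one element.
interface SimpleAcceptor {
    SimpleAcceptor createChild(String localName); // null if not allowed here
    boolean stepForward(SimpleAcceptor child);    // parent consumes child's outcome
    boolean text(String value);                   // false if the text is not allowed
    boolean isAcceptState();                      // true if the element may end here
}

// Accepts an element with no content (whitespace only).
final class EmptyAcceptor implements SimpleAcceptor {
    public SimpleAcceptor createChild(String localName) { return null; }
    public boolean stepForward(SimpleAcceptor child) { return false; }
    public boolean text(String value) { return value.trim().isEmpty(); }
    public boolean isAcceptState() { return true; }
}

// Accepts <foo> whose text is "A" or "B" and remembers which one it saw.
final class FooAcceptor implements SimpleAcceptor {
    String value; // the outcome the parent will look at

    public SimpleAcceptor createChild(String localName) { return null; }
    public boolean stepForward(SimpleAcceptor child) { return false; }
    public boolean text(String v) {
        if (!v.equals("A") && !v.equals("B")) return false;
        value = v;
        return true;
    }
    public boolean isAcceptState() { return value != null; }
}

// The parent changes its behavior depending on the value of foo.
final class ParentAcceptor implements SimpleAcceptor {
    private String expectedNext = "foo"; // current state
    private boolean done = false;

    public SimpleAcceptor createChild(String localName) {
        if (done || !localName.equals(expectedNext)) return null;
        return localName.equals("foo") ? new FooAcceptor() : new EmptyAcceptor();
    }

    public boolean stepForward(SimpleAcceptor child) {
        if (!child.isAcceptState()) return false;
        if (child instanceof FooAcceptor) {
            // The child's outcome selects the next allowed element.
            expectedNext = "A".equals(((FooAcceptor) child).value) ? "bar" : "baz";
        } else {
            done = true;
        }
        return true;
    }

    public boolean text(String value) { return value.trim().isEmpty(); }
    public boolean isAcceptState() { return done; }
}
```

A driver would call createChild on each start tag, feed the child its
content, and call stepForward on the parent at the matching end tag, so
the acceptor itself carries the state between calls.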

Is it possible to add this kind of interaction to the X2 validation
engine?

It looks like X2's ContentModelValidator is the closest equivalent
of the Acceptor, but ContentModelValidator seems to be stateless.




> > a little more complicated things like "any tag name except xyz:abc" or
> > "xyz:*** or abc:def".
> 
> This might be ugly but it's not all that hard to see how it could be
> grafted on.  What sort of approach did you take to constructs like this?

The easiest way is to implement an interface like this:

interface NameClass {
  boolean accepts( String namespaceURI, String localName );
}

Then you can derive various primitives, and you can also have
combinators like "X or Y" (choice) and "X but not Y" (difference).
That's how I implemented it.
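
A minimal sketch of such primitives and combinators, assuming Java 8 or
later so that lambdas can implement the interface. The NameClass
interface is repeated so the block stands alone; the NameClasses helper
and its method names are hypothetical and not taken from any actual
implementation.

```java
interface NameClass {
    boolean accepts(String namespaceURI, String localName);
}

// Hypothetical factory for primitive name classes and combinators.
final class NameClasses {
    private NameClasses() {}

    // Primitive: matches any name at all.
    static NameClass anyName() {
        return (uri, local) -> true;
    }

    // Primitive: matches any local name in one namespace ("xyz:*").
    static NameClass nsName(String ns) {
        return (uri, local) -> ns.equals(uri);
    }

    // Primitive: matches exactly one name ("abc:def").
    static NameClass name(String ns, String localName) {
        return (uri, local) -> ns.equals(uri) && localName.equals(local);
    }

    // Combinator: "X or Y".
    static NameClass choice(NameClass x, NameClass y) {
        return (uri, local) -> x.accepts(uri, local) || y.accepts(uri, local);
    }

    // Combinator: "X but not Y".
    static NameClass difference(NameClass x, NameClass y) {
        return (uri, local) -> x.accepts(uri, local) && !y.accepts(uri, local);
    }
}
```

With these, "any tag name except xyz:abc" becomes
difference(anyName(), name("xyz", "abc")), and "xyz:* or abc:def"
becomes choice(nsName("xyz"), name("abc", "def")), where "xyz" and "abc"
stand in for the namespace URIs bound to those prefixes.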


But I feel that you are asking how you can convert a name into an
integer index (it looks like the Grammar class uses integer indices for
everything). That can be done in constant time.


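// Key used in place of a local part or URI to register a wildcard.
private static final String WILDCARD = "*";

public int getElementDeclIndex(QName elementDeclQName, int scope) {
    // 1. Try the exact name first.
    int mapping = fScopeMapping.get(
        scope, elementDeclQName.localpart, elementDeclQName.uri );
    // 2. Fall back to "any local name in this namespace".
    if(mapping==-1)
        mapping = fScopeMapping.get(
            scope, WILDCARD, elementDeclQName.uri );
    // 3. Fall back to "any name at all".
    if(mapping==-1)
        mapping = fScopeMapping.get( scope, WILDCARD, WILDCARD );

    // Still -1 if nothing matched.
    return mapping;
}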


But I wonder why the Grammar class uses integers as indices. You could
easily have an ElementDeclaration object or something similar and get rid
of the integer index and the scope. That would make things much easier.




regards,
--
Kohsuke KAWAGUCHI                          +1 650 786 0721
Sun Microsystems                   [EMAIL PROTECTED]

Acceptor.zip
