Re: [Xerces-2]: re:schema parsing design

Kohsuke KAWAGUCHI Fri, 17 Aug 2001 17:00:27 -0700

> > I personally don't think such "late-binding" is necessary. It just makes
> >things complicated, and serves no purpose.
> 
> Roger Costello posted a message entitled "a plea for functionality" in June
> where he advocated strongly for Xerces to support this.  I tend to agree
> with you that it's a big cost for a small return, but if it didn't turn out
> to be that hard to implement...

I would even say that this feature should be considered harmful. It makes it
impossible for tools to detect simple typos like:

<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="childd"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

A processor cannot report an error, because this may be a dangling reference
that gets resolved later, and the spec says that is not an error!  (sigh)
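A rough Java sketch (hypothetical, not Xerces code) of the check a processor could run once all schema documents are loaded, assuming the misspelled name is used as a reference to a global component. With early binding the unresolved name can be reported; with late binding the processor has to stay silent, because the component might still come from a schema loaded later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Xerces code: reporting dangling references
// after all schema documents have been loaded.
public class DanglingRefCheck {

    /** Returns one message per reference that resolves to no declared component. */
    public static List<String> check(Set<String> declared, List<String> refs, boolean lateBinding) {
        List<String> errors = new ArrayList<>();
        if (lateBinding) {
            // The missing component may still arrive from a schema loaded later,
            // so a dangling reference is not an error and nothing can be reported.
            return errors;
        }
        for (String ref : refs) {
            if (!declared.contains(ref)) {
                errors.add("unresolved reference: " + ref);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Set<String> declared = Set.of("root", "child");
        List<String> refs = List.of("childd"); // hypothetical reference containing the typo
        System.out.println(check(declared, refs, false)); // [unresolved reference: childd]
        System.out.println(check(declared, refs, true));  // []
    }
}
```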



> So let me see if I have this right:  Under your scheme, there's an Acceptor
> that corresponds to each type.  When you see a new element, you create (or
> look up) the Acceptor for that type and send it away to do its work on that
> element.  When it's done you plug it back in to the current Acceptor and
> move on to the next child element at this level (or finish if there are no

Exactly.  Although my implementation creates a new Acceptor instance for
every element, there is actually no need to do it that way. You can keep
a state array instead, just as Xerces does now.
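Roughly, the per-element scheme looks like the Java sketch below. The Acceptor interface here is a simplified, hypothetical one (sequence-only content models, parser events encoded as "+name" and "-" strings), not the actual API of any existing validator. A fresh Acceptor is created for each child element and, when the child ends, plugged back into its parent.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one Acceptor instance per element.
public class AcceptorDemo {

    /** Simplified, hypothetical acceptor interface. */
    interface Acceptor {
        /** Acceptor for a child's content, or null if the child is not allowed here. */
        Acceptor createChildAcceptor(String childName);
        /** Plugs a finished child back in; false if that makes the content invalid. */
        boolean stepForward(String childName, Acceptor child);
        /** True if the content seen so far is complete. */
        boolean isAcceptState();
    }

    /** A type whose content is a fixed sequence of child elements. */
    static final class SequenceType {
        final List<String> children;            // expected child names, in order
        final Map<String, SequenceType> types;  // type of each child element
        SequenceType(List<String> children, Map<String, SequenceType> types) {
            this.children = children;
            this.types = types;
        }
        Acceptor newAcceptor() { return new SequenceAcceptor(this); }
    }

    static final class SequenceAcceptor implements Acceptor {
        private final SequenceType type;
        private int position = 0; // index of the next expected child

        SequenceAcceptor(SequenceType type) { this.type = type; }

        public Acceptor createChildAcceptor(String childName) {
            if (position >= type.children.size() || !type.children.get(position).equals(childName)) {
                return null;
            }
            return type.types.get(childName).newAcceptor(); // a fresh object per element
        }

        public boolean stepForward(String childName, Acceptor child) {
            if (!child.isAcceptState()) return false; // parent's new state depends on the child's outcome
            position++;
            return true;
        }

        public boolean isAcceptState() { return position == type.children.size(); }
    }

    /** Document whose element "root" must contain exactly one empty "child". */
    static SequenceType exampleDocument() {
        SequenceType empty = new SequenceType(List.of(), Map.of());
        SequenceType root = new SequenceType(List.of("child"), Map.of("child", empty));
        return new SequenceType(List.of("root"), Map.of("root", root));
    }

    /** Events: "+name" opens an element, "-" closes the current one (assumed well-formed). */
    static boolean validate(Acceptor document, List<String> events) {
        Deque<Acceptor> acceptors = new ArrayDeque<>();
        Deque<String> names = new ArrayDeque<>();
        acceptors.push(document);
        for (String event : events) {
            if (event.startsWith("+")) {
                String name = event.substring(1);
                Acceptor child = acceptors.peek().createChildAcceptor(name);
                if (child == null) return false; // element not allowed here
                acceptors.push(child);
                names.push(name);
            } else {
                Acceptor finished = acceptors.pop();
                if (!acceptors.peek().stepForward(names.pop(), finished)) return false;
            }
        }
        return acceptors.size() == 1 && acceptors.peek().isAcceptState();
    }

    public static void main(String[] args) {
        System.out.println(validate(exampleDocument().newAcceptor(), List.of("+root", "+child", "-", "-"))); // true
        System.out.println(validate(exampleDocument().newAcceptor(), List.of("+root", "+childd", "-", "-"))); // false
    }
}
```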

I guess you'll say that creating an object for each element is
unacceptable in terms of performance.


> I think the Xerces custom has always been to have contentModels be
> independent of each other;

I can see that, and I guess it is understandable.  If the content model
objects are kept stateless and the state information is stored somewhere
else, then perhaps there is no need for content models to interact
(though I'm not confident about this).

The essence is that, to compute the parent's new "state", the validator
needs to know the outcome of the child's "state".
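A sketch of the state-array alternative under the same simplifying assumptions (hypothetical names, sequence-only content models, "+name"/"-" events): the content models are stateless, and the validator keeps one int state per open element in plain arrays. The next() method takes the child's outcome as an argument, which is exactly where the parent's new state depends on the child.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: stateless content models plus a per-element state array.
public class StateStackDemo {

    /** Stateless content model: a fixed sequence of child element names. */
    static final class SequenceModel {
        static final int ERROR = -1;
        final List<String> children;

        SequenceModel(List<String> children) { this.children = children; }

        int start() { return 0; }

        /** Parent's next state once a child has ended; needs the child's outcome. */
        int next(int state, String childName, boolean childValid) {
            if (!childValid || state == ERROR) return ERROR;
            if (state < children.size() && children.get(state).equals(childName)) return state + 1;
            return ERROR;
        }

        boolean isFinal(int state) { return state == children.size(); }
    }

    static Map<String, SequenceModel> exampleModels() {
        return Map.of("root", new SequenceModel(List.of("child")),
                      "child", new SequenceModel(List.of()));
    }

    static SequenceModel exampleDocument() { return new SequenceModel(List.of("root")); }

    /** Events: "+name" opens an element, "-" closes the current one (assumed well-formed). */
    static boolean validate(Map<String, SequenceModel> models, SequenceModel document, List<String> events) {
        int capacity = events.size() + 1;            // nesting depth can't exceed the event count
        SequenceModel[] modelStack = new SequenceModel[capacity];
        int[] stateStack = new int[capacity];        // the only mutable validation state
        String[] nameStack = new String[capacity];
        int top = 0;
        modelStack[0] = document;
        stateStack[0] = document.start();
        for (String event : events) {
            if (event.startsWith("+")) {
                String name = event.substring(1);
                SequenceModel model = models.get(name);
                if (model == null) return false;     // undeclared element
                top++;
                modelStack[top] = model;
                stateStack[top] = model.start();
                nameStack[top] = name;
            } else {
                boolean childValid = modelStack[top].isFinal(stateStack[top]);
                String name = nameStack[top];
                top--;
                stateStack[top] = modelStack[top].next(stateStack[top], name, childValid);
                if (stateStack[top] == SequenceModel.ERROR) return false;
            }
        }
        return top == 0 && document.isFinal(stateStack[0]);
    }

    public static void main(String[] args) {
        System.out.println(validate(exampleModels(), exampleDocument(), List.of("+root", "+child", "-", "-"))); // true
        System.out.println(validate(exampleModels(), exampleDocument(), List.of("+root", "-"))); // false
    }
}
```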


> I'm uncertain both of how hard this would be to
> change and whether change is necessary so long as state is preserved
> somewhere.

Yes, state info could be stored somewhere else. That's true.

However, it looks like the state is currently kept in the Validator
object itself.  Since the way states are computed is highly dependent on
the schema language, I suspect that it would be difficult to add RELAX NG
capability on top of the Validator class as it is.

It would be nice if there were something (an abstraction layer?) to
hide the state information.
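One hypothetical shape for such a layer, sketched in Java: the driver only passes opaque state objects around and never looks inside them, so each schema language can represent its state however it likes. The interface and the toy engine (remaining child names kept in an immutable list) are illustrative assumptions, not an existing Xerces API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an abstraction layer that hides validation state.
public class OpaqueStateDemo {

    /** Implemented once per schema language; S is opaque to the driver. */
    interface ValidationEngine<S> {
        S initialState();
        S startChild(S parent, String name);                 // null: child not allowed
        S endChild(S parent, String name, S finishedChild);  // null: content invalid
        boolean isComplete(S state);
    }

    /** Generic driver: never inspects S. Events as "+name" / "-" (assumed well-formed). */
    static <S> boolean validate(ValidationEngine<S> engine, List<String> events) {
        Deque<S> states = new ArrayDeque<>();
        Deque<String> names = new ArrayDeque<>();
        states.push(engine.initialState());
        for (String event : events) {
            if (event.startsWith("+")) {
                String name = event.substring(1);
                S child = engine.startChild(states.peek(), name);
                if (child == null) return false;
                states.push(child);
                names.push(name);
            } else {
                S finished = states.pop();
                S parent = engine.endChild(states.pop(), names.pop(), finished);
                if (parent == null) return false;
                states.push(parent);
            }
        }
        return states.size() == 1 && engine.isComplete(states.peek());
    }

    /** Toy engine: the state is the immutable list of child names still expected. */
    static final class SequenceEngine implements ValidationEngine<List<String>> {
        private final String rootName;
        private final Map<String, List<String>> contentModels;

        SequenceEngine(String rootName, Map<String, List<String>> contentModels) {
            this.rootName = rootName;
            this.contentModels = contentModels;
        }

        public List<String> initialState() { return List.of(rootName); }

        public List<String> startChild(List<String> parent, String name) {
            boolean expected = !parent.isEmpty() && parent.get(0).equals(name);
            return expected ? contentModels.get(name) : null;
        }

        public List<String> endChild(List<String> parent, String name, List<String> finishedChild) {
            return finishedChild.isEmpty() ? parent.subList(1, parent.size()) : null;
        }

        public boolean isComplete(List<String> state) { return state.isEmpty(); }
    }

    static SequenceEngine exampleEngine() {
        return new SequenceEngine("root", Map.of("root", List.of("child"), "child", List.of()));
    }

    public static void main(String[] args) {
        System.out.println(validate(exampleEngine(), List.of("+root", "+child", "-", "-"))); // true
        System.out.println(validate(exampleEngine(), List.of("+root", "-")));               // false
    }
}
```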

Or maybe it boils down to the fact that the Grammar class and the
XMLValidator class are not that generic, despite their documentation. It
would be much easier if it were OK to build a RELAX NG (or Examplotron or
whatever) validation engine *without* using XMLValidator and Grammar. 
( ... or maybe I feel this way only because I don't know XMLValidator
and Grammar well.)

What are the cons of the above approach? I suspect that extending Grammar
and XMLValidator is required in order to benefit from grammar caching,
but is this correct? Or is there anything else?


> The use of indexes brings up some interesting questions though:  We're
> currently debating whether to have more information in objects like X1's
> XMLElementDecl, or to continue the practice of maintaining lots of parallel
> arrays in the grammar, indexed with an ElementIndex, each corresponding to
> a particular type of element information.  The former approach is more
> object-oriented and elegant, but there's some feeling that the latter
> should be more efficient--if a schema has 500 element decls that's 500
> smallish objects, whereas perhaps you could get by with 10 or so large
> objects in the current approach.

I have never been performance-savvy, so I may be wrong, but I don't
think 500 objects matter. Hey, it's just 500, not 50,000!  And how many
times does a typical application load schemas?  Probably once or twice
(provided that the cache is implemented), not 100 times.  To me, the
right solution to the problem is the grammar cache.
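A grammar cache of this kind could be as small as the hypothetical sketch below: each schema location is parsed at most once, and later requests reuse the result. The loader, the counter standing in for an expensive schema parse, and the example.com URL are all placeholders.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of a grammar cache keyed by schema location.
public class GrammarCacheDemo {

    static final class GrammarCache<G> {
        private final Map<String, G> grammars = new ConcurrentHashMap<>();
        private final Function<String, G> loader; // parses a schema into a grammar

        GrammarCache(Function<String, G> loader) { this.loader = loader; }

        /** Parses the schema on first request only; later requests reuse the result. */
        G get(String schemaLocation) {
            return grammars.computeIfAbsent(schemaLocation, loader);
        }
    }

    /** Requests the same placeholder schema n times and returns how many parses ran. */
    static int countParses(int requests) {
        AtomicInteger parses = new AtomicInteger();
        GrammarCache<String> cache = new GrammarCache<>(uri -> {
            parses.incrementAndGet(); // stands in for an expensive schema parse
            return "grammar for " + uri;
        });
        for (int i = 0; i < requests; i++) {
            cache.get("http://example.com/schema.xsd");
        }
        return parses.get();
    }

    public static void main(String[] args) {
        System.out.println(countParses(100)); // 1
    }
}
```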

And I guess using arrays and integer indices has its own cost. Besides,
many of those arrays do not make sense for other schema languages.
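For comparison, a hypothetical sketch of the two storage styles (simplified; not the actual Xerces classes). With parallel arrays, every per-element property is one more array that has to grow in step with the others, and the integer index only means something inside that one grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of two ways to store element declarations
// (simplified; not the actual Xerces classes).
public class DeclStorageDemo {

    /** Style 1: one small object per declaration. */
    static final class ElementDecl {
        final String name;
        final int contentType;
        ElementDecl(String name, int contentType) {
            this.name = name;
            this.contentType = contentType;
        }
    }

    /** Style 2: parallel arrays indexed by an integer "element index". */
    static final class ParallelDecls {
        String[] names = new String[8];
        int[] contentTypes = new int[8];
        int count = 0;

        /** Adds a declaration and returns its index. */
        int add(String name, int contentType) {
            if (count == names.length) {
                // Every per-element array must grow in step; each new property adds one more.
                names = Arrays.copyOf(names, count * 2);
                contentTypes = Arrays.copyOf(contentTypes, count * 2);
            }
            names[count] = name;
            contentTypes[count] = contentType;
            return count++;
        }
    }

    public static void main(String[] args) {
        List<ElementDecl> objects = new ArrayList<>();
        ParallelDecls arrays = new ParallelDecls();
        for (int i = 0; i < 500; i++) {   // the "500 element decls" case
            objects.add(new ElementDecl("e" + i, i % 3));
            arrays.add("e" + i, i % 3);
        }
        // Same information either way; only the layout differs.
        System.out.println(objects.get(42).name + " " + arrays.names[42]); // e42 e42
    }
}
```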

For example, the RELAX NG notion of attributes is considerably different
from the representation the Grammar object uses.
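As a small illustration: RELAX NG lets attribute patterns be combined with choice, so an element can be required to carry either an href or a ref attribute (hypothetical names), but not both and not neither. The hypothetical Java model below represents a pattern as the set of attribute-name sets it accepts; that is not RELAX NG's actual validation algorithm and only works for small patterns without name classes or repetition. A flat per-element list of optional/required attribute declarations, as in a DTD, cannot express this constraint.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of RELAX NG-style attribute patterns. A pattern is
// represented as the set of attribute-name sets it accepts, which only
// works for small patterns without name classes or repetition.
public class AttributePatternDemo {

    static Set<Set<String>> attribute(String name) { return Set.of(Set.of(name)); }

    static Set<Set<String>> empty() { return Set.of(Set.<String>of()); }

    /** choice: either alternative. */
    static Set<Set<String>> choice(Set<Set<String>> p, Set<Set<String>> q) {
        Set<Set<String>> result = new HashSet<>(p);
        result.addAll(q);
        return result;
    }

    /** group: both parts, each attribute matched at most once. */
    static Set<Set<String>> group(Set<Set<String>> p, Set<Set<String>> q) {
        Set<Set<String>> result = new HashSet<>();
        for (Set<String> s : p) {
            for (Set<String> t : q) {
                if (Collections.disjoint(s, t)) {
                    Set<String> union = new HashSet<>(s);
                    union.addAll(t);
                    result.add(union);
                }
            }
        }
        return result;
    }

    static boolean valid(Set<Set<String>> pattern, Set<String> attributes) {
        return pattern.contains(attributes);
    }

    /** Either href or ref (not both, not neither), plus an optional lang. */
    static Set<Set<String>> example() {
        return group(choice(attribute("href"), attribute("ref")),
                     choice(empty(), attribute("lang")));
    }

    public static void main(String[] args) {
        Set<Set<String>> p = example();
        System.out.println(valid(p, Set.of("href", "lang"))); // true
        System.out.println(valid(p, Set.of("href", "ref")));  // false
        System.out.println(valid(p, Set.of()));               // false
    }
}
```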






regards,
--
Kohsuke KAWAGUCHI                          +1 650 786 0721
Sun Microsystems                   [EMAIL PROTECTED]

