Re: integrating grammar preparsing with the grammar caching API

Andy Clark Wed, 10 Apr 2002 08:11:21 -0700

Joseph Kesselman/CAM/Lotus wrote:
> For what it's worth, that's the impression this user got too... XNI's goal,
> as I understood it, was specifically to componentize the parser so those
> components -- and others -- could be assembled in whatever combination or


Yes, it is. But the general user who is only interested
in using DOM or SAX doesn't care (and shouldn't care) about
the XNI API. However, the advanced user will likely use it
to solve advanced problems -- things you just can't do with
normal XML parsers (like create your own pipelines, parse
other document formats as XML, etc).

> XNI should probably still be considered experimental and subject to change.
> And there are probably some calls which are currently lumped under the XNI
> banner which _should_ be considered parser internals. But there needs to be

Which ones?

> (Personal pet peeve, which I've cited before: Incremental parsing. That's
> definitely an application-level issue, and the only way to access it right
> now is through XNI.)

True. We've been asked enough times for this that we should
definitely make it public. I avoided it until now because I
wasn't sure what we should do in the case that the parser
configuration used doesn't support pull parsing. But I think
we could simply throw an exception from those methods if the
underlying configuration doesn't support pull parsing.
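
To make the idea concrete, here is a minimal sketch of that
fallback. Every name in it (ParserConfiguration,
PullParserConfiguration, IncrementalParser, TokenConfiguration)
is a hypothetical stand-in rather than the actual XNI API, and
UnsupportedOperationException is only one possible choice of
exception.

```java
import java.util.List;

// All names here are hypothetical stand-ins, not the Xerces/XNI API.
interface ParserConfiguration {
    /** Parses the whole input in one call (push style). */
    void parse(String input, List<String> events);
}

/** Optional capability: a configuration that can also parse in steps. */
interface PullParserConfiguration extends ParserConfiguration {
    void setInput(String input, List<String> events);

    /** Parses a little more; returns true while input remains. */
    boolean parseSome();
}

/** Public-facing parser that exposes pull methods for any configuration. */
final class IncrementalParser {
    private final ParserConfiguration config;

    IncrementalParser(ParserConfiguration config) {
        this.config = config;
    }

    void setInput(String input, List<String> events) {
        pull().setInput(input, events);
    }

    boolean parseSome() {
        return pull().parseSome();
    }

    // Fails loudly when the underlying configuration cannot parse in steps.
    private PullParserConfiguration pull() {
        if (config instanceof PullParserConfiguration) {
            return (PullParserConfiguration) config;
        }
        throw new UnsupportedOperationException(
            "parser configuration does not support pull parsing");
    }
}

/** Toy pull-capable configuration: each whitespace-separated token is one event. */
final class TokenConfiguration implements PullParserConfiguration {
    private String[] tokens = new String[0];
    private int next;
    private List<String> events;

    public void parse(String input, List<String> events) {
        setInput(input, events);
        while (parseSome()) {
            // keep going until the input is used up
        }
    }

    public void setInput(String input, List<String> events) {
        String trimmed = input.trim();
        this.tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        this.next = 0;
        this.events = events;
    }

    public boolean parseSome() {
        if (next < tokens.length) {
            events.add(tokens[next++]);
        }
        return next < tokens.length;
    }
}
```

Wrapping a TokenConfiguration lets the caller step through the
input one parseSome() call at a time, while wrapping a push-only
configuration makes the same calls fail immediately with the
exception instead of silently doing nothing.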

And speaking of pull parsing, I have a comment and a question
for the Xerces-J developers out there. First, my comment: right
now the incremental parser cannot guarantee that each call to
the parse method makes one and only one call to the registered
handlers. And in general this isn't possible anyway because
some events may be removed and/or synthesized downstream from
the scanner which we have no control over (e.g. think about
the notification of start/endPrefixMapping for namespaces).
However, I was thinking of writing a more "true" incremental
parser from Xerces by queueing events and only dispatching
the top (or is that bottom???) event in the queue. I figure
the only thing I would need to change to support this well is
to add a feature that tells the entity manager to NOT re-use
character buffers -- this would allow components (and apps)
to keep references to the character buffers, knowing that
their contents would never change. But I digress...
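
A minimal sketch of the queueing idea, under some assumptions:
EventSource and OneEventAtATime are hypothetical names, and
events are modelled as plain strings. Using immutable strings
sidesteps the buffer question entirely; a real version would have
to copy the characters or rely on the no-re-use feature described
above. The queue here is first-in first-out, so the event that is
dispatched is the oldest one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical stand-in for a pipeline that may emit several events per step.
interface EventSource {
    /** Pushes zero or more events into sink; returns false once input is exhausted. */
    boolean scanStep(Consumer<String> sink);
}

/** Queues whatever the pipeline emits and hands out one event per call. */
final class OneEventAtATime {
    private final EventSource source;
    private final Deque<String> pending = new ArrayDeque<>();
    private boolean exhausted;

    OneEventAtATime(EventSource source) {
        this.source = source;
    }

    /** Delivers at most one event to handler; returns false when nothing is left. */
    boolean next(Consumer<String> handler) {
        // Only drive the pipeline when there is nothing queued to dispatch.
        while (pending.isEmpty() && !exhausted) {
            exhausted = !source.scanStep(pending::addLast);
        }
        String event = pending.pollFirst();
        if (event == null) {
            return false;
        }
        handler.accept(event);
        return true;
    }
}
```

If one scan step produces both a startPrefixMapping and a
startElement, the first call to next() delivers only the former
and the second call delivers the latter without touching the
scanner again.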

My question is related to the pull parsing JSR (StAX). There
are some big names involved in that work (e.g. James Clark)
and I am concerned that the completed API will be at odds
with the current design and implementation of Xerces. I'm
fine with them being different beasts because they are
designed to do different things. But I worry that people
will come to us and say "why don't you build the parser from
pull events?" when this doesn't make sense for us. The power 
of XNI and Xerces2 is that the components in the pipeline can 
be interchanged easily to create new configurations. Trying to
make this same thing possible as a pull parser is extremely
difficult, if not impossible (for some of the very reasons I
was alluding to previously). Again, I run off at the mouth...

On to my question!

I am considering volunteering for the "Expert Group" of this
JSR to make sure that my concerns are taken into consideration
as this work continues. This would require me to sign an
agreement that is currently at odds with the Apache Foundation.
So my question is this: is this situation being resolved? Or
are we stuck in a limbo that keeps us from helping to guide
these specifications that could very well directly impact the
work that we are currently doing?

> Re grammar caching: Y'know, I'm _NOT_ convinced that there should be a
> single Grammar Cache for all grammars. This sounds like the sort of thing
> that ought to be a feature of the grammar engine rather than of Xerces as a
> whole.

I'm currently in favor of having a single grammar cache but
I'd like to hear more of your concerns. I think a single cache
would make it easier for sharing of pre-parsed grammars across
multiple parser instances, not just between multiple validator
components in the same parser pipeline.
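
As a rough illustration of the sharing argument, here is a
minimal sketch in which two parser instances are handed the same
cache. Grammar, SharedGrammarCache and CachingParser are
hypothetical names, not the Xerces grammar caching API, and
keying entries by grammar type plus location is an assumption
made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical types for illustration; not the Xerces grammar caching API.
final class Grammar {
    final String type;      // e.g. "DTD" or "XMLSchema"
    final String location;  // where the grammar was loaded from

    Grammar(String type, String location) {
        this.type = type;
        this.location = location;
    }
}

/** One cache for all grammar types, safe to share between parser instances. */
final class SharedGrammarCache {
    private final ConcurrentMap<String, Grammar> grammars = new ConcurrentHashMap<>();

    /** Returns the cached grammar, preparsing it only on the first request. */
    Grammar get(String type, String location, Function<String, Grammar> preparse) {
        // The type is part of the key so a DTD and a schema never collide.
        String key = type + "\n" + location;
        return grammars.computeIfAbsent(key, k -> preparse.apply(location));
    }

    int size() {
        return grammars.size();
    }
}

/** Toy parser that looks up its schema grammar through the shared cache. */
final class CachingParser {
    private final SharedGrammarCache cache;
    int preparseCount;  // how many times this parser had to preparse

    CachingParser(SharedGrammarCache cache) {
        this.cache = cache;
    }

    Grammar schemaFor(String location) {
        return cache.get("XMLSchema", location, loc -> {
            preparseCount++;
            return new Grammar("XMLSchema", loc);
        });
    }
}
```

With one SharedGrammarCache passed to two CachingParser
instances, the second parser asking for the same location gets
the object the first one preparsed. Giving each grammar engine
its own cache would correspond to constructing separate
SharedGrammarCache objects per type.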

-- 
Andy Clark * [EMAIL PROTECTED]
