Re: integrating grammar preparsing with the grammar caching API

sandygao Fri, 05 Apr 2002 10:21:23 -0800

I sent out a message replying "Re: Stabilizing the Grammar Caching
framework" two weeks ago, but it didn't seem to reach the list. FWIW, I'm
attaching that message at the end of this one.


[Grammar Caching]

I was also worried about introducing a new parser configuration to support
grammar caching. I agree that Neil has a good point: for JAXP users, if the
only thing they want is to cache whatever grammars the parser sees while
parsing instance documents, then it can easily be done using Neil's
approach: setting a system property, or modifying the service file, to
specify a parser configuration that supports grammar caching. But the
question is: what kind of user would use the parser this way?

He uses JAXP, and is not sure which parser is used under the covers. But he
happens to know Xerces well enough to know that a special parser
*configuration* (a weird word to anyone who isn't an advanced Xerces2 user)
can do some magic. So he sets the system property and hopes the parser
being used is Xerces. Would this be a typical user?

And even for this kind of user, the same effect can be achieved with a few
lines of code:
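  try {
    // Load the pool by name so the code has no compile-time Xerces dependency
    Class poolClass = Class.forName(
        "org.apache.xerces.impl.validation.XMLGrammarPoolImpl");
    Object grammarPool = poolClass.newInstance();
    // GRAMMAR_POOL_PROPERTY holds the ID of the grammar pool property
    parser.setAttribute(GRAMMAR_POOL_PROPERTY, grammarPool);
  } catch (Exception e) {
    // Not Xerces, or the property is unsupported: parse without caching
  }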


So I tend to think that we don't need a new parser configuration for
grammar caching. Making the grammar pool property public is good enough.
This can keep our design simple.
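For readers who have not looked at the pool itself, the behaviour such a
pool provides (cache every grammar the parser sees, reuse it for later
instances) can be sketched in a few lines. This is a self-contained
illustration only: `SimpleGrammarPool`, `CachingValidator` and their methods
are hypothetical names, not the Xerces `XMLGrammarPool` interface, and
grammars are modelled as strings keyed by system ID.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a grammar pool: maps a grammar's system ID to
// the parsed grammar (modelled here as a String for illustration).
class SimpleGrammarPool {
    private final Map<String, String> grammars = new HashMap<>();

    // Returns the cached grammar, or null if this location was never seen.
    synchronized String retrieveGrammar(String systemId) {
        return grammars.get(systemId);
    }

    // Stores a grammar the parser just built so later parses can reuse it.
    synchronized void cacheGrammar(String systemId, String grammar) {
        grammars.put(systemId, grammar);
    }
}

// Hypothetical validator step: consult the pool first, parse only on a miss.
class CachingValidator {
    private final SimpleGrammarPool pool;
    int grammarParses = 0; // counts how often a grammar was actually parsed

    CachingValidator(SimpleGrammarPool pool) {
        this.pool = pool;
    }

    String grammarFor(String systemId) {
        String grammar = pool.retrieveGrammar(systemId);
        if (grammar == null) {
            grammarParses++;
            grammar = "grammar(" + systemId + ")"; // stands in for real parsing
            pool.cacheGrammar(systemId, grammar);
        }
        return grammar;
    }
}
```

Two instance documents that both point at the same schema cause only one
grammar parse. The caching lives in the pool rather than in the
configuration, which is why exposing the property may be sufficient.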


[Grammar Preparsing]

Andy mentioned two ways: adding such method(s) to a parser configuration,
or adding them to the Grammar class(es).

The second approach seems pretty clean: after all, grammar preparsing is a
separate task from parsing instance documents. But it might be really
difficult to implement. The problem is that so many things need to be
available when parsing a grammar:
- The input: a URI, or an XMLInputSource
- The grammar pool: where cached grammars can be found, and parsed grammars
can be stored
- The error reporter: report errors in the grammars
- The entity resolver: resolve grammar locations
- The symbol table: to enable comparing Strings using reference comparison
- Other handlers: for DTDs, the DTDHandler needs to be called for
declarations
- Other features/properties: for schemas, there are two location
properties and one full-checking feature.
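The symbol-table item is easy to underestimate. Parser components can
compare names with `==` only because every name passes through one shared
table; a grammar built against a different table would produce names that
never match by reference. A minimal sketch, using a hypothetical
`MiniSymbolTable` rather than the real Xerces symbol table class:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal symbol table: hands out one canonical String
// instance per distinct value, so callers can compare names with ==.
class MiniSymbolTable {
    private final Map<String, String> symbols = new HashMap<>();

    String addSymbol(String s) {
        String canonical = symbols.get(s);
        if (canonical == null) {
            canonical = s;              // first occurrence becomes canonical
            symbols.put(s, canonical);
        }
        return canonical;
    }
}
```

Names from the grammar and from the instance are identical objects only if
both went through the same table, so a standalone grammar parser would need
to be handed the parser's table.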

So I believe we have to go with the first approach, because the above
information is already available in parser configurations.

The next question is: which configuration? A new configuration? I have
always thought that isn't necessary. Why can't we have a parseGrammar method
on the standard configuration? The standard configuration already parses
grammars; we would just expose this functionality by adding a new method.

We can have a GrammarParser component in the configuration, and make it an
internal property. When parseGrammar() is called on the configuration, it
queries the property and calls the proper method to parse the grammar.
Validators can also use this property to parse grammars, instead of calling
XSDHandler etc. directly.
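A rough sketch of that arrangement, under loud assumptions: `GrammarParser`,
`DefaultGrammarParser`, `SketchConfiguration` and the property ID
`internal/grammar-parser` are all hypothetical, and grammars are again
modelled as strings. The point is only that the configuration looks the
component up in its property table, so parseGrammar() and the validators
can share one instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical component interface: turns a grammar source into a grammar.
interface GrammarParser {
    String parseGrammar(String grammarType, String systemId);
}

// Hypothetical default component that dispatches on the grammar type.
class DefaultGrammarParser implements GrammarParser {
    public String parseGrammar(String grammarType, String systemId) {
        if ("DTD".equals(grammarType)) {
            return "DTDGrammar(" + systemId + ")";    // would drive the DTD scanner
        } else if ("XMLSchema".equals(grammarType)) {
            return "SchemaGrammar(" + systemId + ")"; // would drive XSDHandler
        }
        throw new IllegalArgumentException("unsupported grammar type: " + grammarType);
    }
}

// Hypothetical configuration: the component is stored as an internal property.
class SketchConfiguration {
    static final String GRAMMAR_PARSER = "internal/grammar-parser"; // hypothetical ID
    private final Map<String, Object> properties = new HashMap<>();

    SketchConfiguration() {
        properties.put(GRAMMAR_PARSER, new DefaultGrammarParser());
    }

    Object getProperty(String id) {
        return properties.get(id);
    }

    void setProperty(String id, Object value) {
        properties.put(id, value);
    }

    // New public entry point: query the component and delegate to it.
    String parseGrammar(String grammarType, String systemId) {
        GrammarParser parser = (GrammarParser) getProperty(GRAMMAR_PARSER);
        return parser.parseGrammar(grammarType, systemId);
    }
}
```

Because the component is an ordinary property, a validator could fetch the
same instance with getProperty(), and an application could swap in a
different GrammarParser without changing the configuration class.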

Cheers,
Sandy Gao
Software Developer, IBM Canada
(1-905) 413-3255
[EMAIL PROTECTED]


*** Re: Stabilizing the Grammar Caching framework ***

Hi there,

There are many possible ways to provide grammar caching/preparsing
functionality: through a new parser configuration, a new Xerces property,
or both. Whichever way we go, the result (after setting up the parser) is
that the parser will have a "grammar pool" and a "parseGrammar" method. The
former is used to interact with the application to retrieve cached
grammars, and the latter lets the application parse grammar sources into
Xerces grammars. So the question Neil and I are trying to ask/answer is:
*how* should a parser be set up so that it has these two things?

The first thing we need to decide is whether "grammar pool" and "preparse"
should always be bound together. That is, whether they should be supported
by the same parser configuration. I tend to think so. After all, the
purpose of preparsing a grammar is mainly to put it into a grammar pool (to
validate an instance), and the grammars in the grammar pool naturally come
from preparsing.

If we agree on this, then there are two possible ways:

1. Support both on the standard parser configuration, so that
StandardParserConfiguration recognizes a "grammar pool" property and has a
"parseGrammar()" method.

Since it's standard, you'll always get it by default. So all the user needs
to do is implement the XMLGrammarPool interface and set it via the grammar
pool property (or instantiate a sample grammar pool implementation provided
by Xerces).

2. (Neil's suggestion) Define a new configuration (say
GrammarCachingConfiguration), which supports "grammar pool" and
"parseGrammar()".

So when either grammar caching or grammar preparsing is wanted, the
application needs to use this new configuration. This can be done either by
instantiating this configuration directly (Xerces-specific), or by setting
a system property to indicate which configuration to use
(Xerces-independent). Of course, the application can still set a grammar
pool via the property.


Note that if we go with 1, no grammar caching should be performed by
default. But if it's 2, we *can* provide a default grammar caching
implementation when GrammarCachingConfiguration is used, because the user
clearly wants to do something with grammar caching.
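Option 2, together with the default-pool difference just described, might
look roughly like this. The property name `example.parser.configuration`
and the classes below are hypothetical; a real implementation would name a
configuration class in the property and load it reflectively, whereas this
sketch maps a short keyword so it stays self-contained.

```java
import java.util.HashMap;

// Hypothetical standard configuration: no pool unless the application sets one.
class StandardConfig {
    Object grammarPool; // null means no grammar caching
}

// Hypothetical caching configuration: installs a default pool, since choosing
// this configuration already signals that caching is wanted.
class GrammarCachingConfig extends StandardConfig {
    GrammarCachingConfig() {
        grammarPool = new HashMap<String, Object>();
    }
}

class ConfigSelector {
    static final String CONFIG_PROPERTY = "example.parser.configuration"; // hypothetical

    // Picks the configuration from a system property, defaulting to standard.
    static StandardConfig newConfiguration() {
        String name = System.getProperty(CONFIG_PROPERTY, "standard");
        return "caching".equals(name) ? new GrammarCachingConfig() : new StandardConfig();
    }
}
```

The application code that calls newConfiguration() never mentions the
caching class, which is the Xerces-independence that option 2 offers.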

Choosing between them depends on the answers to the following questions:

[It's possible to write Xerces-independent grammar caching code using (2).
How much does it mean?]

Certain users (category 'b' in Neil's message) want to cache every grammar
they see, without any grammar preparsing. Using 1, they need
to set a Xerces property, which makes the code Xerces-dependent. As Neil
said, solution 2 definitely makes their life easier: set a system property,
and you are set. But

a. How many such users are there? (This I can't answer. Andy seems to think
there aren't so many.)

b. If being Xerces-independent is a big issue, then we have the same
problem for every Xerces feature/property. Taken to the extreme, shouldn't
they all become system properties?

[Is the grammar caching functionality part of a standard parser?]

The standard parser is used to parse XML documents. If we think
"parseGrammar" is a different kind of "parse" (parallel to the parse()
methods), then maybe it shouldn't be on the standard parser. But if we view
it as a setup procedure, it sounds OK: normally we prepare the grammars by
preparsing them (once), then validate many instances using them.

[Should we always provide a default grammar caching implementation when a
grammar is preparsed?]

When a grammar is preparsed, do we want the resulting grammar(s) to be
automatically stored in the parser (by default)? Or do we want such
grammars to be returned from the parseGrammar() method, so that the
application can choose to set them back onto the same parser, set them on
other parsers, or use them in other ways?
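The second alternative (parseGrammar() returns the grammar and the
application decides where it goes) supports the "preparse once, validate
many" pattern across several parsers. A hypothetical sketch, with
`PreparsingParser` and its map-based pool standing in for real Xerces
types:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser with its own pool. parseGrammar() only returns the
// grammar; storing it is left to the application.
class PreparsingParser {
    final Map<String, String> pool = new HashMap<>();
    int grammarParses = 0; // how many grammars this parser actually parsed

    String parseGrammar(String systemId) {
        grammarParses++;
        return "grammar(" + systemId + ")"; // stands in for real grammar parsing
    }

    // Grammar used when validating an instance: taken from the pool if
    // present, otherwise parsed on demand (and, in this sketch, not cached).
    String grammarForValidation(String schemaLocation) {
        String grammar = pool.get(schemaLocation);
        return grammar != null ? grammar : parseGrammar(schemaLocation);
    }
}
```

Here one preparse serves two parsers; if parseGrammar() stored results
automatically, the grammar would land only in the parser that parsed it.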


Note that I'm only talking about the XNI level. We'll have problems
(similar to those we have with the pull-parsing functionality) when we get
to the higher-level APIs (DOM/SAX/JAXP). Suppose we have a parseGrammar()
method on some Xerces parser configuration: how do we invoke it from the
other APIs? DOM Level 3 defines a DOMASBuilder interface, with parseAS()
methods. But for SAX/JAXP, users would somehow have to get a handle to the
parser configuration. Would this be a problem?

Cheers,
Sandy Gao
Software Developer, IBM Canada
(1-905) 413-3255
[EMAIL PROTECTED]

