Re: [Xerces2] How do we want Grammar Caching

Andy Clark Thu, 09 Aug 2001 19:59:55 -0700
Petr Kuzel wrote:
> > - How do we access grammars in the grammar pool. For schemas, it might be
> > easier: we can use the target namespace. How about DTDs and schemas without
> > target namespace?
> 
> As a key I propose: target namespace and public ID of DTD grammar, generally
> a String URI.

Target namespace is easy for Schemas but I don't know about the
publicId of the DTD grammar. First of all, publicIds are not
URIs so we wouldn't be consistent in our use of keys. Second,
there may not even be a DOCTYPE line that contains a publicId
or even systemId to use.

We have the following very common situations: the document 
contains a reference to the grammar (DOCTYPE for DTDs and
xsi:schemaLocation for Schema); or the document does *not* 
contain any reference to the grammar but validation should 
still be performed. And in the case where there is a grammar
specified, the application may want to enforce a particular
grammar so that the document instance cannot "spoof" what
grammar it validates against.
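A minimal sketch of that enforcement policy, in Java since Xerces2 is written in Java. GrammarSelector and select() are hypothetical names, and grammars are identified by plain strings for brevity. If the application has configured a grammar, it wins over whatever the document declares; otherwise the document's own reference (if any) is used.

```java
import java.util.Optional;

/** Hypothetical policy object: picks the grammar to validate against. */
final class GrammarSelector {
    // Grammar the application insists on, or null if none was configured.
    private final String enforcedGrammar;

    GrammarSelector(String enforcedGrammar) {
        this.enforcedGrammar = enforcedGrammar;
    }

    /**
     * @param declaredGrammar grammar referenced by the document (e.g. the DOCTYPE
     *        system ID or xsi:schemaLocation), or null if the document names none
     * @return the grammar to use, or empty if neither side supplies one
     */
    Optional<String> select(String declaredGrammar) {
        if (enforcedGrammar != null) {
            // The application's choice always wins, so the document cannot "spoof" it.
            return Optional.of(enforcedGrammar);
        }
        // Otherwise fall back to what the document declares (possibly nothing).
        return Optional.ofNullable(declaredGrammar);
    }
}
```

This covers both common situations: a document with no grammar reference still validates when the application supplies one, and a document that names a different grammar is overridden.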

So the way I see it at the moment is that there are two
ways to separate grammars: by root element name and by
target namespace. And my thinking can be seen in the
interface for the GrammarPool that I put into Xerces2. 
There are two methods for querying grammars:

  getGrammar(String rootElementName):Grammar
  getGrammarNS(String targetNamespace):Grammar
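To make the two lookups concrete, here is a hypothetical in-memory pool that keeps one map per query method. The Grammar interface, the put methods, and the use of "" to stand for "no target namespace" are assumptions for illustration, not the actual Xerces2 interface.

```java
import java.util.HashMap;
import java.util.Map;

/** Placeholder for a compiled grammar (assumption, not the Xerces2 class). */
interface Grammar {
    String describe();
}

/** Hypothetical in-memory pool keyed two ways, mirroring the two query methods. */
final class SimpleGrammarPool {
    // DTD-style grammars are found by the document's root element name.
    private final Map<String, Grammar> byRootElement = new HashMap<>();
    // Schema-style grammars are found by target namespace ("" = no namespace).
    private final Map<String, Grammar> byNamespace = new HashMap<>();

    private static String nsKey(String targetNamespace) {
        return targetNamespace == null ? "" : targetNamespace;
    }

    void putGrammar(String rootElementName, Grammar g) {
        byRootElement.put(rootElementName, g);
    }

    void putGrammarNS(String targetNamespace, Grammar g) {
        byNamespace.put(nsKey(targetNamespace), g);
    }

    /** Returns the grammar registered for this root element, or null. */
    Grammar getGrammar(String rootElementName) {
        return byRootElement.get(rootElementName);
    }

    /** Returns the grammar for this target namespace (null = no namespace), or null. */
    Grammar getGrammarNS(String targetNamespace) {
        return byNamespace.get(nsKey(targetNamespace));
    }
}
```

Mapping a null namespace to "" lets schemas without a target namespace be pooled alongside namespaced ones.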

However, some comments Sandy made to me off of the 
mailing list got me to thinking more about this issue.
The question I kept asking myself was this: "Is there
ever a time when a Schema validator would use a DTD
grammar in order to validate?" In other words, is a given
type of grammar ever used by any validator other than the
one written for it? I don't think so.

Consider what we want to do within Xerces2, breaking
apart the "universal" validator into its separate
components so that the pipeline becomes the following:

  scanner -> DTD validator -> namespace binder -> Schema validator

In this way, I think the code becomes simpler and we
have more flexibility in arranging the configurations
that we want. But the DTD validator only needs DTD
grammars and the Schema validator only needs Schema
grammars. And it logically follows that a RelaxNG
validator would only need RelaxNG grammars.
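A rough sketch of that chained arrangement, assuming a simplified handler interface that carries only startElement events. ElementHandler, Component, Named and Pipelines are hypothetical names, loosely in the spirit of a SAX-style handler chain. Each component does its own work and then forwards the event to the next one:

```java
import java.util.List;

/** Hypothetical event sink, reduced to a single callback for brevity. */
interface ElementHandler {
    void startElement(String qname);
}

/** Base class for a pipeline component: does its own work, then forwards the event. */
abstract class Component implements ElementHandler {
    private ElementHandler next;

    Component setNext(ElementHandler next) {
        this.next = next;
        return this;
    }

    protected abstract void process(String qname);

    @Override
    public final void startElement(String qname) {
        process(qname);
        if (next != null) {
            next.startElement(qname);
        }
    }
}

/** Stand-in component that only records the events it sees; a real one would validate. */
final class Named extends Component {
    private final String name;
    private final List<String> trace;

    Named(String name, List<String> trace) {
        this.name = name;
        this.trace = trace;
    }

    @Override
    protected void process(String qname) {
        trace.add(name + ":" + qname);
    }
}

final class Pipelines {
    /** Wires scanner -> DTD validator -> namespace binder -> Schema validator; returns the head. */
    static Component standard(List<String> trace) {
        Component scanner = new Named("scanner", trace);
        Component dtd = new Named("dtd-validator", trace);
        Component binder = new Named("ns-binder", trace);
        Component schema = new Named("schema-validator", trace);
        scanner.setNext(dtd);
        dtd.setNext(binder);
        binder.setNext(schema);
        return scanner;
    }
}
```

Because each component only knows its successor, a configuration without DTD support could be built by linking the scanner straight to the namespace binder.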

But we'd still probably want a convenient way to pool
all of these types of grammars (and more). So Sandy
had the idea of using a grammar type as a way to 
query the appropriate grammar. It's analogous to a
MIME type but for grammar types that are put into the
pool. So perhaps our resolver/pool would have methods
like so:

  getGrammar(String type, String rootElementName):Grammar
  getGrammarNS(String type, String targetNamespace):Grammar
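One way the type-qualified lookups could work, sketched with a generic grammar type G so the block stays self-contained. TypedGrammarPool is a hypothetical name, and type strings such as "dtd" or "xml-schema" are placeholders rather than registered MIME types. The point is that the same target namespace can map to different grammars for different validators:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pool where every lookup is qualified by a grammar type string. */
final class TypedGrammarPool<G> {
    // type -> (root element name -> grammar)
    private final Map<String, Map<String, G>> byRoot = new HashMap<>();
    // type -> (target namespace -> grammar); "" stands for "no namespace"
    private final Map<String, Map<String, G>> byNamespace = new HashMap<>();

    private static String ns(String targetNamespace) {
        return targetNamespace == null ? "" : targetNamespace;
    }

    void putGrammar(String type, String rootElementName, G grammar) {
        byRoot.computeIfAbsent(type, t -> new HashMap<>()).put(rootElementName, grammar);
    }

    void putGrammarNS(String type, String targetNamespace, G grammar) {
        byNamespace.computeIfAbsent(type, t -> new HashMap<>()).put(ns(targetNamespace), grammar);
    }

    /** Returns the grammar of this type for the root element, or null. */
    G getGrammar(String type, String rootElementName) {
        Map<String, G> m = byRoot.get(type);
        return m == null ? null : m.get(rootElementName);
    }

    /** Returns the grammar of this type for the target namespace, or null. */
    G getGrammarNS(String type, String targetNamespace) {
        Map<String, G> m = byNamespace.get(type);
        return m == null ? null : m.get(ns(targetNamespace));
    }
}
```

A DTD validator would then only ever ask for the "dtd" type, and a Schema validator only for its own type, even though both share one pool.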

From experience, I am now more in favor of NOT trying
to develop a "universal" grammar and validator where
the grammar contains the union of all of the features
available in *all* of the various types of grammar
languages. This makes it extremely complicated and
doesn't work really well in practice.

However, I still think that we can have a common base
class and even strive to share a certain amount of
code between the DTD and Schema validation engines
that are part of Xerces2. But separate is better, IMHO.
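A sketch of how a common base class could share code while keeping the engines separate, using generics so each validator accepts only its own grammar type. All class names here are hypothetical, and the shared code is reduced to simple error collection:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical common base for all grammar kinds; shared bookkeeping lives here. */
abstract class BaseGrammar {
    private final String id;

    BaseGrammar(String id) { this.id = id; }

    String id() { return id; }
}

final class DtdGrammar extends BaseGrammar {
    DtdGrammar(String id) { super(id); }
}

final class SchemaGrammar extends BaseGrammar {
    SchemaGrammar(String id) { super(id); }
}

/** Shared validator code; each subclass is bound to exactly one grammar type. */
abstract class BaseValidator<G extends BaseGrammar> {
    private G grammar;
    private final List<String> errors = new ArrayList<>();

    void setGrammar(G grammar) { this.grammar = grammar; }

    G grammar() { return grammar; }

    // Shared code: common error collection used by every engine.
    protected void reportError(String message) { errors.add(message); }

    List<String> errors() { return errors; }

    abstract void validateElement(String qname);
}

final class DtdValidator extends BaseValidator<DtdGrammar> {
    @Override
    void validateElement(String qname) {
        if (grammar() == null) reportError("no DTD grammar for " + qname);
    }
}

final class SchemaValidator extends BaseValidator<SchemaGrammar> {
    @Override
    void validateElement(String qname) {
        if (grammar() == null) reportError("no Schema grammar for " + qname);
    }
}
```

With this arrangement, passing a SchemaGrammar to a DtdValidator is a compile-time error rather than something the validator has to check at run time.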

Just my two yen...

-- 
Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]
