Re: [Xerces2] How do we want Grammar Caching

Petr Kuzel Tue, 21 Aug 2001 09:35:43 -0700
Hi,

My comments use the following terms:
GrammarCache stores known grammars.
GrammarResolver resolves the location of a grammar that is not in the cache.

> For example, a grammar for namespace N is already known to the parser, then
> it sees an element from N. Does the parser use that known grammar, or does
> it ask the application for another one? Another example: A imports B and C,
> then both B and C import D. Does the parser ask the application for D twice
> (which might result in two different D's, and note that it would be an
> error if the two D's have element declarations of the same name)?

What is a grammar already known to the parser? The parser asks the
GrammarCache first for *all* grammars, doesn't it? It is not the parser's
responsibility to know whether a grammar is already cached.
 
> For an xsi:schemaLocation/noNamespaceSchemaLocation attribute,
> a. if a grammar for such namespace is already known to the parser, then
> that grammar is used;
> b. otherwise, the application will be given an opportunity to provide the
> grammar object for that namespace;
> c. if a grammar is not provided from (b), then the application can override
> the schemaLocation hint.

This is a job for a GrammarResolver.
 
> I don't know DTD as much as folks do to come up with a good algorithm. So I
> need help from you guys again. We might need to consider the following
> questions.
> a. How to cache internal DTD subset vs. external DTD subset;

From my point of view it does not make sense to cache the internal DTD
subset, as it is per document. It makes sense to cache only the external
entities referenced from it.

> b. What's the minimum unit of DTD grammar to cache? An internal subset? An
> external subset (corresponding to a .DTD file)? Or a group of
> internal/external subsets?

I advocate an external entity identified by its public ID, as referenced
by the DOCTYPE. However, some users may need a different granularity.


> 2. Note that the two "resolveGrammarLocation" methods at the end seem
> overlapping with "resolveEntity" method of "EntityResolver": they are all
> used to override a location. And in Xerces1, we did use EntityResolver to
> override schema location hints. So it causes confusion when both
> GrammarResolver and EntityResolver are specified: which one to use, or
> which one takes higher priority? To solve this, we can drop the two
> "resolveGrammarLocation" methods, and use the EntityResolver all the time,
> but we are risking losing grammar type and, possibly, namespace
> information. We currently lean more to another solution: derive
> GrammarResolver from EntityResolver. So that you only need to set one
> resolver for such purpose, and we can clearly state which method(s) (the
> one from EntityResolver or GrammarResolver) takes precedence. Any comments
> on either of the two approaches?

I favor a third approach: create a new interface, GrammarResolver, that is
similar to EntityResolver but is used only for interpreting the
schemaLocation attribute. That is the responsibility of neither
EntityResolver nor GrammarCache.
 
> 3. There is still a problem for the design of schema caching. Assume
> grammar A (that is, a grammar with target namespace A) is known to the
> parser, then the parser asks the application for grammar B. The application
> returns grammar B, which imports a different grammar A. Now the two A's
> conflict (we assumed a one-grammar-per-namespace rule). To avoid such
> conflicts, in the GrammarResolver interface, we ask the application to
> provide a different grammar B in this case (method grammarConflict()).

I assume that the cached grammar B just references grammar A, so the
parser can use its own copy of A.
 
> 4. The parser doesn't interact with a grammar pool directly. It's the job
> of the GrammarResolver to get grammars from the pool, and give them to the

I am not sure I understand this. Why doesn't the parser interact
with the pool directly? I would prefer to call it GrammarCache.

> parser. So the caching detail is separated from the validation process of
> the parser, and is under full control of the application.
> 
> 5. Xerces2 will provide a default implementation of GrammarResolver, which
> interacts with a default implementation of grammar pool:
> a. The pool is shared across the application;
> b. The pool is thread safe;
> c. Put every grammar in the pool;
> d. Always try to get grammars from the pool first.

e. The pool holds grammars through WeakReferences or through references
that expire after a timeout.
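
To make point (e) concrete, here is a minimal sketch of a thread-safe pool
that holds its grammars only weakly. It is an illustration in present-day
Java, not a proposed Xerces2 class: the name WeakGrammarPool and the type
parameters K (standing in for a cache key) and G (standing in for Grammar)
are assumptions of mine.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: K stands in for a cache key, G for Grammar.
final class WeakGrammarPool<K, G> {

    // The map is thread safe; the values are only weakly reachable through it.
    private final ConcurrentMap<K, WeakReference<G>> entries =
            new ConcurrentHashMap<>();

    /** Returns the cached grammar, or null if absent or already collected. */
    G get(K key) {
        WeakReference<G> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        G grammar = ref.get();
        if (grammar == null) {
            // The grammar was garbage collected; drop the stale entry,
            // but only if no other thread has replaced it meanwhile.
            entries.remove(key, ref);
        }
        return grammar;
    }

    /** Stores a grammar; returns the previous one if it is still alive. */
    G put(K key, G grammar) {
        WeakReference<G> old = entries.put(key, new WeakReference<>(grammar));
        return old == null ? null : old.get();
    }
}
```

With weak references a grammar stays in the pool only while something else
(a parser, a document) still uses it, so a get() may return null for a key
that was put earlier. A timeout-based variant would trade that for a bounded
lifetime.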
 
> Is this default behavior satisfying? Of course, people can always
> contribute other general-purpose GrammarResolver/GrammarPool
> implementations.
> 

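public interface GrammarCache {

   // It can be implemented by a Map.

   // Returns the cached grammar for the key, or null if there is none.
   public Grammar get(Key key);

   // Stores the grammar under the key; returns the previous grammar, if any.
   public Grammar put(Key key, Grammar grammar);

   // Marker type for cache keys.
   public interface Key {
   }
}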

public interface CacheKeyFactory {

  //GrammarCache.Key createKey(String namespace);

  //GrammarCache.Key createKey(String pID, String sID);

  GrammarCache.Key createKey(String grammarType,
                             String grammarKey,
                             String hint);


}
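
If the cache is backed by a Map, the keys need value-based equals() and
hashCode(). The sketch below shows one way a key could look. Everything in
it is an assumption for illustration: the class name SimpleGrammarKey, the
factory method of(), the type labels in the comments, and above all the
policy that the hint identifies the grammar only when no namespace or
public ID is available.

```java
import java.util.Objects;

// Hypothetical value-based key; in the design above it would
// implement GrammarCache.Key.
final class SimpleGrammarKey {

    private final String grammarType; // e.g. "schema" or "dtd" (assumed labels)
    private final String identity;    // namespace or public ID, else the hint

    private SimpleGrammarKey(String grammarType, String identity) {
        this.grammarType = grammarType;
        this.identity = identity;
    }

    // Mirrors createKey(grammarType, grammarKey, hint). Assumed policy:
    // the hint is used only when grammarKey is null.
    static SimpleGrammarKey of(String grammarType, String grammarKey, String hint) {
        return new SimpleGrammarKey(grammarType,
                grammarKey != null ? grammarKey : hint);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SimpleGrammarKey)) {
            return false;
        }
        SimpleGrammarKey that = (SimpleGrammarKey) other;
        return Objects.equals(grammarType, that.grammarType)
                && Objects.equals(identity, that.identity);
    }

    @Override
    public int hashCode() {
        return Objects.hash(grammarType, identity);
    }
}
```

Under this policy two schemaLocation hints for the same namespace map to
the same key, which matches a one-grammar-per-namespace rule. A user who
needs a different granularity would supply a different factory.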

public interface GrammarResolver {
 
   // ask the application to override import
   // schemaLocation.
   public XMLInputSource resolveGrammarLocation(String grammarType,
                                                String grammarKey,
                                                String hint);

   // ask the application to override include/redefine
   // schemaLocation.
   public XMLInputSource resolveGrammarLocation(String grammarType,
                                                String hint);
}

Sequence from the parser's point of view:

  $key = ask CacheKeyFactory for a key
  $grm = ask GrammarCache for the Grammar under $key
  if $grm == null {
    $in = ask GrammarResolver for an InputSource
    $grm = constructGrammar($in)
    put $grm into the cache under $key
  }
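
The same sequence as a small runnable sketch in present-day Java. The class
GrammarLookup and its type parameters are hypothetical: K stands in for the
cache key, S for the input source, G for the grammar, and the two functions
stand in for GrammarResolver and constructGrammar. It assumes the builder
never returns null.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: K = cache key, S = input source, G = grammar.
final class GrammarLookup<K, S, G> {

    private final Map<K, G> cache = new ConcurrentHashMap<>();
    private final Function<K, S> resolver; // stands in for GrammarResolver
    private final Function<S, G> builder;  // stands in for constructGrammar

    GrammarLookup(Function<K, S> resolver, Function<S, G> builder) {
        this.resolver = resolver;
        this.builder = builder;
    }

    G lookup(K key) {
        // Ask the cache first.
        G grammar = cache.get(key);
        if (grammar == null) {
            // Cache miss: resolve the location, then build the grammar.
            S in = resolver.apply(key);
            grammar = builder.apply(in);
            // Keep the first grammar stored if another thread got there first.
            G raced = cache.putIfAbsent(key, grammar);
            if (raced != null) {
                grammar = raced;
            }
        }
        return grammar;
    }
}
```

The sketch uses putIfAbsent rather than a plain put, so that two threads
missing the cache at the same time still end up sharing one grammar per key.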

> Did I miss anything?

Maybe I missed something myself, since I commented so much.

  Cc.

-- 
Petr Kuzel, Sun Microsystems
: Forte Tools (http://www.sun.com/forte/ffj/ie/)
: XML and Jini (http://jini.netbeans.org/) modules
