Re: [Xerces2] How do we want Grammar Caching

sandygao 21 Aug 2001 19:26:42 -0000

Hi. I've spent lots of time thinking about grammar caching, so it's easy
for me to assume that everyone knows the meaning of every weird term
in my head :-)


Now I'll try to clarify some things that were not clear in my earlier mail.
Curt Arnold had an interesting reply, and I've made some effort to express
my ideas in my response to him.

Basically, when the parser validates a document instance, there are two
sets of grammars. The first set is stored inside the parser (we can call it
the local set) and contains the grammars available to the current parse
(the current document). The second set is under the application's
control and holds cached grammars (we can call it the cached set). The
second set is optional, because some applications might not be
interested in grammar caching at all.

The reasons for having two separate sets:
- The application might not want all the grammars in the pool to be
available to each instance, so there are grammars in the cached set, but
not the local one.
- The application might not want to cache all compiled grammars, so there
are grammars in the local set, but not the cached one.
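To make the two reasons concrete, here is a small Java sketch. LocalSet, CachedSet and the two policy predicates are hypothetical names used only for illustration, not Xerces2 APIs; the point is that each set has its own membership rule, so a grammar can be cached but hidden from a document, or compiled for a document but never cached.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for a compiled grammar, identified by its target namespace.
final class Grammar {
    final String namespace;
    Grammar(String namespace) { this.namespace = namespace; }
}

// Parser side: the local set, i.e. the grammars available to the current document.
final class LocalSet {
    private final Map<String, Grammar> grammars = new HashMap<>();
    void add(Grammar g) { grammars.put(g.namespace, g); }
    boolean knows(String namespace) { return grammars.containsKey(namespace); }
}

// Application side: the optional cached set. Two independent (hypothetical)
// policies keep its membership separate from the local set.
final class CachedSet {
    private final Map<String, Grammar> grammars = new HashMap<>();
    private final Predicate<String> visibleToDocument; // which cached grammars a document may use
    private final Predicate<Grammar> worthCaching;     // which compiled grammars are kept

    CachedSet(Predicate<String> visibleToDocument, Predicate<Grammar> worthCaching) {
        this.visibleToDocument = visibleToDocument;
        this.worthCaching = worthCaching;
    }

    // Called when the parser asks for a grammar: may return null even if cached.
    Grammar offer(String namespace) {
        return visibleToDocument.test(namespace) ? grammars.get(namespace) : null;
    }

    // Called when the parser has compiled a grammar: may decline to keep it.
    void compiled(Grammar g) {
        if (worthCaching.test(g)) grammars.put(g.namespace, g);
    }

    boolean holds(String namespace) { return grammars.containsKey(namespace); }
}
```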

So a grammar "known to the parser" is a grammar in the local set, i.e.
one stored inside the parser. The mapping between my terminology and
yours is:
GrammarPool: a collection of cached grammars. It sits behind what you call
GrammarCache, and behind my GrammarResolver.
GrammarResolver: your GrammarCache + GrammarResolver. The only difference is
that I have one interface, whereas you have two. It is used either to return
a cached grammar or to override a grammar location.

> What is a grammar already known to the parser? Doesn't the parser first
> ask the GrammarCache for *all* grammars? It is not the parser's
> responsibility to know whether a grammar is already cached.

I don't think we should *always* ask the application for grammars. Please
refer to Curt's message, and my reply.

> I am for a third approach: create a new interface, GrammarResolver, that is
> similar to EntityResolver but is used only for interpreting the
> schemaLocation attribute. That is the responsibility of neither
> EntityResolver nor GrammarCache.

This is certainly a clean design, but I guess many Xerces1 users are used
to using EntityResolver to override grammar locations, and such a change
would make them unhappy :-)
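One way to keep those users happy is the option from my earlier mail: derive GrammarResolver from SAX's EntityResolver. The sketch below assumes a three-argument resolveGrammarLocation() and one possible precedence rule (grammar-specific method first, then resolveEntity(), then the plain hint); both are illustrative assumptions, while EntityResolver and InputSource are the real org.xml.sax types.

```java
import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical: GrammarResolver derived from SAX's EntityResolver, so an
// application sets a single resolver for both purposes.
interface GrammarResolver extends EntityResolver {
    // Returns null to decline, letting resolveEntity() handle the hint.
    InputSource resolveGrammarLocation(String grammarType, String namespace, String hint)
            throws SAXException, IOException;
}

final class SchemaLocations {
    // One possible precedence rule (an assumption): the grammar-specific method
    // first, then resolveEntity(), then the unmodified hint.
    static InputSource locate(EntityResolver resolver, String grammarType,
                              String namespace, String hint)
            throws SAXException, IOException {
        if (resolver instanceof GrammarResolver) {
            InputSource in = ((GrammarResolver) resolver)
                    .resolveGrammarLocation(grammarType, namespace, hint);
            if (in != null) return in;
        }
        if (resolver != null) {
            // A Xerces1-style resolver only sees the hint as a system ID and
            // loses the grammar type and namespace.
            InputSource in = resolver.resolveEntity(null, hint);
            if (in != null) return in;
        }
        return new InputSource(hint);
    }
}
```

An existing Xerces1-style EntityResolver keeps working unchanged, because it is simply consulted through resolveEntity().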

> > 3. There is still a problem for the design of schema caching. Assume
> > grammar A (that is, a grammar with target namespace A) is known to the
> > parser, then the parser asks the application for grammar B. The
> > application returns grammar B, which imports a different grammar A. Now
> > the two A's conflict (we assumed a one-grammar-per-namespace rule). To
> > avoid such confliction, in the GrammarResolver interface, we ask the
> > application to provide a different grammar B in this case (method
> > grammarConflict()).
>
> I assume that cached grammar B just references grammar A, so the
> parser can use its own copy of A.

Because the parser has no control over how cached grammars are stored, it's
possible that the application returns two different A's.
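To detect the conflict, the parser has to walk the imports of the returned grammar and compare them with what it already holds. A minimal Java sketch, assuming grammars expose their imports as a list and that "different" means a distinct object (a real implementation might compare contents instead); the Grammar and ConflictCheck classes are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical compiled grammar: a target namespace plus the grammars it imports.
final class Grammar {
    final String namespace;
    final List<Grammar> imports;
    Grammar(String namespace, Grammar... imported) {
        this.namespace = namespace;
        this.imports = new ArrayList<>(Arrays.asList(imported));
    }
}

final class ConflictCheck {
    // Returns the namespace of the first grammar reachable from 'candidate' that
    // differs from the grammar already known for that namespace (in the local set
    // or elsewhere in the candidate's own import graph), or null if there is none.
    static String findConflict(Map<String, Grammar> local, Grammar candidate) {
        return walk(local, candidate, new HashMap<>());
    }

    private static String walk(Map<String, Grammar> local, Grammar g,
                               Map<String, Grammar> seen) {
        Grammar known = local.get(g.namespace);
        if (known != null && known != g) return g.namespace; // "two different A's"
        Grammar earlier = seen.putIfAbsent(g.namespace, g);
        if (earlier != null) return earlier == g ? null : g.namespace; // revisit or clash
        for (Grammar imported : g.imports) {
            String clash = walk(local, imported, seen);
            if (clash != null) return clash;
        }
        return null;
    }
}
```

When findConflict() returns a namespace, that is the point where the parser would call grammarConflict() on the application; that callback is not modeled here.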

> I'm not sure I understand this. Why doesn't the parser interact
> with the pool directly? I prefer to call it GrammarCache.

What I called the grammar pool is an abstract concept: it refers to a
collection of grammars, which the application can manage however it
chooses. The GrammarResolver/GrammarCache interface, on the other hand, is
used for communication between the parser and the application. So the
parser interacts only with that interface, never with the physical
collection of grammars.

> public interface GrammarCache {
>    // it can be implemented by a Map
>    public Grammar get(Key key);
>    public Grammar put(Key key, Grammar grammar);
>    public interface Key {
>    }
> }
>
> public interface CacheKeyFactory {
>   //GrammarCache.Key createKey(String namespace);
>   //GrammarCache.Key createKey(String pID, String sID);
>   GrammarCache.Key createKey(String grammarType,
>                              String grammarKey,
>                              String hint);
> }
>
> public interface GrammarResolver {
>    // ask the application to override import
>    // schemaLocation.
>    public XMLInputSource resolveGrammarLocation(String grammarType,
>                                                 String grammarKey,
>                                                 String hint);
>    // ask the application to override include/redefine
>    // schemaLocation.
>    public XMLInputSource resolveGrammarLocation(String grammarType,
>                                                 String hint);
> }

We still need to decide whether to have one GrammarResolver interface, or
two interfaces GrammarCache/GrammarResolver. What do others think?

Other than that, your interfaces are not very different from my
GrammarResolver. But the other methods in my GrammarResolver are there for
a reason; please see my reply to Curt.

> Time diagram from parser point of view:
>
>   $key = ask CacheKeyFactory for a key
>   $grm = ask GrammarCache for Grammar
>   if $grm == null {
>     $in = ask GrammarResolver for InputSource
>     $grm = constructGrammar($in)
>     put $grm into cache under $key
>   }

I basically agree. But as a first step, we should look for the grammar in
the local grammar set, and only ask the application if it is not there.
And in the last step, instead of putting the grammar into the cache, we
should first store it in the local grammar set and then pass it to the
application. It's up to the application to decide whether a grammar
should be cached.
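A Java sketch of this revised flow follows. It also covers the "B and C both import D" case from my earlier mail: D is requested from the application only once, because the second reference finds it in the local set. The single GrammarResolver interface and its method names (retrieveGrammar, resolveGrammarLocation, grammarCompiled) are hypothetical stand-ins, and compile() is a placeholder for real schema compilation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a compiled grammar.
final class Grammar {
    final String namespace;
    Grammar(String namespace) { this.namespace = namespace; }
}

// Hypothetical one-interface GrammarResolver; the method names are illustrative.
interface GrammarResolver {
    Grammar retrieveGrammar(String namespace);                    // cached grammar or null
    String resolveGrammarLocation(String namespace, String hint); // possibly overridden hint
    void grammarCompiled(Grammar grammar);                        // application decides caching
}

final class GrammarLookup {
    private final Map<String, Grammar> local = new HashMap<>(); // the parser's local set
    private final GrammarResolver resolver;                     // null if no caching
    int compiled = 0;                                           // for illustration only

    GrammarLookup(GrammarResolver resolver) { this.resolver = resolver; }

    Grammar grammarFor(String namespace, String hint) {
        Grammar g = local.get(namespace);                 // 1. already known to the parser?
        if (g != null) return g;
        if (resolver != null) g = resolver.retrieveGrammar(namespace); // 2. ask the application
        boolean fresh = (g == null);
        if (fresh) {                                      // 3. compile from the (overridable) hint
            String location = resolver == null ? hint
                    : resolver.resolveGrammarLocation(namespace, hint);
            g = compile(namespace, location);
        }
        local.put(namespace, g);                          // 4. store in the local set first
        if (fresh && resolver != null) resolver.grammarCompiled(g); // 5. then hand it over
        return g;
    }

    // Placeholder for real schema compilation from 'location'.
    private Grammar compile(String namespace, String location) {
        compiled++;
        return new Grammar(namespace);
    }
}
```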

Cheers,
Sandy Gao
Software Developer, IBM Canada
(1-416) 448-3255
[EMAIL PROTECTED]



                                                                                
                                   
Petr Kuzel <[EMAIL PROTECTED]n.com>
Sent by: [EMAIL PROTECTED].com
08/21/2001 12:58 PM
Please respond to xerces-j-dev

To: xerces-j-dev@xml.apache.org
cc:
Subject: Re: [Xerces2] How do we want Grammar Caching



Hi,

My comments use the following terms:
GrammarCache stores known grammars.
GrammarResolver resolves the location of a grammar that is not in the cache.

> For example, a grammar for namespace N is already known to the parser, then
> it sees an element from N. Does the parser use that known grammar, or does
> it ask the application for another one? Another example: A imports B and C,
> then both B and C import D. Does the parser ask the application for D twice
> (which might result in two different D's, and note that it would be an
> error if the two D's have element declarations of the same name)?

What is a grammar already known to the parser? Doesn't the parser first
ask the GrammarCache for *all* grammars? It is not the parser's
responsibility to know whether a grammar is already cached.

> For an xsi:schemaLocation/noNamespaceSchemaLocation attribute,
> a. if a grammar for such namespace is already known to the parser, then
> that grammar is used;
> b. otherwise, the application will be given an opportunity to provide the
> grammar object for that namespace;
> c. if a grammar is not provided from (b), then the application can override
> the schemaLocation hint.

That is a job for a GrammarResolver.

> I don't know DTD as much as folks do to come up with a good algorithm. So I
> need help from you guys again. We might need to consider the following
> questions.
> a. How to cache internal DTD subset vs. external DTD subset;

From my point of view it does not make sense to cache the internal DTD
subset, as it is specific to each document. It makes sense to cache only
the external entities referenced from it.

> b. What's the minimum unit of DTD grammar to cache? An internal subset? An
> external subset (corresponding to a .DTD file)? Or a group of
> internal/external subsets?

I am advocating an external entity identified by the public ID
referenced in the DOCTYPE. However, some users may need a different granularity.
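A cache key at that granularity could be as simple as the following Java sketch. DtdCacheKey is hypothetical; falling back to the system ID when the DOCTYPE has no public ID is an assumption beyond the proposal above.

```java
import java.util.Objects;

// Hypothetical cache key for an external DTD subset: one entry per external
// entity named in the DOCTYPE, keyed by its public ID.
final class DtdCacheKey {
    private final String publicId;
    private final String systemId; // fallback when there is no public ID (assumption)

    private DtdCacheKey(String publicId, String systemId) {
        this.publicId = publicId;
        this.systemId = systemId;
    }

    // Returns null when the DOCTYPE names no external subset: an internal
    // subset belongs to one document and is never cached.
    static DtdCacheKey forDoctype(String publicId, String systemId) {
        if (publicId != null) return new DtdCacheKey(publicId, null);
        if (systemId != null) return new DtdCacheKey(null, systemId);
        return null;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof DtdCacheKey)) return false;
        DtdCacheKey k = (DtdCacheKey) o;
        return Objects.equals(publicId, k.publicId) && Objects.equals(systemId, k.systemId);
    }

    @Override public int hashCode() { return Objects.hash(publicId, systemId); }
}
```

With this key, two documents that reference the same public ID through different system IDs share one cached grammar.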


> 2. Note that the two "resolveGrammarLocation" methods at the end seem
> overlapping with "resolveEntity" method of "EntityResolver": they are all
> used to override a location. And in Xerces1, we did use EntityResolver to
> override schema location hints. So it causes confusion when both
> GrammarResolver and EntityResolver are specified: which one to use, or
> which one takes higher priority? To solve this, we can drop the two
> "resolveGrammarLocation" methods, and use the EntityResolver all the time,
> but we are risking losing grammar type and, possibly, namespace
> information. We currently lean more to another solution: derive
> GrammarResolver from EntityResolver. So that you only need to set one
> resolver for such purpose, and we can clearly state which method(s) (the
> one from EntityResolver or GrammarResolver) takes precedence. Any comments
> on either of the two approaches?

I am for a third approach: create a new interface, GrammarResolver, that is
similar to EntityResolver but is used only for interpreting the
schemaLocation attribute. That is the responsibility of neither
EntityResolver nor GrammarCache.

> 3. There is still a problem for the design of schema caching. Assume
> grammar A (that is, a grammar with target namespace A) is known to the
> parser, then the parser asks the application for grammar B. The
> application returns grammar B, which imports a different grammar A. Now
> the two A's conflict (we assumed a one-grammar-per-namespace rule). To
> avoid such confliction, in the GrammarResolver interface, we ask the
> application to provide a different grammar B in this case (method
> grammarConflict()).

I assume that cached grammar B just references grammar A, so the
parser can use its own copy of A.

> 4. The parser doesn't interact with a grammar pool directly. It's the job
> of the GrammarResolver to get grammars from the pool, and give them to the
> parser. So the caching detail is separated from the validation process of
> the parser, and is under full control of the application.

I'm not sure I understand this. Why doesn't the parser interact
with the pool directly? I prefer to call it GrammarCache.
>
> 5. Xerces2 will provide a default implementation of GrammarResolver, which
> interacts with a default implementation of grammar pool:
> a. The pool is shared across the application;
> b. The pool is thread safe;
> c. Put every grammar in the pool;
> d. Always try to get grammars from the pool first.

e. The pool uses WeakReferences or TimeoutedReferences.

> Is this default behavior satisfying? Of course, people can always
> contribute other general-purpose GrammarResolver/GrammarPool
> implementations.
>
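A rough Java sketch of a pool with properties a to e. DefaultGrammarPool is a hypothetical name, keys are plain strings for brevity, and trying the pool first (d) is really the resolver's job, so it appears here only as a usage expectation.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a compiled grammar.
final class Grammar {
    final String namespace;
    Grammar(String namespace) { this.namespace = namespace; }
}

// Hypothetical default pool: one shared instance (a), thread safe via
// ConcurrentHashMap (b), accepts every grammar it is given (c), and holds
// grammars through WeakReferences (e). Consulting it first (d) is up to the caller.
final class DefaultGrammarPool {
    static final DefaultGrammarPool SHARED = new DefaultGrammarPool();

    private final ConcurrentMap<String, WeakReference<Grammar>> entries =
            new ConcurrentHashMap<>();

    Grammar get(String key) {
        WeakReference<Grammar> ref = entries.get(key);
        if (ref == null) return null;
        Grammar g = ref.get();
        if (g == null) entries.remove(key, ref); // referent collected: drop stale entry
        return g;
    }

    void put(String key, Grammar grammar) {
        entries.put(key, new WeakReference<>(grammar));
    }
}
```

With WeakReference a grammar can be collected as soon as nothing else references it, which may be sooner than a cache wants; java.lang.ref.SoftReference, whose referents are cleared in response to memory demand, is the closer JDK match for a cache. TimeoutedReferences has no JDK counterpart and would need its own expiry logic.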

public interface GrammarCache {

   // it can be implemented by a Map

   public Grammar get(Key key);

   public Grammar put(Key key, Grammar grammar);

   public interface Key {
   }
}

public interface CacheKeyFactory {

  //GrammarCache.Key createKey(String namespace);

  //GrammarCache.Key createKey(String pID, String sID);

  GrammarCache.Key createKey(String grammarType,
                             String grammarKey,
                             String hint);


}

public interface GrammarResolver {

   // ask the application to override import
   // schemaLocation.
   public XMLInputSource resolveGrammarLocation(String grammarType,
                                                String grammarKey,
                                                String hint);

   // ask the application to override include/redefine
   // schemaLocation.
   public XMLInputSource resolveGrammarLocation(String grammarType,
                                                String hint);
}
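As the comment says, GrammarCache can be backed by a Map; the one catch is that Key then needs value-based equals() and hashCode(), or two keys created for the same grammar will never match. A compilable Java sketch follows; the interfaces are the ones above with parameter names added, while Grammar, SimpleKey and MapGrammarCache are hypothetical, and ignoring the hint in the key is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a compiled grammar.
final class Grammar {
    final String namespace;
    Grammar(String namespace) { this.namespace = namespace; }
}

// The interfaces above, with parameter names added so they compile.
interface GrammarCache {
    Grammar get(Key key);
    Grammar put(Key key, Grammar grammar);
    interface Key {}
}

interface CacheKeyFactory {
    GrammarCache.Key createKey(String grammarType, String grammarKey, String hint);
}

// Value-based key so that Map lookups work. Ignoring the hint, so one namespace
// reached through different schemaLocation hints shares an entry, is an assumption.
final class SimpleKey implements GrammarCache.Key {
    private final String grammarType;
    private final String grammarKey;

    SimpleKey(String grammarType, String grammarKey) {
        this.grammarType = grammarType;
        this.grammarKey = grammarKey;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof SimpleKey)) return false;
        SimpleKey k = (SimpleKey) o;
        return Objects.equals(grammarType, k.grammarType)
                && Objects.equals(grammarKey, k.grammarKey);
    }

    @Override public int hashCode() { return Objects.hash(grammarType, grammarKey); }
}

final class MapGrammarCache implements GrammarCache, CacheKeyFactory {
    private final Map<Key, Grammar> map = new HashMap<>();

    @Override public Grammar get(Key key) { return map.get(key); }
    @Override public Grammar put(Key key, Grammar grammar) { return map.put(key, grammar); }

    @Override public Key createKey(String grammarType, String grammarKey, String hint) {
        return new SimpleKey(grammarType, grammarKey);
    }
}
```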

Time diagram from parser point of view:

  $key = ask CacheKeyFactory for a key
  $grm = ask GrammarCache for Grammar
  if $grm == null {
    $in = ask GrammarResolver for InputSource
    $grm = constructGrammar($in)
    put $grm into cache under $key
  }
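The same diagram as compilable Java, under a few simplifying assumptions: keys are plain Objects, XMLInputSource is replaced by a system-ID String, and constructGrammar() is a placeholder. CachingGrammarLoader and the trimmed interfaces are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a compiled grammar; records where it was read from.
final class Grammar {
    final String systemId;
    Grammar(String systemId) { this.systemId = systemId; }
}

// Trimmed versions of the interfaces above. Keys are plain Objects and
// XMLInputSource is replaced by a system-ID String (both assumptions).
interface GrammarCache {
    Grammar get(Object key);
    Grammar put(Object key, Grammar grammar);
}

interface CacheKeyFactory {
    Object createKey(String grammarType, String grammarKey, String hint);
}

interface GrammarResolver {
    String resolveGrammarLocation(String grammarType, String grammarKey, String hint);
}

final class CachingGrammarLoader {
    private final CacheKeyFactory keys;
    private final GrammarCache cache;
    private final GrammarResolver resolver;
    int constructed = 0; // for illustration only

    CachingGrammarLoader(CacheKeyFactory keys, GrammarCache cache, GrammarResolver resolver) {
        this.keys = keys;
        this.cache = cache;
        this.resolver = resolver;
    }

    Grammar load(String grammarType, String grammarKey, String hint) {
        Object key = keys.createKey(grammarType, grammarKey, hint);   // $key
        Grammar grm = cache.get(key);                                  // $grm
        if (grm == null) {
            String in = resolver.resolveGrammarLocation(grammarType, grammarKey, hint); // $in
            grm = constructGrammar(in);
            cache.put(key, grm);                                       // put under $key
        }
        return grm;
    }

    // Placeholder for real grammar construction.
    private Grammar constructGrammar(String systemId) {
        constructed++;
        return new Grammar(systemId);
    }
}
```

In this flow every constructed grammar is cached, and the cache is the only place the loader looks before resolving a location.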

> Did I miss anything?

Maybe I missed something, because I commented on so much.

  Cc.

--
<address>
<a href="mailto:[EMAIL PROTECTED]">Petr Kuzel</a>, Sun Microsystems
: <a href="http://www.sun.com/forte/ffj/ie/">Forte Tools</a>
: XML and <a href="http://jini.netbeans.org/">Jini</a> modules</address>

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




