Re: [Xerces2] How do we want Grammar Caching

Ted Leung Thu, 09 Aug 2001 09:24:20 -0700
+1 on Petr's comments.  The application needs control.  If necessary, we can
provide a hand-holding parser configuration for people who want that.  But I
know that I need full control over the cache.

Also, one comment below.

Ted
----- Original Message -----
From: "Petr Kuzel" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, August 09, 2001 3:05 AM
Subject: Re: [Xerces2] How do we want Grammar Caching


> Hi,
>
>   good timing, I was thinking about it last week:
>
> [EMAIL PROTECTED] wrote:
> > People are interested in grammar caching, and are asking questions about
> > it. But without knowing what people really want, we couldn't make any
> > decision on how to provide such functionality, and couldn't answer any of
> > the questions. So, instead, allow me to ask this question first: how do you
> > expect Xerces2 to provide grammar caching?
>
> I think that caching should be under application management. So I
> suggest a grammar cache property:
>
>   interface GrammarCache {
>
>      Object put(String URI_key, Object grammar);
>      Object get(String URI_key);
>
>   }
>
> so an application can in the simplest case put the property:
>
>   parser.setProperty(".../grammar-cache", new WeakHashMap());
>
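
A minimal sketch of the proposal above, under stated assumptions: GrammarCache
is the interface Petr suggests, not an existing Xerces2 API, and
MapGrammarCache is a hypothetical adapter. A plain WeakHashMap does not
implement that interface by itself, so the adapter wraps any Map the
application picks.

```java
import java.util.Map;
import java.util.WeakHashMap;

/** The interface proposed in this thread; an assumption, not a Xerces2 API. */
interface GrammarCache {
    Object put(String uriKey, Object grammar);
    Object get(String uriKey);
}

/** Hypothetical adapter: the application chooses the backing Map and its policy. */
final class MapGrammarCache implements GrammarCache {
    private final Map<String, Object> store;

    MapGrammarCache(Map<String, Object> store) {
        this.store = store;
    }

    @Override
    public Object put(String uriKey, Object grammar) {
        // Returns the grammar previously cached under this key, or null.
        return store.put(uriKey, grammar);
    }

    @Override
    public Object get(String uriKey) {
        return store.get(uriKey);
    }
}

class GrammarCacheDemo {
    public static void main(String[] args) {
        // The application, not the parser, decides the eviction policy.
        GrammarCache cache = new MapGrammarCache(new WeakHashMap<String, Object>());
        cache.put("http://example.org/ns/a", "grammar A (placeholder object)");
        System.out.println(cache.get("http://example.org/ns/a"));
    }
}
```

The key and the placeholder grammar object are illustrative only; a real
grammar would be whatever object the parser hands back.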
> > Assuming we have a grammar pool in the system somewhere, and it contains a
> > set of grammars that are already parsed (a set of objects of Grammar
> > class), let's consider the following scenarios. I'll use schema for the
> > examples, and use "grammar A" as shorthand for "a schema grammar of target
> > namespace A".
> >
> > 1. When we validate an instance and see an element from namespace A, we
> > need to find grammar A to validate that element. How do we find it?
> > (Assume grammar A is not already known to that instance.)
> > a. Always look in the grammar pool first. If it's not there, parse the
> > grammar according to the schema location attributes.
> > b. Always ask the application. The application can get the grammar from the
> > grammar pool, or from some other place. The application can also choose to
> > override the schema location, and the parser will parse the grammar
> > accordingly.
> >
> > Which approach do you prefer? Or is there any other approach that serves
> > your needs better?
>
> The second. Only the application can decide what to cache during its
> lifecycle.
>
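
Approach (b), "always ask the application", could look roughly like the sketch
below. GrammarLocator, ValidatorSketch and their methods are hypothetical names
made up for illustration; nothing here is an actual Xerces2 interface, and
overriding the schema location is not modeled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical callback implemented by the application. */
interface GrammarLocator {
    /** Returns a grammar the application already holds, or empty to let the parser load it. */
    Optional<Object> locate(String targetNamespace, String schemaLocation);
}

/** Stand-in for the validating parser; parsing is faked with a string. */
final class ValidatorSketch {
    private final GrammarLocator locator;
    int parsedCount = 0; // how many grammars had to be parsed from scratch

    ValidatorSketch(GrammarLocator locator) {
        this.locator = locator;
    }

    Object grammarFor(String targetNamespace, String schemaLocation) {
        // Always ask the application first; parse only when it has no answer.
        return locator.locate(targetNamespace, schemaLocation)
                      .orElseGet(() -> parseGrammar(schemaLocation));
    }

    private Object parseGrammar(String schemaLocation) {
        parsedCount++;
        return "grammar parsed from " + schemaLocation;
    }
}

class GrammarLocatorDemo {
    public static void main(String[] args) {
        Map<String, Object> pool = new HashMap<>();
        pool.put("urn:a", "cached grammar A");
        ValidatorSketch validator =
            new ValidatorSketch((ns, loc) -> Optional.ofNullable(pool.get(ns)));
        System.out.println(validator.grammarFor("urn:a", "a.xsd"));
        System.out.println(validator.grammarFor("urn:b", "b.xsd"));
    }
}
```

Whether the freshly parsed grammar then goes into the pool is left to the
application, which matches choice (b) in question 3.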
> > 2. When we are parsing grammar A, and it imports grammar B, we need to find
> > grammar B. How do we find it? (Assume grammar B is not already known to
> > that instance.)
> > The same choices as those for (1).
> >
> > This is slightly different from the one above. Some applications might
> > choose a different approach for each of the two cases. One never knows.
> >
> > 3. After the parser parses a grammar, how will this grammar be put into the
> > grammar pool?
> > a. The parser puts the grammar into the pool automatically.
> > b. The parser just returns the grammar to the application, and it's up to
> > the application to decide whether to put that grammar into the pool.
> >
> > Again, which approach is preferred?
> >
> > 4. How many grammar pools should there be? And how complicated should the
> > grammar pool(s) be?
> > a. One pool for each application, and it can be as simple as a hashtable.
> > b. One pool for each application, and it must be thread safe. That is, the
> > grammar pool must be able to handle the case where two or more threads try
> > to get/put grammars (possibly of the same namespace) at the same time.
> > c. One pool for each thread, and it can be as simple as a hashtable.
> > d. A dynamic number of grammar pools. The application can create as many as
> > it wants, and tell the parser which one to use on a given occasion.

d.  I assume here you mean each parser instance when you say "the parser"
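
To make (d) concrete, here is a rough sketch, assuming a hypothetical
GrammarPool class owned by the application (not a Xerces2 class). It also
covers the thread-safety concern in (b): ConcurrentHashMap.computeIfAbsent
applies the loader at most once per key, even if several threads ask for the
same namespace at the same time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical application-owned pool; create as many instances as needed. */
final class GrammarPool {
    private final Map<String, Object> grammars = new ConcurrentHashMap<>();

    /** Returns the cached grammar, loading it with the given function if absent. */
    Object getOrLoad(String targetNamespace, Function<String, Object> loader) {
        return grammars.computeIfAbsent(targetNamespace, loader);
    }

    int size() {
        return grammars.size();
    }
}

class GrammarPoolDemo {
    public static void main(String[] args) {
        // One pool per parser instance, or one shared pool: the application decides.
        GrammarPool poolForParserOne = new GrammarPool();
        GrammarPool poolForParserTwo = new GrammarPool();

        poolForParserOne.getOrLoad("urn:a", ns -> "grammar for " + ns);
        System.out.println(poolForParserOne.size());
        System.out.println(poolForParserTwo.size());
    }
}
```

How a parser instance would be told which pool to use is left open here,
since that is exactly the API question under discussion.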

> > I can come up with two extreme solutions here. Any approach in between
> > could be what's in Xerces2.
> >
> > [1] A clean design with less flexibility
> >
> > Xerces provides a grammar pool, which is shared across the application.
> > This grammar pool is thread-safe. The parser gets grammars from and puts
> > grammars into the grammar pool automatically. It's like we choose
> > "a a a b" for the above four questions. This should be sufficient for many
> > use cases, but the applications won't be able to control how the grammar
> > pool is accessed.
> >
> > [2] A flexible design
> >
> > Extreme flexibility means we don't assume anything, hence we couldn't
> > implement the grammar pool (because any one implementation might not
> > fulfill some specific case). So we leave the implementation of the grammar
> > pool to the application, and the application can implement it in any way it
> > wants: one or more pools, thread-safe or not. Each time an instance
> > document is parsed (or a standalone grammar is parsed), a list of grammars
> > will be returned to (or accessed by) the application. The application can
> > then decide which ones to cache. This is like we choose "b b b d" for the
> > four questions.
> >
> > So please consider: Is [1] enough for our lives? Do we need the flexibility
> > of [2]? Which point between [1] and [2] is most comfortable for us?
>
> I need the second. I want to manage what is garbage and what is not.
> It is not a responsibility of the parser!
>
> > There are other questions about the grammar pool:
> > - How do we access grammars in the grammar pool? For schemas, it might be
> > easier: we can use the target namespace. How about DTDs and schemas without
> > a target namespace?
>
> As a key I propose: the target namespace, or the public ID of a DTD grammar;
> generally a String URI.
>
> > - How do we deal with conflicting grammars (for example, two schema
> > grammars with the same target namespace)?
>
> Let the application decide according to context. It can cache two grammars
> for the same URI and return one of them depending on context.
>
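
One way an application might hold two grammars under the same URI and pick
between them is sketched below. ContextualGrammarCache and the string
"context" label are assumptions made for illustration; what counts as context
is entirely up to the application.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache that can hold several grammars under one URI key. */
final class ContextualGrammarCache {
    // URI key -> (application-defined context label -> grammar)
    private final Map<String, Map<String, Object>> byUri = new HashMap<>();

    void put(String uriKey, String context, Object grammar) {
        byUri.computeIfAbsent(uriKey, k -> new HashMap<>()).put(context, grammar);
    }

    /** Returns the grammar cached for this URI in this context, or null if none. */
    Object get(String uriKey, String context) {
        Map<String, Object> candidates = byUri.get(uriKey);
        return candidates == null ? null : candidates.get(context);
    }
}

class ContextualGrammarCacheDemo {
    public static void main(String[] args) {
        ContextualGrammarCache cache = new ContextualGrammarCache();
        // Two schema grammars with the same target namespace, told apart by context.
        cache.put("urn:orders", "v1-documents", "orders grammar, version 1");
        cache.put("urn:orders", "v2-documents", "orders grammar, version 2");
        System.out.println(cache.get("urn:orders", "v2-documents"));
    }
}
```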
> > But I guess we can answer them after we nail down what's really needed for
> > grammar caching.
> >
> > I was trying to prepare a note to describe our thoughts about how we were
> > going to support grammar caching, and some design/implementation details we
> > could think of. But I found it really difficult to say anything before we
> > know what is really desired. And DOM3 is trying to provide its way to do
> > grammar caching, which makes things even worse.
> >
> > Anyway, no decision has been made about any aspect of grammar caching. So
> > make a wish! :-)
>
> Keep it simple and under full application control. :-)
>
>   Thanks
>   Cc.
>
> --
> <address>
> <a href="mailto:[EMAIL PROTECTED]">Petr Kuzel</a>, Sun Microsystems
> : <a href="http://www.sun.com/forte/ffj/ie/">Forte Tools</a>
> : XML and <a href="http://jini.netbeans.org/";>Jini</a> modules</address>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>

