Hi all,

To keep with the theme of picking up long-dropped threads, I thought
it would be useful if we talked about grammar caching a bit more.
We've done lots of coding, but there are a couple of things that need
cleaning up:

1.  Some people have expressed concerns, largely privately, about the
interface between the application and the grammar caching framework;
we need to decide just how this should look and get it done.

2.  A good job has been done with XML Schema; some scaffolding is
present, but little else has been done for DTDs.

In this message I'll focus on point 1, with a view to summarizing
some of the off-list discussions that have occurred, and proposing a
way forward.  I'll send a subsequent message on point 2, since that
topic is entirely separate and I'm really hoping we can draw in some
DTD gurus--IMHO lack of DTD expertise is probably the biggest reason
we haven't had much progress on this front.

What we have now in terms of grammar caching is a bit confusing.  An
application can implement its own XMLGrammarPool, instantiate a
configuration with it, and happily cache grammars as they're
encountered.  Or, the application can use the
XMLGrammarCachingConfiguration with any parser, and again grammars
will be cached as they're encountered.  On the other hand, the only
ways to preparse grammars are to use the DOMASBuilder
functionality--both experimental and, arguably, not entirely
intuitive--or to use the XMLGrammarCachingConfiguration, which comes
with a method to put grammars into the grammar pool.

It's been suggested off-list that grammar preparsing is central to
what grammar caching's about, and shouldn't be relegated to a special
configuration.  It's also been suggested that the exigencies of
different situations (DOM L3 grammar parsing vs. what we want to do by
default) make it imperative to develop not one implementation, but
several, used in different contexts.  Finally, the idea of treating
XMLGrammarPoolImpls more like entity resolvers--settable using
properties--than internal-use-only objects (cf. the SymbolTable) has
been put forward.

To my mind, there are 4 categories of application with respect to
grammar caching:

(a) Don't (need to/want to) know about potential pitfalls; just want a
parser that works reliably.  The sort of application for which XML
isn't essential or performance isn't critical.  This kind would want a
null implementation.

(b) Want performance, but don't have knowledge/time to customize.
This would be the sort of application for which XML is critical--maybe
a SOAP server or some such; might be written to use a wide variety of
parsers, and wants to exploit abilities where possible without too
much development overhead.  These would be the sort that would want a
good default implementation, preferably easily available through
standard APIs, and would not use preparsing.

(c) Performance is critical, and so is stability, but perhaps there's
not time to develop a custom XMLGrammarPoolImpl.  These would be the
sort of applications that would use grammar preparsing and locking.

(d) The super-application that wants everything its own way.
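
To make the first three categories a little more concrete, here is a
rough sketch of how a null pool and a lockable default pool might
differ.  SimpleGrammarPool and both implementations are hypothetical,
cut-down stand-ins that I've made up for illustration; the real
XMLGrammarPool deals in grammar and grammar-description objects rather
than plain keys.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for XMLGrammarPool (illustration only).
interface SimpleGrammarPool {
    Object retrieve(String key);             // null means "not cached"
    void cache(String key, Object grammar);  // offer a grammar to the pool
    void lock();                             // stop accepting new grammars
    void unlock();                           // accept new grammars again
}

// Category (a): caches nothing, so every parse behaves as if no pool existed.
final class NullGrammarPool implements SimpleGrammarPool {
    public Object retrieve(String key) { return null; }
    public void cache(String key, Object grammar) { }
    public void lock() { }
    public void unlock() { }
}

// Categories (b) and (c): a map-backed default.  A (b)-type application
// simply lets it fill up as grammars are encountered; a (c)-type
// application preparses the grammars it trusts and then locks the pool.
final class DefaultGrammarPool implements SimpleGrammarPool {
    private final Map<String, Object> grammars = new HashMap<>();
    private boolean locked = false;

    public synchronized Object retrieve(String key) {
        return grammars.get(key);
    }

    public synchronized void cache(String key, Object grammar) {
        if (!locked) {
            grammars.put(key, grammar);  // silently ignored once locked
        }
    }

    public synchronized void lock() { locked = true; }
    public synchronized void unlock() { locked = false; }
}

final class GrammarPoolDemo {
    public static void main(String[] args) {
        SimpleGrammarPool pool = new DefaultGrammarPool();
        pool.cache("urn:example:orders", "preparsed orders grammar");
        pool.lock();
        // A grammar met during a later parse is not added to a locked pool.
        pool.cache("urn:example:unknown", "grammar met during a parse");
        System.out.println("orders cached:  " + (pool.retrieve("urn:example:orders") != null));
        System.out.println("unknown cached: " + (pool.retrieve("urn:example:unknown") != null));
    }
}
```

A (d)-type application would supply its own implementation of the
interface instead of using either of these.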

It's in order to accommodate applications of type (b) that I think
we're compelled to have a special configuration that provides access to a
decent default implementation of grammar caching through a no-arg
constructor.  These guys will want to use JAXP to instantiate their
parser, and setting a system property to direct our JAXP implementation
to use XMLGrammarCachingConfiguration would be the easiest, least-intrusive
way for this kind of application to get what it needs.  If all we
provided was a property then the app would have to test its parser to
see if it understood the property; this is doable but seems too much
overhead to me.
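
Here is roughly what the two options look like from the application's
side.  This is only a sketch: the system property name and the
grammar-pool property ID below are my assumptions about what we'd
settle on, not a committed interface.  With a parser that knows
neither, the system property is simply ignored and the probe returns
false.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

final class CachingParserSetup {
    // Assumed names, for illustration: the system property our JAXP
    // implementation would consult, and the configuration class to use.
    static final String CONFIG_PROPERTY =
        "org.apache.xerces.xni.parser.XMLParserConfiguration";
    static final String CACHING_CONFIG =
        "org.apache.xerces.parsers.XMLGrammarCachingConfiguration";
    // Assumed ID for the grammar pool property.
    static final String GRAMMAR_POOL_PROPERTY =
        "http://apache.org/xml/properties/internal/grammar-pool";

    // Option 1: one system property, then plain JAXP.  No parser-specific
    // classes appear in the application code.
    static XMLReader newReader() throws Exception {
        System.setProperty(CONFIG_PROPERTY, CACHING_CONFIG);
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    // Option 2: the application must probe whichever parser it was given
    // before it can rely on a property.
    static boolean understands(XMLReader reader, String propertyId) {
        try {
            reader.getProperty(propertyId);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        System.out.println("parser class: " + reader.getClass().getName());
        System.out.println("grammar pool property understood: "
            + understands(reader, GRAMMAR_POOL_PROPERTY));
    }
}
```

The probe is only a few lines, but every (b)-type application would
have to carry it, plus a fallback path for parsers that say no.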

But supporting applications of type (c) and (d) is also critical.  I'd
just hate to see us optimize for those cases at (b)'s expense, when
(c)- and (d)-type developers are already likely to be familiar
with--or at least familiarizable with--Xerces's guts to the extent
that they'll be able to handle some inconvenience.  So I tend to think
preparsing should
stay on a specific configuration, not on StandardParserConfiguration.
But allowing an XMLGrammarPool to be set on a configuration via a property
just makes sense to my mind; if someone wants to do it, there's no reason
not to let them!

To be more concrete, here are my thoughts on how to stabilize things:

- XMLGrammarPool should be made into an external property.  This
  should be read-only within a parse, but if an application wants to
  change it between parses it would be caveat emptor.  It's
  conceivable this could be useful, so no point in preventing it.

- DOMASBuilderImpl should instantiate itself using
  XMLGrammarCachingConfiguration rather than DTDXSParserConfiguration,
  and XMLGrammarCachingConfiguration should extend
  DTDXSParserConfiguration rather than StandardParserConfiguration.

- Once that's done, DOMASBuilderImpl can be made to use
  XMLGrammarCachingConfiguration's grammar parsing functionality
  rather than doing its own thing; this should considerably increase
  its maintainability.

- Given DOM's requirements, it may be necessary to use a different
  grammar pool implementation here than XMLGrammarCachingConfiguration
  uses by default.  I hope this does not turn out to be the case
  though; it would be nice to be able to configure our single default
  implementation to give it the right characteristics.  If we can't,
  then perhaps the default implementation isn't sufficiently flexible?
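
For the first item, the behaviour I have in mind is roughly the
following.  This is a toy model, not a patch: PoolAwareConfiguration,
its property ID and its parse method are all hypothetical, and a real
configuration would hand the pool to its validators rather than just
store it.

```java
// Hypothetical model of a configuration whose grammar pool is an
// externally settable property that is read-only while a parse runs.
final class PoolAwareConfiguration {
    // Made-up property ID, for illustration only.
    static final String GRAMMAR_POOL = "urn:example:properties/grammar-pool";

    private Object grammarPool;
    private boolean parsing = false;

    void setProperty(String id, Object value) {
        if (!GRAMMAR_POOL.equals(id)) {
            throw new IllegalArgumentException("unrecognized property: " + id);
        }
        if (parsing) {
            // Read-only within a parse.
            throw new IllegalStateException("grammar pool cannot change during a parse");
        }
        grammarPool = value;  // between parses: allowed, caveat emptor
    }

    Object getProperty(String id) {
        if (!GRAMMAR_POOL.equals(id)) {
            throw new IllegalArgumentException("unrecognized property: " + id);
        }
        return grammarPool;
    }

    // Stand-in for a real parse; the callback plays the role of
    // application handlers that run while the document is being read.
    void parse(Runnable handlers) {
        parsing = true;
        try {
            handlers.run();
        } finally {
            parsing = false;
        }
    }

    public static void main(String[] args) {
        PoolAwareConfiguration config = new PoolAwareConfiguration();
        config.setProperty(GRAMMAR_POOL, "pool A");
        config.parse(() -> {
            try {
                config.setProperty(GRAMMAR_POOL, "pool B");
            } catch (IllegalStateException e) {
                System.out.println("rejected during parse: " + e.getMessage());
            }
        });
        config.setProperty(GRAMMAR_POOL, "pool B");  // fine between parses
        System.out.println("pool is now: " + config.getProperty(GRAMMAR_POOL));
    }
}
```

Whether a mid-parse change should throw or be silently ignored is open;
I've made it throw here only because that's easier to notice.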

This is the best I can think of in terms of making life easiest for
users who need simplicity, while giving all kinds of power to
power-users.  If you're a user and you've read this far, chances are
you're pretty advanced; but I'd love to hear everyone's suggestions
for how we can strike a better balance, and whether there's any
necessary functionality I've missed.

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  [EMAIL PROTECTED]


