Hi folks,
Now that we've got a framework for grammar caching, it seems like a
pretty good time to start a discussion of what our default
implementation should look like. We'll also need to make sure that all
the components we want to use the grammar caching framework actually
do; I'm thinking especially of the DTDValidator here, since we
currently have no way of caching DTD grammars.
Since we've got a couple of fairly significant bugs that we know of in
Xerces 2.0.0, it would be nice to put out a refresh fairly soon. So
I'm hoping that we can keep both our grammar-caching requirements--and
our discussion of them :-)--as brief as possible.
As a starting-point, here's my idea of the functionality a default
grammar caching implementation should have:
1. It should be thread-safe;
2. It should be as easy as possible to use from SAX, DOM, XNI or even JAXP;
3. It should encompass both XML Schemas and DTDs;
4. It should permit grammars to be preparsed or cached as they are
encountered while validating instance documents;
5. It should permit the application to "lock" the cache, that is,
prevent any more grammars from being added.
Now obviously, if a user is interested only in 2., then 4. and 5. don't
apply (since there's no concept of grammar preparsing in SAX, XNI or
JAXP). So we won't be able to satisfy everyone all the time--but then,
that's nothing new. :-)
So here's the sort of implementation I have in mind:
- We need an XMLGrammarConfiguration which subclasses
  StandardParserConfiguration. This class has:
    - a static XMLGrammarPoolImpl and a SymbolTable
    - a no-arg constructor which passes these into the
      StandardParserConfiguration that it extends
    - a method for preparsing grammars, similar to the DOM 3
      DOMASBuilder#parseASURI(...) (except perhaps with clearer
      semantics)
    - a method to stop XMLGrammarPoolImpl from receiving any more
      grammars
Under this regime, it'll be possible to access Xerces2's grammar
caching functionality even through JAXP, using Andy's
configuration-selection logic (this is why we need a no-arg
constructor on this class). If one does this, every time a SAXParser
or DOMParser is manufactured, it will share the same Grammar cache as
all the others.
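To make the sharing concrete, here is a rough sketch of the shape described above. It assumes nothing about the real Xerces classes: SymbolTableStub, GrammarPoolStub, BaseParserConfiguration (standing in for StandardParserConfiguration) and CachingConfigurationSketch are all hypothetical names invented for illustration, and the preparsing method is left out.

```java
// Hypothetical stand-ins only; none of these are the real Xerces classes.
final class SymbolTableStub { }

final class GrammarPoolStub {
    private boolean locked;

    // Once locked, the pool is meant to refuse any further grammars.
    synchronized void lock() { locked = true; }

    synchronized boolean isLocked() { return locked; }
}

// Plays the role of StandardParserConfiguration: it simply keeps whatever
// symbol table and grammar pool it is handed.
class BaseParserConfiguration {
    protected final SymbolTableStub symbolTable;
    protected final GrammarPoolStub grammarPool;

    BaseParserConfiguration(SymbolTableStub symbolTable, GrammarPoolStub grammarPool) {
        this.symbolTable = symbolTable;
        this.grammarPool = grammarPool;
    }
}

// Plays the role of the proposed caching configuration.
class CachingConfigurationSketch extends BaseParserConfiguration {
    // Static, so every instance created in this class loader shares them.
    private static final SymbolTableStub SHARED_SYMBOLS = new SymbolTableStub();
    private static final GrammarPoolStub SHARED_POOL = new GrammarPoolStub();

    // No-arg constructor, so the class can be instantiated by name
    // through a configuration-selection mechanism.
    public CachingConfigurationSketch() {
        super(SHARED_SYMBOLS, SHARED_POOL);
    }

    // Stops the shared pool from receiving any more grammars.
    void lockGrammarPool() { grammarPool.lock(); }

    GrammarPoolStub getGrammarPool() { return grammarPool; }
}
```

Because the pool is static, locking it through one configuration instance locks it for every parser built on this configuration, which is worth keeping in mind when deciding who is allowed to call the locking method.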
On the other hand, users who want a bit of added functionality can
instantiate this configuration directly. This way they'll be able to
access the preparsing and locking functionality--although this will
require using a Xerces-specific implementation, thus stepping away
from standard APIs. Such a user would also be able to extend this
configuration, perhaps by having a non-threadsafe implementation of
XMLGrammarPool or other custom enhancements.
I envisage XMLGrammarPoolImpl as being a very simple collection of a
couple of hashtables for DTDs and schemas. They could probably hash
directly on the XMLGrammarDescriptions for these types of grammars.
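A minimal sketch of that idea, under stated assumptions: GrammarKey is a hypothetical stand-in for an XMLGrammarDescription (a real description would carry more than a kind and one identifier), grammars are held as plain Objects, and SimpleGrammarPool and its method names are invented for illustration. It uses one table per grammar kind, synchronizes every access, and supports the "lock" from point 5.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for an XMLGrammarDescription: a grammar is
// identified by its kind plus one string (for example a schema's target
// namespace or a DTD's system ID).
final class GrammarKey {
    enum Kind { DTD, SCHEMA }

    final Kind kind;
    final String identifier;

    GrammarKey(Kind kind, String identifier) {
        this.kind = Objects.requireNonNull(kind);
        this.identifier = Objects.requireNonNull(identifier);
    }

    // equals and hashCode are what let the tables hash directly on the key.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof GrammarKey)) return false;
        GrammarKey that = (GrammarKey) other;
        return kind == that.kind && identifier.equals(that.identifier);
    }

    @Override
    public int hashCode() {
        return Objects.hash(kind, identifier);
    }
}

// Hypothetical pool: one table for DTDs, one for schemas. Every public
// method synchronizes on the pool, so parsers on different threads can
// share a single instance.
final class SimpleGrammarPool {
    private final Map<GrammarKey, Object> dtds = new HashMap<>();
    private final Map<GrammarKey, Object> schemas = new HashMap<>();
    private boolean locked = false;

    // Stores the grammar unless the pool is locked; reports whether it was stored.
    synchronized boolean cacheGrammar(GrammarKey key, Object grammar) {
        if (locked) return false;
        tableFor(key).put(key, grammar);
        return true;
    }

    // Returns the cached grammar for this key, or null if there is none.
    synchronized Object retrieveGrammar(GrammarKey key) {
        return tableFor(key).get(key);
    }

    // Prevents any more grammars from being added; lookups still work.
    synchronized void lockPool() { locked = true; }

    private Map<GrammarKey, Object> tableFor(GrammarKey key) {
        return key.kind == GrammarKey.Kind.DTD ? dtds : schemas;
    }
}
```

Coarse synchronization on the whole pool is the simplest way to get thread safety here; a user who does not need it could substitute a non-synchronized variant, as suggested above.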
I suspect it's not realistic to permit DTD preparsing yet. But I do
think it should be relatively straightforward to induce the
XMLDTDValidator to co-operate with our grammar caching scheme--I'd
certainly invite feedback on this point though.
I looked over the CachingParserPool that we currently have (has anyone
used it?) but I can't see any way of adapting this sort of approach to
suit users who don't want to bind themselves closely to our
implementation. Thoughts?
Have I missed anything? Do folks think this will work? Even more
important, since I won't be around all that much this month--I've got
some vacation time coming up and some other commitments--anyone want
to help?
Any suggestions on how the work might be broken down?
Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
E-mail: [EMAIL PROTECTED]