Hi all,

To keep with the theme of picking up long-dropped threads, I thought it would be useful if we talked about grammar caching a bit more. We've done lots of coding, but there are a couple of things that need cleaning up:
1. Some people have expressed concerns, largely privately, about the interface between the application and the grammar caching framework; we need to decide just how this should look and get it done.
2. A good job has been done with XML Schema; some scaffolding is present, but little else has been done for DTDs.

In this message I'll focus on 1, with a view to summarizing some of the off-list discussions that have occurred, and proposing a way forward. I'll send a subsequent message on point 2, since that topic is entirely separate and I'm really hoping we can draw in some DTD gurus--IMHO lack of DTD expertise is probably the biggest reason we haven't had much progress on this front.

What we have now in terms of grammar caching is a bit confusing. An application can implement its own XMLGrammarPool, instantiate a configuration with it, and happily cache grammars as they're encountered. Or, the application can use the XMLGrammarCachingConfiguration with any parser, and again grammars will be cached as they're encountered. On the other hand, the only way to preparse grammars is either to use the DOMASBuilder functionality--both experimental and, arguably, not entirely intuitive--or to use the XMLGrammarCachingConfiguration, which comes with a method to put grammars into the grammar pool.

It's been suggested off-list that grammar preparsing is central to what grammar caching is about, and shouldn't be relegated to a special configuration. It's also been suggested that the exigencies of different situations (DOM L3 grammar parsing vs. what we want to do by default) make it imperative to develop not one implementation, but several, used in different contexts. Finally, the idea of treating XMLGrammarPoolImpls more like entity resolvers--settable using properties--than internal-use-only objects (cf. the SymbolTable) has been put forward.
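To make the difference between caching grammars as they're encountered and preparsing them concrete, here is a rough sketch. It is a toy, not the real XMLGrammarPool interface: the class name, the method names and the use of plain strings as "grammars" are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a grammar pool. NOT the real XMLGrammarPool interface. */
final class ToyGrammarPool {
    // Maps a grammar key (say, a target namespace or a DTD system id)
    // to a parsed grammar; a String stands in for the grammar object here.
    private final Map<String, String> grammars = new HashMap<>();
    private boolean locked = false;

    /** Returns the cached grammar for the key, or null if none is cached. */
    String retrieve(String key) {
        return grammars.get(key);
    }

    /** Stores a grammar unless the pool is locked; returns true if it was stored. */
    boolean cache(String key, String grammar) {
        if (locked) {
            return false;
        }
        grammars.put(key, grammar);
        return true;
    }

    /** After locking, grammars met during a parse are no longer added. */
    void lock() {
        locked = true;
    }

    void unlock() {
        locked = false;
    }

    int size() {
        return grammars.size();
    }

    public static void main(String[] args) {
        ToyGrammarPool pool = new ToyGrammarPool();

        // Preparsing: the application loads the grammars it trusts up front...
        pool.cache("urn:example:orders", "<parsed orders schema>");
        // ...and then freezes the pool before any documents are parsed.
        pool.lock();

        // Caching as encountered: a grammar met during a parse is offered
        // to the pool, but a locked pool declines it.
        boolean stored = pool.cache("urn:example:unknown", "<parsed unknown schema>");
        System.out.println("stored=" + stored + ", size=" + pool.size());
    }
}
```

Without the lock() call the second grammar would simply be cached as it is encountered, which is all that's on offer today unless you go through DOMASBuilder or XMLGrammarCachingConfiguration.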
To my mind, there are four categories of application with respect to grammar caching:

(a) Don't (need to/want to) know about potential pitfalls; just want a parser that works reliably. The sort of application for which XML isn't essential or performance isn't critical. This kind would want a null implementation.

(b) Want performance, but don't have knowledge/time to customize. This would be the sort of application for which XML is critical--maybe a SOAP server or some such; might be written to use a wide variety of parsers, and wants to exploit abilities where possible without too much development overhead. These would be the sort that would want a good default implementation, preferably easily available through standard APIs, and would not use preparsing.

(c) Performance is critical, and so is stability, but perhaps there's not time to develop a custom XMLGrammarPoolImpl. These would be the sort of applications that would use grammar preparsing and locking.

(d) The super-application that wants everything its own way.

It's in order to accommodate applications of type (b) that I think we're compelled to have a special configuration that provides access to a decent default implementation of grammar caching through a no-arg constructor. These guys will want to use JAXP to instantiate their parser, and setting a system property to direct our JAXP implementation to use XMLGrammarCachingConfiguration would be the easiest, least-intrusive way for this kind of application to get what it needs. If all we provided was a property then the app would have to test its parser to see if it understood the property; this is doable but seems too much overhead to me.

But supporting applications of type (c) and (d) is also critical. I'd just hate to see us optimize for those cases when (c)- and (d)-type developers are already likely to be familiar with--or at least familiarizable with--Xerces's guts to the extent that they'll be able to handle some inconvenience.
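For a type (b) application, the system-property route might look roughly like the sketch below. The property name `org.apache.xerces.xni.parser.XMLParserConfiguration` is an assumption on my part about how the JAXP implementation would look up its configuration class; the CachingViaJaxp class and its countElements helper are purely illustrative. With a parser that ignores the property, the code still parses, just without grammar caching.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class CachingViaJaxp {
    // ASSUMPTION: the system property the JAXP implementation consults to
    // pick its parser configuration. Parsers that don't know it ignore it.
    static final String CONFIG_PROPERTY =
            "org.apache.xerces.xni.parser.XMLParserConfiguration";
    static final String CACHING_CONFIG =
            "org.apache.xerces.parsers.XMLGrammarCachingConfiguration";

    /** Parses the document through plain JAXP and counts its elements. */
    static int countElements(String xml) {
        // The only caching-specific line: everything below is ordinary JAXP.
        System.setProperty(CONFIG_PROPERTY, CACHING_CONFIG);

        final int[] count = {0};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes atts) {
                            count[0]++;
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countElements("<order><item/><item/></order>"));
    }
}
```

The point is that the application never touches a parser-specific class or feature, so it keeps working unchanged against any other JAXP parser.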
So I tend to think preparsing should stay on a specific configuration, not on StandardParserConfiguration. But allowing an XMLGrammarPool to be set on a configuration via a property just makes sense to my mind; if someone wants to do it, there's no reason not to let them!

To be more concrete, here are my thoughts on how to stabilize things:

- XMLGrammarPool should be made into an external property. This should be read-only within a parse, but if an application wants to change it between parses it would be caveat emptor. It's conceivable this could be useful, so no point in preventing it.
- DOMASBuilderImpl should instantiate itself using XMLGrammarCachingConfiguration rather than DTDXSParserConfiguration, and XMLGrammarCachingConfiguration should extend DTDXSParserConfiguration rather than StandardParserConfiguration.
- Once that's done, DOMASBuilderImpl can be made to use XMLGrammarCachingConfiguration's grammar parsing functionality rather than doing its own thing; this should considerably increase its maintainability.
- Given DOM's requirements, it may be necessary to use a different grammar pool implementation here than XMLGrammarCachingConfiguration uses by default. I hope this does not turn out to be the case though; it would be nice to be able to configure our single default implementation to give it the right characteristics. If we're not able to do otherwise then perhaps it's not sufficiently flexible?

This is the best I can think of in terms of making life easiest for users who need simplicity, while giving all kinds of power to power-users. If you're a user and you've read this far, chances are you're pretty advanced; but I'd love to hear from everyone about suggestions for how we can strike a better balance and if there's any necessary functionality I've missed.
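In case the "read-only within a parse" rule for the grammar pool property is unclear, here is a small sketch of the behaviour I have in mind. Everything in it is hypothetical: ToyParserConfiguration is not a Xerces class, the property id is made up, and a Runnable stands in for the actual parse.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy configuration showing a property that is frozen during a parse. */
final class ToyParserConfiguration {
    // Hypothetical property id, invented for this sketch.
    static final String GRAMMAR_POOL = "http://example.org/properties/grammar-pool";

    private final Map<String, Object> properties = new HashMap<>();
    private boolean parsing = false;

    /** Sets a property; the grammar pool may not be replaced mid-parse. */
    void setProperty(String id, Object value) {
        if (parsing && GRAMMAR_POOL.equals(id)) {
            throw new IllegalStateException(
                    "grammar pool cannot be replaced during a parse");
        }
        properties.put(id, value);
    }

    Object getProperty(String id) {
        return properties.get(id);
    }

    /** Runs the given work as if it were a parse, then clears the flag. */
    void parse(Runnable documentHandling) {
        parsing = true;
        try {
            documentHandling.run();
        } finally {
            parsing = false;
        }
    }

    public static void main(String[] args) {
        ToyParserConfiguration config = new ToyParserConfiguration();
        config.setProperty(GRAMMAR_POOL, "pool-1");

        config.parse(() -> {
            try {
                // A handler trying to swap the pool mid-parse is refused.
                config.setProperty(GRAMMAR_POOL, "pool-2");
            } catch (IllegalStateException expected) {
                System.out.println("rejected mid-parse: " + expected.getMessage());
            }
        });

        // Between parses the swap is allowed (caveat emptor).
        config.setProperty(GRAMMAR_POOL, "pool-2");
        System.out.println(config.getProperty(GRAMMAR_POOL));
    }
}
```

Whether a mid-parse change should throw or be silently ignored is open; throwing is just the choice made for this sketch.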
Cheers,
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
