Hi folks,

It's interesting how easy it is to lose sight of the forest when you're surrounded by many trees. We've been concentrating so hard, and for so long, on the Xerces2 schema parsing redesign, grammar resolution design and validation that it only very recently occurred to us that we might not have shared the motivations for these designs with the rest of the community. That gap in understanding of what we're trying to accomplish may explain why contributions to the discussion have so far come from only a rather small number of people.

To understand where we're coming from, a few things should be noted about Xerces1. In contrast to the Xerces2 emphasis on pipelines and modular design, Xerces1's design is, as no doubt everybody's aware, rather monolithic--all its parts are tightly coupled and there are a lot of nasty interdependencies. One of the big motivations of the work we're trying to do--as has been true for all the other areas of Xerces2--is to make the design as modular as possible, so that people can reuse whatever portions of the parser they find useful in their applications.

The Xerces1 design had other drawbacks, especially when it comes to schemas. The whole Xerces1 grammar structure was based on DTDs, and schema support was more or less grafted onto it; that is to say, schema grammars sort of look like overgrown DTDs in the Xerces1 world. For a while one could argue that even if this wasn't elegant, at least it promoted code reuse. But as time went by and the Xerces1 support of schemas matured, it became more apparent to us that schemas really are a beast of their own, and that treating them like DTDs simply made for really hacked-up, unmaintainable code. So while in Xerces2 we want to reuse code where it looks feasible, this isn't our primary goal.

Xerces1 looked at the importing or including of one schema by another in much the same way that it viewed the referencing of an external entity by an internal DTD subset.
The consequence of this is that Xerces1 has some limitations with respect to schemas that mutually reference one another, and overcoming them would have required substantial redesign. This, along with a desire to break up one class (the infamous TraverseSchema, currently at version #241!) which had grown to over 9,000 lines, is the principal motivator behind the SchemaHandler interface that we've been kicking around.

Xerces2's schema support will also eventually have to include some means of exposing the PSVI (post-schema-validation infoset). We're uncertain what form this will take--whether it will be a DOM API, some kind of output like that produced by Henry Thompson's XSV, or even an XNI-based API--but, as we're redesigning schema parsing and building the SchemaGrammar representation, we'll have to take care to make it sufficiently rich to store all the information necessary to make this happen.

Another obvious weakness of Xerces1 was the fact that grammars could not be reused. So having an infrastructure that provides a way to cache grammars is another very important requirement for Xerces2. On the other hand, since grammars will be cacheable, it becomes somewhat less important that the conversion of schema documents into grammar objects be lightning fast. Thus, while we certainly want our schema parsing to be efficient, we're quite prepared to sacrifice the odd shortcut if it makes the design cleaner and more comprehensible.

That said, we're still very much concerned about efficiently validating instance documents against grammars. The Xerces1 validator is another maintenance and efficiency nightmare, largely because it tries to be "universal"--that is, to validate against both schemas and DTDs. Here again we're prepared to sacrifice some amount of shared code to make validators that are more modular and better at what they do: thus, we believe that Xerces2 needs specialized DTD and schema validators.
Another requirement imposed by DOM Level 3--and indeed one for which we got a lot of requests when Andy circulated a Xerces2 features survey last January--was DOM tree revalidation. While this might have been doable in Xerces1 without too much redesign, we're taking care in Xerces2 to take it into account from the beginning as we contemplate how validation should occur.

Down the road, it would certainly be nice if Xerces2 could parse an XML document into a Xalan-type DTM. It would also be nice if it could support validation against RELAX NG grammars. DOM Level 3 contemplates adding XPath support to the DOM, and obviously we'll want to implement this as well. All this we have to keep in mind as we're putting things together, although this last set is further down the list. In fact, one of the hardest things we're finding about this whole process--and perhaps the main reason it's been so slow--is its scope, and the difficulty of developing and keeping a focus on what needs to be done first.

Having set all this out--and once again asking folks to keep your wishlists manageable lest the design discussions become even more bogged down--I'm very curious to know whether this list misses anything significant. Are there important things that people would like to see Xerces2 do that I haven't mentioned? Do people agree generally with the stance we're taking, or would you like us to place the emphasis differently--on faster schema processing rather than on clean code, for instance?

As always, feedback is more than welcome. And if people agree with the tack we're taking, hopefully this posting will make the discussions we're currently having about the shape of grammar caching and validation more approachable, and easier to join for people who are less familiar with the guts of how things are done.

Cheers,
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 416-448-3519, T/L 778-3519
E-mail: [EMAIL PROTECTED]