Here's the flow for Xalan: we have two inputs, an XML document, which we parse with DOMParser, and an XSL stylesheet, which we parse with SAXParser. In our SAX DocumentHandler, we build the stylesheet into a special "subclassed" DOM. We then apply the templates in the stylesheet DOM, matching patterns in the document DOM. When we need to produce output, we send the result elements through SAX again, since we have DocumentHandler subclasses that do XML, HTML and text formatting.
When creating the stylesheet's DOM, and when formatting output via SAX, we need to deal with literal wide-character strings, for things like character entities and standard XSLT element names. With DOM, we can use the macro we've been discussing to avoid transcoding, except on platforms like HP-UX, where transcoding via DOMStrings would be used. But then we still need a solution for pushing string literals through SAX. It would be really nice if it were the same solution. We could use arrays of Unicode characters, like Xerces does, though that looks painful.

You can also probably see our performance bottleneck. Every time we cross the interface from SAX to DOM, we need to create a DOMString from an XMLCh*, and that means we need to copy the data.

If we had a single string class, reference-counted with copy-on-write semantics, used by both the DOM and SAX APIs, then we could imagine something very nice: many strings could be copied into a buffer once, at parse time, and stay in that buffer as the string's handle is passed from SAX to DOM to SAX and finally written out. Most uses of XSLT involve restructuring the data: rearranging it, taking content and wrapping it in style tags. So many, or even most, of the strings would benefit if we could avoid this impedance mismatch.

-Rob

Dean Roddey wrote:

>>I guess the meta-question is this: do we want a solution that encompasses
>>SAX and XMLCh*, or just DOM and DOMString? It seems that the differences
>>in the representation of L"foo" is going to effect SAX as well. And Xalan
>>uses both

>I'm not sure that there is a problem here. SAX is just a layer on the
>parser that just spits stuff out. Its always spit out in XMLCh format, and
>was always that way since it was internalized down in the guts of the
>parser.

>Or, are you saying that you guys are also spitting stuff out a SAX
>interface?
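
As a rough illustration of the single reference-counted, copy-on-write string class proposed above, here is a minimal C++ sketch. Everything in it is hypothetical: SharedString, Node and demoHandoff are made-up names, not part of Xerces or Xalan, and std::u16string merely stands in for an XMLCh buffer. It is single-threaded only and ignores allocation strategy, so treat it as a picture of the idea rather than a proposed implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical shared string handle (not a Xerces or Xalan class).
// Copying the handle shares the buffer; only a write forces a copy.
class SharedString {
public:
    SharedString() : buf_(std::make_shared<std::u16string>()) {}

    // The character data is copied into the buffer once, here.
    explicit SharedString(std::u16string text)
        : buf_(std::make_shared<std::u16string>(std::move(text))) {}

    // The implicit copy constructor and assignment only bump the
    // reference count; no character data is copied.

    const std::u16string& str() const { return *buf_; }
    const char16_t* data() const { return buf_->data(); }
    std::size_t size() const { return buf_->size(); }

    // True when both handles refer to the same underlying buffer.
    bool sharesBufferWith(const SharedString& other) const {
        return buf_ == other.buf_;
    }

    // A mutating operation: detach from other handles first.
    void append(const std::u16string& more) {
        detach();
        buf_->append(more);
    }

private:
    // Copy-on-write: clone the buffer only if someone else holds it.
    // (use_count() is only reliable here because this sketch is
    // single-threaded.)
    void detach() {
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::u16string>(*buf_);
        }
    }

    std::shared_ptr<std::u16string> buf_;
};

// Stand-in for a DOM node that stores text by handle.
struct Node {
    SharedString text;
};

// Simulates the SAX -> DOM -> SAX handoff described above and reports
// whether the output side still sees the buffer filled at parse time.
bool demoHandoff() {
    SharedString parsed(u"hello");   // filled once, at "parse time"
    Node node{parsed};               // SAX -> DOM: handle copy only
    SharedString out = node.text;    // DOM -> SAX output: handle copy only
    return out.sharesBufferWith(parsed);
}
```

In this sketch, passing a string from the parser to a node and on to an output handler never copies the characters; a copy happens only if one holder calls append() while another still shares the buffer.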