RE: DOMString (was: xalan-c Problem with Xerces initialization)

Robert_Weir 19 Jan 2000 05:31:57 -0000

Here's the flow for Xalan:

We have two inputs: an XML document, which we parse with DOMParser, and an
XSL stylesheet, which we parse with SAXParser.  In our SAX DocumentHandler,
we build the stylesheet into a special "subclassed" DOM.  We then apply the
templates in the stylesheet DOM, matching patterns against the document DOM.
When we need to produce output, we send the result elements through SAX
again, since we have DocumentHandler subclasses that do XML, HTML and Text
formatting.

When creating the stylesheet's DOM, and when formatting output via SAX, we
need to deal with literal wide-character strings, for things like character
entities and standard XSLT element names.  With DOM, we can use the macro
we've been discussing to avoid transcoding, except on platforms like
HP-UX, where transcoding via DOMStrings would be used.  But then we need a
solution for pushing string literals through SAX.  It would be really nice
if it were the same solution.  We could use arrays of Unicode characters,
like Xerces does, though that looks painful.
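
To make the "arrays of Unicode characters" option concrete, here is a rough
sketch of what such a literal could look like.  The XMLCh typedef below is
only a stand-in (the real one is chosen per platform), and the names
kTemplateName, xmlchLength and xmlchEquals are made up for illustration;
they are not Xerces or Xalan API.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for Xerces' XMLCh (assumption: a 16-bit unsigned type).
typedef unsigned short XMLCh;

// "template" spelled out as explicit code points, so the value does not
// depend on the width of wchar_t or on the source character set.
static const XMLCh kTemplateName[] = {
    0x0074, 0x0065, 0x006D, 0x0070, 0x006C, 0x0061, 0x0074, 0x0065, 0x0000
};

// Length of a null-terminated XMLCh string (hypothetical helper).
inline std::size_t xmlchLength(const XMLCh* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Equality of two null-terminated XMLCh strings (hypothetical helper).
inline bool xmlchEquals(const XMLCh* a, const XMLCh* b) {
    while (*a != 0 && *a == *b) { ++a; ++b; }
    return *a == *b;
}

// Usage: match an incoming element name against the literal, no transcoding.
inline void literalArrayExample() {
    const XMLCh incoming[] = { 0x0074, 0x0065, 0x006D, 0x0070,
                               0x006C, 0x0061, 0x0074, 0x0065, 0x0000 };
    assert(xmlchLength(kTemplateName) == 8);
    assert(xmlchEquals(incoming, kTemplateName));
}
```

The array is painful to write, but the same constant could in principle be
handed to a SAX handler as an XMLCh* and used to build a DOM string, which is
the "same solution" property mentioned above.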

You can also probably see our performance bottleneck.  Every time we cross
the interface from SAX to DOM, we need to create a DOMString from an XMLCh*,
and this means that we need to copy the data.  If we had a single string
class, reference-counted with copy-on-write semantics, used by both the
DOM and SAX APIs, then we could imagine something very nice: many strings
could be copied into a buffer once, at parse time, and stay in that buffer
as the string's handle was passed from SAX to DOM to SAX and finally
written out.  Most uses of XSLT involve restructuring the data: rearranging
it, taking content and wrapping it in style tags.  So many, or even most, of
the strings would benefit if we could avoid this impedance mismatch.
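
A minimal sketch of the kind of string handle described above, written in
present-day C++ for brevity.  SharedString and its members are hypothetical
names, not the Xerces DOMString or any existing Xalan class, and the sketch
ignores thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical reference-counted, copy-on-write string handle.
class SharedString {
public:
    SharedString() : buf_(std::make_shared<std::u16string>()) {}
    explicit SharedString(const std::u16string& text)
        : buf_(std::make_shared<std::u16string>(text)) {}

    // Copying a handle (default copy constructor) shares the buffer.

    const std::u16string& str() const { return *buf_; }
    const char16_t* data() const { return buf_->c_str(); }
    std::size_t length() const { return buf_->size(); }

    // True when both handles refer to the same underlying buffer.
    bool sharesBufferWith(const SharedString& other) const {
        return buf_ == other.buf_;
    }

    // Mutation: take a private copy first if the buffer is shared.
    void append(const std::u16string& more) {
        detach();
        buf_->append(more);
    }

private:
    void detach() {
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::u16string>(*buf_);
        }
    }

    std::shared_ptr<std::u16string> buf_;
};

// Usage: the parser side and the DOM side hold one buffer until a write.
inline void sharedStringExample() {
    SharedString fromParser(u"chapter title");
    SharedString inDom = fromParser;            // no character data copied
    assert(inDom.sharesBufferWith(fromParser));
    inDom.append(u"!");                         // copy happens only here
    assert(!inDom.sharesBufferWith(fromParser));
    assert(fromParser.str() == u"chapter title");
}
```

In the common XSLT case, where text is moved around but not modified, the
append path is never taken, so the characters would be copied only once.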

-Rob

Dean Roddey wrote:

>>I guess the meta-question is this: do we want a solution that encompasses
>>SAX and XMLCh*, or just DOM and DOMString? It seems that the differences
>>in the representation of L"foo" are going to affect SAX as well. And Xalan
>>uses both

>I'm not sure that there is a problem here. SAX is just a layer on the
>parser that just spits stuff out. It's always spit out in XMLCh format, and
>was always that way since it was internalized down in the guts of the
>parser.

>Or, are you saying that you guys are also spitting stuff out a SAX
>interface?
