>What is the desired behavior of DOMBuilder when it receives:
startCDATA()
characters()
...
characters()
endCDATA()
>The current effect is the creation of one CDATA Node per characters event,
>and that seems counter to SAX fundamentals.
> It also seems counter to the JavaDoc for the class --
Re that last point: No, not really. The JavaDoc talks about "contiguous
character data", and makes no promises about DOM nodes.
Re the former: The DOM spec only _requires_ normalization of adjacent text
when the DOM is generated directly by parsers, and doesn't say anything
about SAX (which might not be from a parser). So one can defend not doing
the additional work here. Since the multiple CDATASections should have the
same _semantic_ meaning (no XML application should ever be sensitive to the
boundaries of <![CDATA[]]> markup!), I'm not convinced that failing to
merge these would actually be _wrong_.
Note that org.apache.xml.utils.DOMBuilder doesn't merge characters() calls
when building Text nodes either. The DOM definitely permits that.
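To illustrate that point, here is a minimal sketch using only the standard org.w3c.dom and javax.xml.parsers APIs. The class and method names (NormalizeDemo, childCount) are made up for this example and are not part of Xalan. It shows that a DOM may legally hold adjacent Text nodes, and that Node.normalize() is what merges them after the fact.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical demo class; not part of Xalan or DOMBuilder. */
public class NormalizeDemo {

    /** Appends two adjacent Text nodes and returns the resulting child count. */
    public static int childCount(boolean normalize) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);

        // Two separate Text nodes, as an unmerging builder would produce
        // from two successive characters() events.
        root.appendChild(doc.createTextNode("abc"));
        root.appendChild(doc.createTextNode("def"));

        if (normalize) {
            root.normalize(); // merges adjacent Text nodes
        }
        return root.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println(childCount(false)); // 2
        System.out.println(childCount(true));  // 1
    }
}
```

Note that the Node.normalize() documentation lists CDATA sections among the "structure" that separates Text nodes, so as far as I can tell a caller can't rely on normalize() to fuse adjacent CDATASections after the fact.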
So I'm not sure I'd call this a bug, per se.
HOWEVER: I do agree that it may be a misfeature.
When I've written SAX-to-DOM builders, I have generally assumed that
successive SAX characters() events were intended to be accumulated into a
single DOM node, whether Text or CDATASection. I prefer generating the DOM
in normalized form and it's easier to handle both of these similarly rather
than special-casing non-CDATASection Text.
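A rough sketch of that accumulate-into-one-node approach follows. This is a stripped-down, hypothetical builder (AccumulatingBuilder and its fields are invented for illustration, and it handles only a single root element), not the actual DOMBuilder code. The only DOM calls it relies on are the standard createTextNode, createCDATASection and CharacterData.appendData.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical illustration; not the Xalan DOMBuilder. */
public class AccumulatingBuilder {
    private final Document doc;
    private final Element current;      // single fixed parent, for brevity
    private boolean inCData = false;
    private boolean accumulate = false; // last child can take more characters

    public AccumulatingBuilder() {
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        current = doc.createElement("root");
        doc.appendChild(current);
    }

    // Crossing a CDATA boundary ends the current run of characters.
    public void startCDATA() { inCData = true;  accumulate = false; }
    public void endCDATA()   { inCData = false; accumulate = false; }

    public void characters(char[] ch, int start, int length) {
        String s = new String(ch, start, length);
        if (accumulate) {
            // Text and CDATASection both extend CharacterData.
            ((CharacterData) current.getLastChild()).appendData(s);
        } else {
            Node n = inCData ? doc.createCDATASection(s)
                             : doc.createTextNode(s);
            current.appendChild(n);
            accumulate = true;
        }
    }

    public Element getRoot() { return current; }

    public static void main(String[] args) {
        AccumulatingBuilder b = new AccumulatingBuilder();
        b.characters("ab".toCharArray(), 0, 2);
        b.characters("cd".toCharArray(), 0, 2);
        b.startCDATA();
        b.characters("<x>".toCharArray(), 0, 3);
        b.characters("&y".toCharArray(), 0, 2);
        b.endCDATA();
        b.characters("ef".toCharArray(), 0, 2);
        // Three children: Text "abcd", CDATASection "<x>&y", Text "ef"
        System.out.println(b.getRoot().getChildNodes().getLength());
    }
}
```

A real builder would also have to clear the flag whenever any other kind of node is appended or the current element changes; this sketch leaves that out since it never appends anything but character data.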
It wouldn't be hard to change DOMBuilder to provide that behavior, if we
prefer it. Diffs for a quick-and-dirty patch follow. I haven't tested it
intensively; it may not actually help, and it may not be maximally
efficient, but it looks plausible and seems to avoid breaking anything when
I run our D2D regression test.
xalan/java/src/org/apache/xml/utils/DOMBuilder.java,v
retrieving revision 1.9
diff -r1.9 DOMBuilder.java
92a93,97
> /** True if last node appended was CharacterData and we haven't
> crossed into/out of a CDATASection since then
> */
> protected boolean m_accumulateChars=false;
>
168d172
<
169a174
> int type=newNode.getNodeType();
173a179
> m_accumulateChars=(type == Node.TEXT_NODE || type==Node.CDATA_SECTION_NODE);
179a186
> m_accumulateChars=(type == Node.TEXT_NODE || type==Node.CDATA_SECTION_NODE);
184d190
< short type = newNode.getNodeType();
186c192
< if (type == Node.TEXT_NODE)
---
> if (type == Node.TEXT_NODE || type==Node.CDATA_SECTION_NODE)
380a387
> m_accumulateChars=false;
424,430d430
< if (m_inCData)
< {
< cdata(ch, start, length);
<
< return;
< }
<
432,434c432,438
< Text text = m_doc.createTextNode(s);
<
< append(text);
---
> if(m_accumulateChars)
> ((CharacterData)m_currentNode.getLastChild()).appendData(s);
> else
> if (m_inCData)
> append(m_doc.createCDATASection(s));
> else
> append(m_doc.createTextNode(s));
591a596
> m_accumulateChars=false;
601a607
> m_accumulateChars=false;
635a642,645
>
> // Presumably, if someone called cdata from outside they want it to be
> // atomic...
> m_accumulateChars=false;