>What is the desired behavior of DOMBuilder when it receives:
     startCDATA()
     characters()
     ...
     characters()
     endCDATA()
>The current effect is the creation of one CDATA Node per characters event,
>and that seems counter to SAX fundamentals.
> It also seems counter to the JavaDoc for the class --

Re that last point: No, not really. The JavaDoc talks about "contiguous
character data", and makes no promises about DOM nodes.


Re the former: The DOM spec only _requires_ normalization of adjacent text
when the DOM is generated directly by parsers, and doesn't say anything
about SAX (which might not be from a parser). So one can defend not doing
the additional work here. Since the multiple CDATASections should have the
same _semantic_ meaning (no XML application should ever be sensitive to the
boundaries of <![CDATA[]]> markup!), I'm not convinced that failing to
merge these would actually be _wrong_.
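
To illustrate the point about semantic equivalence, here is a minimal
sketch using only the standard JAXP/DOM APIs. The class name
CDataBoundaryDemo and its build() helper are made up for this example
and are not part of Xalan. It assumes a DOM Level 3 implementation, so
that getTextContent() is available.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical demo class; not part of Xalan. */
class CDataBoundaryDemo {

    /** Builds an element <r> holding one CDATASection per chunk. */
    static Element build(String... chunks) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                                        .newDocumentBuilder()
                                        .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element r = doc.createElement("r");
        doc.appendChild(r);
        for (String chunk : chunks) {
            r.appendChild(doc.createCDATASection(chunk));
        }
        return r;
    }

    public static void main(String[] args) {
        Element split = build("ab", "cd");   // two adjacent CDATASections
        Element merged = build("abcd");      // one CDATASection

        // The node counts differ...
        System.out.println(split.getChildNodes().getLength());   // 2
        // ...but the character content seen by an application does not.
        System.out.println(
            split.getTextContent().equals(merged.getTextContent())); // true
    }
}
```

Code that walks the child list and expects exactly one node would see
the difference, which is the kind of boundary sensitivity argued
against above.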

Note that org.apache.xml.utils.DOMBuilder doesn't merge characters() calls
when building Text nodes either. The DOM definitely permits that.

So I'm not sure I'd call this a bug, per se.


HOWEVER: I do agree that it may be a misfeature.

When I've written SAX-to-DOM builders, I have generally assumed that
successive SAX characters() events were intended to be accumulated into a
single DOM node, whether Text or CDATASection. I prefer generating the DOM
in normalized form and it's easier to handle both of these similarly rather
than special-casing non-CDATASection Text.
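
As a rough sketch of that accumulate-by-default approach (this is not
the Xalan DOMBuilder; the class AccumulatingDomBuilder and its fields
are hypothetical, and attributes, namespaces, comments and processing
instructions are ignored for brevity):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical SAX-to-DOM builder that merges successive characters() events. */
class AccumulatingDomBuilder extends DefaultHandler2 {
    private final Document doc;
    private Node current;        // node that receives new children
    private boolean inCData;     // true between startCDATA() and endCDATA()
    private boolean accumulate;  // true if the last child can take more data

    AccumulatingDomBuilder() {
        try {
            doc = DocumentBuilderFactory.newInstance()
                                        .newDocumentBuilder()
                                        .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        current = doc;
    }

    Document getDocument() { return doc; }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        Element e = doc.createElement(qName);
        current.appendChild(e);
        current = e;
        accumulate = false;      // a new element ends any run of characters
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        current = current.getParentNode();
        accumulate = false;
    }

    // Crossing a CDATA boundary starts a new node of the other kind.
    @Override public void startCDATA() { inCData = true;  accumulate = false; }
    @Override public void endCDATA()   { inCData = false; accumulate = false; }

    @Override
    public void characters(char[] ch, int start, int length) {
        String s = new String(ch, start, length);
        if (accumulate) {
            // Text and CDATASection are both CharacterData, so one path serves both.
            ((CharacterData) current.getLastChild()).appendData(s);
        } else {
            current.appendChild(inCData ? doc.createCDATASection(s)
                                        : doc.createTextNode(s));
            accumulate = true;
        }
    }
}
```

Feeding this handler startCDATA(), two characters() calls and
endCDATA() inside an element should leave a single CDATASection child
holding the concatenated data. A real builder would also need to reset
the flag on comments, processing instructions and similar events.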

It wouldn't be hard to change DOMBuilder to provide that behavior, if we
prefer it. Diffs for a quick-and-dirty patch follow. I haven't tested it
intensively; it may not actually help, and it may not be maximally
efficient, but it looks plausible and seems to avoid breaking anything when
I run our D2D regression test.


xalan/java/src/org/apache/xml/utils/DOMBuilder.java,v
retrieving revision 1.9
diff -r1.9 DOMBuilder.java
92a93,97
>   /** True if last node appended was CharacterData and we haven't
>       crossed into/out of a CDATASection since then
>    */
>   protected boolean m_accumulateChars=false;
>
168d172
<
169a174
>     int type=newNode.getNodeType();
173a179
>       m_accumulateChars=(type == Node.TEXT_NODE || type ==Node.CDATA_SECTION_NODE);
179a186
>       m_accumulateChars=(type == Node.TEXT_NODE || type ==Node.CDATA_SECTION_NODE);
184d190
<       short type = newNode.getNodeType();
186c192
<       if (type == Node.TEXT_NODE)
---
>       if (type == Node.TEXT_NODE || type==Node.CDATA_SECTION_NODE)
380a387
>     m_accumulateChars=false;
424,430d430
<     if (m_inCData)
<     {
<       cdata(ch, start, length);
<
<       return;
<     }
<
432,434c432,438
<     Text text = m_doc.createTextNode(s);
<
<     append(text);
---
>     if(m_accumulateChars)
>       ((CharacterData)m_currentNode.getLastChild()).appendData(s);
>     else
>       if (m_inCData)
>    append(m_doc.createCDATASection(s));
>       else
>    append(m_doc.createTextNode(s));
591a596
>     m_accumulateChars=false;
601a607
>     m_accumulateChars=false;
635a642,645
>
>     // Presumably, if someone called cdata from outside they want it to be
>     // atomic...
>     m_accumulateChars=false;

*****CVS exited normally with code 1*****

