tng         2003/01/24 11:59:56

  Modified:    c/doc    faq-parse.xml program-dom.xml program.xml
  Log:
  Add FAQ about how entity reference are handled by DOMWriter.

  Revision  Changes    Path
  1.54      +23 -1     xml-xerces/c/doc/faq-parse.xml

Index: faq-parse.xml
===================================================================
RCS file: /home/cvs/xml-xerces/c/doc/faq-parse.xml,v
retrieving revision 1.53
retrieving revision 1.54
diff -u -r1.53 -r1.54
--- faq-parse.xml    23 Jan 2003 19:34:43 -0000    1.53
+++ faq-parse.xml    24 Jan 2003 19:59:56 -0000    1.54
@@ -749,7 +749,7 @@
   <faq title="Why do I get segmentation fault when running on Redhat Linux?">
-    <q> Why do I get segmentation fault when running on Redhat Linux?</q>
+    <q>Why do I get segmentation fault when running on Redhat Linux?</q>
     <a>
@@ -759,6 +759,28 @@
       Please try to upgrade your Redhat Linux gcc to the latest patch level and see if it helps.
       </p>
+    </a>
+  </faq>
+
+  <faq title="Why does the XML data generated by the DOMWriter does not match my original XML input?">
+
+    <q>Why does the XML data generated by the DOMWriter does not match my original XML input?</q>
+
+    <a>
+
+      <p>If you parse an xml document using XercesDOMParser or DOMBuilder and pass such DOMNode
+      to DOMWriter for serialization, you may not get something that is exactly the same
+      as the original XML data. The parser may have done normalization, end of line conversion,
+      or has expanded the entity reference as per the XML 1.0 spec, 4.4 XML Processor Treatment of
+      Entities and References. From DOMWriter perspective, it does not know what the original
+      string was, all it sees is a processed DOMNode generated by the parser.
+      But since the DOMWriter is supposed to generate something that is parsable if sent
+      back to the parser, it will not print the DOMNode node value as is. The DOMWriter
+      may do some "touch up" to the output data for it to be parsable.</p>
+
+      <p>See <jump href="program-dom.html#DOMWriterEntityRef">How does DOMWriter handle built-in entity
+      Reference in node value?</jump> to understand further how DOMWriter touches up the entity reference.
+      </p>
     </a>
   </faq>

  1.26      +102 -0    xml-xerces/c/doc/program-dom.xml

Index: program-dom.xml
===================================================================
RCS file: /home/cvs/xml-xerces/c/doc/program-dom.xml,v
retrieving revision 1.25
retrieving revision 1.26
diff -u -r1.25 -r1.26
--- program-dom.xml    9 Jan 2003 20:15:39 -0000    1.25
+++ program-dom.xml    24 Jan 2003 19:59:56 -0000    1.26
@@ -1324,6 +1324,108 @@
     </p>
   </s3>
+  <anchor name="DOMWriterEntityRef"/>
+  <s3 title="How does DOMWriter handle built-in entity Reference in node value?">
+
+    <p>Say for example you parse the following xml document using XercesDOMParser or DOMBuilder</p>
+<source>
+<root>
+<Test attr=" > ' &lt; &gt; &amp; &quot; &apos; "></Test>
+<Test attr=' > " &lt; &gt; &amp; &quot; &apos; '></Test>
+<Test> > " ' &lt; &gt; &amp; &quot; &apos; </Test>
+<Test><![CDATA[< > & " ' &lt; &gt; &amp; &quot; &apos; ]]></Test>
+</root>
+</source>
+    <p>According to XML 1.0 spec, 4.4 XML Processor Treatment of Entities and References, the parser
+    will expand the entity reference as follows</p>
+<source>
+<root>
+<Test attr=" > ' < > & " ' "></Test>
+<Test attr=' > " < > & " ' '></Test>
+<Test> > " ' < > & " ' </Test>
+<Test><![CDATA[< > & " ' &lt; &gt; &amp; &quot; &apos; ]]></Test>
+</root>
+</source>
+
+    <p>and pass such DOMNode to DOMWriter for serialization. From DOMWriter perspective, it
+    does not know what the original string was. All it sees is above DOMNode from the
+    parser. But since the DOMWriter is supposed to generate something that is parsable if sent
+    back to the parser, it cannot print such string as is. Thus the DOMWriter is doing some
+    "touch up", just enough, to get the string parsable.</p>
+
+    <p>So for example since the appearance of < and & in text value will lead to
+    not well-form XML error, the DOMWriter fixes them to &lt; and &amp;
+    respectively; while the >, ' and " in text value are ok to the parser, so DOMWriter does not
+    do anything to them. Similarly the DOMWriter fixes some of the characters for the attribute value
+    but keep everything in CDATA.</p>
+
+    <p>So the string that is generated by DOMWriter will look like this</p>
+<source>
+<root>
+<Test attr=" > ' &lt; > &amp; &quot; ' "/>
+<Test attr=" > &quot; &lt; > &amp; &quot; ' "/>
+<Test> > " ' &lt; > &amp; " ' </Test>
+<Test><![CDATA[< > & " ' &lt; &gt; &amp; &quot; &apos; ]]></Test>
+</root>
+</source>
+    <p>To summarize, here is the table that summarize how built-in entity refernece are handled for
+    different Node Type:</p>
+    <table>
+      <tr>
+        <th><em>Input/Output</em></th>
+        <th><em><</em></th>
+        <th><em>></em></th>
+        <th><em>&</em></th>
+        <th><em>"</em></th>
+        <th><em>'</em></th>
+        <th><em>&lt;</em></th>
+        <th><em>&gt;</em></th>
+        <th><em>&amp;</em></th>
+        <th><em>&quot;</em></th>
+        <th><em>&apos;</em></th>
+      </tr>
+      <tr>
+        <td><em>Attribute</em></td>
+        <td>N/A</td>
+        <td>></td>
+        <td>N/A</td>
+        <td>&quot;</td>
+        <td>'</td>
+        <td>&lt;</td>
+        <td>></td>
+        <td>&amp;</td>
+        <td>&quot;</td>
+        <td>'</td>
+      </tr>
+      <tr>
+        <td><em>Text</em></td>
+        <td>N/A</td>
+        <td>></td>
+        <td>N/A</td>
+        <td>"</td>
+        <td>'</td>
+        <td>&lt;</td>
+        <td>></td>
+        <td>&amp;</td>
+        <td>"</td>
+        <td>'</td>
+      </tr>
+      <tr>
+        <td><em>CDATA</em></td>
+        <td><</td>
+        <td>></td>
+        <td>&</td>
+        <td>"</td>
+        <td>'</td>
+        <td>&lt;</td>
+        <td>&gt;</td>
+        <td>&amp;</td>
+        <td>&quot;</td>
+        <td>&apos;</td>
+      </tr>
+    </table>
+  </s3>
+
   <anchor name="DOMWriterFeatures"/>
   <s3 title="DOMWriter Supported Features">

  1.35      +1 -0      xml-xerces/c/doc/program.xml

Index: program.xml
===================================================================
RCS file: /home/cvs/xml-xerces/c/doc/program.xml,v
retrieving revision 1.34
retrieving revision 1.35
diff -u -r1.34 -r1.35
--- program.xml    6 Jan 2003 21:19:41 -0000    1.34
+++ program.xml    24 Jan 2003 19:59:56 -0000    1.35
@@ -35,6 +35,7 @@
   <li><jump href="program-dom.html#DOMWriter">DOMWriter</jump></li>
   <ul>
     <li><jump href="program-dom.html#ConstructDOMWriter">Constructing a DOMWriter</jump></li>
+    <li><jump href="program-dom.html#DOMWriterEntityRef">How does DOMWriter handle built-in entity Reference in node value?</jump></li>
     <li><jump href="program-dom.html#DOMWriterFeatures">Supported Features</jump></li>
   </ul>
   <li><jump href="program-dom.html#Deprecated">Deprecated - Java-like DOM</jump></li>
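
For readers who want the "touch up" rules from the new program-dom.xml table in executable form, here is a minimal sketch. It is not the DOMWriter implementation and uses no Xerces-C API: NodeKind, touchUp and checkAgainstCommitExample are hypothetical names introduced only for illustration, and the rules are taken solely from the table in this commit (escape < and & in text and attribute values, additionally escape " in attribute values, leave CDATA untouched).

```cpp
#include <cassert>
#include <string>

// Hypothetical node categories, mirroring the rows of the table in the commit.
enum class NodeKind { Attribute, Text, CData };

// Hypothetical helper (NOT a Xerces-C function): applies the escaping rules
// described in the table to an already-expanded node value.
std::string touchUp(const std::string& value, NodeKind kind) {
    if (kind == NodeKind::CData) {
        return value;  // CDATA content is emitted unchanged
    }
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        switch (c) {
            case '<':
                out += "&lt;";   // would otherwise break well-formedness
                break;
            case '&':
                out += "&amp;";  // would otherwise start an entity reference
                break;
            case '"':
                // Only attribute values need the double quote escaped.
                if (kind == NodeKind::Attribute) {
                    out += "&quot;";
                } else {
                    out += c;
                }
                break;
            default:
                out += c;        // '>' and '\'' are left as they are
        }
    }
    return out;
}

// Checks the sketch against the sample document shown in the commit.
void checkAgainstCommitExample() {
    assert(touchUp(" > ' < > & \" ' ", NodeKind::Attribute) ==
           " > ' &lt; > &amp; &quot; ' ");
    assert(touchUp(" > \" ' < > & \" ' ", NodeKind::Text) ==
           " > \" ' &lt; > &amp; \" ' ");
    assert(touchUp("< > & \" ' &lt;", NodeKind::CData) == "< > & \" ' &lt;");
}
```

Calling checkAgainstCommitExample() from a main function reproduces the attribute, text and CDATA lines of the "generated by DOMWriter" sample above; the sketch assumes attribute values are always written inside double quotes, as in that sample.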