tng         2003/01/24 11:59:56

  Modified:    c/doc    faq-parse.xml program-dom.xml program.xml
  Log:
  Add FAQ about how entity references are handled by DOMWriter.
  
  Revision  Changes    Path
  1.54      +23 -1     xml-xerces/c/doc/faq-parse.xml
  
  Index: faq-parse.xml
  ===================================================================
  RCS file: /home/cvs/xml-xerces/c/doc/faq-parse.xml,v
  retrieving revision 1.53
  retrieving revision 1.54
  diff -u -r1.53 -r1.54
  --- faq-parse.xml     23 Jan 2003 19:34:43 -0000      1.53
  +++ faq-parse.xml     24 Jan 2003 19:59:56 -0000      1.54
  @@ -749,7 +749,7 @@
   
     <faq title="Why do I get segmentation fault when running on Redhat Linux?">
   
  -    <q> Why do I get segmentation fault when running on Redhat Linux?</q>
  +    <q>Why do I get segmentation fault when running on Redhat Linux?</q>
   
       <a>
   
  @@ -759,6 +759,28 @@
            Please try to upgrade your Redhat Linux gcc to the latest patch level and see if it helps.
         </p>
   
  +    </a>
  +  </faq>
  +
  +  <faq title="Why does the XML data generated by the DOMWriter not match my original XML input?">
  +
  +    <q>Why does the XML data generated by the DOMWriter not match my original XML input?</q>
  +
  +    <a>
  +
  +      <p>If you parse an XML document using XercesDOMParser or DOMBuilder and pass the
  +         resulting DOMNode to DOMWriter for serialization, the output may not be exactly
  +         the same as the original XML data. The parser may have normalized the data,
  +         converted line endings, or expanded entity references as per the XML 1.0 spec,
  +         section 4.4, "XML Processor Treatment of Entities and References". The DOMWriter
  +         does not know what the original string was; all it sees is the processed DOMNode
  +         generated by the parser. But since the DOMWriter is supposed to generate output
  +         that can be parsed again, it will not always print the DOMNode node value as is.
  +         It may "touch up" the output data to keep it parsable.</p>
  +
  +      <p>See <jump href="program-dom.html#DOMWriterEntityRef">How does DOMWriter handle built-in entity
  +         Reference in node value?</jump> for details on how DOMWriter touches up entity references.
  +      </p>
       </a>
     </faq>
   
  
  
  
  1.26      +102 -0    xml-xerces/c/doc/program-dom.xml
  
  Index: program-dom.xml
  ===================================================================
  RCS file: /home/cvs/xml-xerces/c/doc/program-dom.xml,v
  retrieving revision 1.25
  retrieving revision 1.26
  diff -u -r1.25 -r1.26
  --- program-dom.xml   9 Jan 2003 20:15:39 -0000       1.25
  +++ program-dom.xml   24 Jan 2003 19:59:56 -0000      1.26
  @@ -1324,6 +1324,108 @@
             </p>
           </s3>
   
  +        <anchor name="DOMWriterEntityRef"/>
  +        <s3 title="How does DOMWriter handle built-in entity Reference in node value?">
  +
  +          <p>For example, suppose you parse the following XML document using XercesDOMParser or DOMBuilder:</p>
  +<source>
  +&lt;root>
  +&lt;Test attr=" > ' &amp;lt; &amp;gt; &amp;amp; &amp;quot; &amp;apos; ">&lt;/Test>
  +&lt;Test attr=' > " &amp;lt; &amp;gt; &amp;amp; &amp;quot; &amp;apos; '>&lt;/Test>
  +&lt;Test> >  " ' &amp;lt; &amp;gt; &amp;amp; &amp;quot; &amp;apos; &lt;/Test>
  +&lt;Test>&lt;![CDATA[&lt; > &amp; " ' &amp;lt; &amp;gt; &amp;amp; &amp;quot; &amp;apos; ]]&gt;&lt;/Test>
  +&lt;/root>
  +</source>
  +           <p>According to the XML 1.0 spec, section 4.4, "XML Processor Treatment of Entities and References",
  +           the parser will expand the entity references as follows:</p>
  +<source>
  +&lt;root>
  +&lt;Test attr=" > ' &lt; &gt; &amp; &quot; &apos; ">&lt;/Test>
  +&lt;Test attr=' > " &lt; &gt; &amp; &quot; &apos; '>&lt;/Test>
  +&lt;Test> >  " ' &lt; &gt; &amp; &quot; &apos; &lt;/Test>
  +&lt;Test>&lt;![CDATA[&lt; > &amp; " ' &amp;lt; &amp;gt; &amp;amp; &amp;quot; &amp;apos; ]]&gt;&lt;/Test>
  +&lt;/root>
  +</source>
  +
  +           <p>Suppose you then pass this DOMNode to DOMWriter for serialization. The DOMWriter
  +           does not know what the original string was; all it sees is the DOMNode above, as
  +           produced by the parser. But since the DOMWriter is supposed to generate output that
  +           can be parsed again, it cannot print such a string as is. Instead, the DOMWriter does
  +           just enough "touch up" to make the string parsable.</p>
  +
  +           <p>For example, since an unescaped &lt; or &amp; in a text value would cause a
  +           well-formedness error, the DOMWriter writes them as &amp;lt; and &amp;amp;
  +           respectively, while >, ' and " in a text value are acceptable to the parser, so the
  +           DOMWriter leaves them alone. Similarly, the DOMWriter escapes some characters in
  +           attribute values, but leaves everything in a CDATA section unchanged.</p>
  +
  +           <p>The string generated by DOMWriter will therefore look like this:</p>
  +<source>
  +&lt;root>
  +&lt;Test attr=" > ' &amp;lt; > &amp;amp; &amp;quot; ' "/>
  +&lt;Test attr=" > &amp;quot; &amp;lt; > &amp;amp; &amp;quot; ' "/>
  +&lt;Test> >  " ' &amp;lt; > &amp;amp; " ' &lt;/Test>
  +&lt;Test>&lt;![CDATA[&lt; > &amp; " ' &amp;lt; &amp;gt; &amp;amp; &amp;quot; &amp;apos; ]]&gt;&lt;/Test>
  +&lt;/root>
  +</source>
  +            <p>To summarize, the following table shows how built-in entity references are handled for
  +            different node types:</p>
  +      <table>
  +        <tr>
  +          <th><em>Input/Output</em></th>
  +          <th><em>&lt;</em></th>
  +          <th><em>&gt;</em></th>
  +          <th><em>&amp;</em></th>
  +          <th><em>&quot;</em></th>
  +          <th><em>&apos;</em></th>
  +          <th><em>&amp;lt;</em></th>
  +          <th><em>&amp;gt;</em></th>
  +          <th><em>&amp;amp;</em></th>
  +          <th><em>&amp;quot;</em></th>
  +          <th><em>&amp;apos;</em></th>
  +        </tr>
  +        <tr>
  +          <td><em>Attribute</em></td>
  +          <td>N/A</td>
  +          <td>&gt;</td>
  +          <td>N/A</td>
  +          <td>&amp;quot;</td>
  +          <td>&apos;</td>
  +          <td>&amp;lt;</td>
  +          <td>&gt;</td>
  +          <td>&amp;amp;</td>
  +          <td>&amp;quot;</td>
  +          <td>&apos;</td>
  +        </tr>
  +        <tr>
  +          <td><em>Text</em></td>
  +          <td>N/A</td>
  +          <td>&gt;</td>
  +          <td>N/A</td>
  +          <td>&quot;</td>
  +          <td>&apos;</td>
  +          <td>&amp;lt;</td>
  +          <td>&gt;</td>
  +          <td>&amp;amp;</td>
  +          <td>&quot;</td>
  +          <td>&apos;</td>
  +        </tr>
  +        <tr>
  +          <td><em>CDATA</em></td>
  +          <td>&lt;</td>
  +          <td>&gt;</td>
  +          <td>&amp;</td>
  +          <td>&quot;</td>
  +          <td>&apos;</td>
  +          <td>&amp;lt;</td>
  +          <td>&amp;gt;</td>
  +          <td>&amp;amp;</td>
  +          <td>&amp;quot;</td>
  +          <td>&amp;apos;</td>
  +        </tr>
  +        </table>
  +        </s3>
  +
           <anchor name="DOMWriterFeatures"/>
           <s3 title="DOMWriter Supported Features">
   
  
  
  
  1.35      +1 -0      xml-xerces/c/doc/program.xml
  
  Index: program.xml
  ===================================================================
  RCS file: /home/cvs/xml-xerces/c/doc/program.xml,v
  retrieving revision 1.34
  retrieving revision 1.35
  diff -u -r1.34 -r1.35
  --- program.xml       6 Jan 2003 21:19:41 -0000       1.34
  +++ program.xml       24 Jan 2003 19:59:56 -0000      1.35
  @@ -35,6 +35,7 @@
         <li><jump href="program-dom.html#DOMWriter">DOMWriter</jump></li>
         <ul>
             <li><jump href="program-dom.html#ConstructDOMWriter">Constructing a DOMWriter</jump></li>
  +          <li><jump href="program-dom.html#DOMWriterEntityRef">How does DOMWriter handle built-in entity Reference in node value?</jump></li>
             <li><jump href="program-dom.html#DOMWriterFeatures">Supported Features</jump></li>
         </ul>
         <li><jump href="program-dom.html#Deprecated">Deprecated - Java-like 
DOM</jump></li>
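
  Note: the escaping rules summarized in the table added to program-dom.xml
  can be sketched in a few lines of standalone C++. This is only an
  illustration of those rules, not the DOMWriter implementation. NodeKind and
  touchUp are hypothetical names introduced here, and the sketch works on
  plain std::string rather than on the XMLCh strings Xerces-C++ uses.

```cpp
#include <string>

// Hypothetical node kinds for this sketch; these are not Xerces-C++ types.
enum class NodeKind { Attribute, Text, CData };

// Returns `value` with the minimal escaping described in the summary table:
//   Text:      '<' -> "&lt;", '&' -> "&amp;"
//   Attribute: as Text, plus '"' -> "&quot;"
//   CData:     written back unchanged
// '>' and '\'' are never escaped.
std::string touchUp(const std::string& value, NodeKind kind) {
    if (kind == NodeKind::CData) {
        return value;  // CDATA content is kept as is
    }
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        switch (c) {
            case '<':
                out += "&lt;";
                break;
            case '&':
                out += "&amp;";
                break;
            case '"':
                // Only attribute values need the double quote escaped.
                if (kind == NodeKind::Attribute) {
                    out += "&quot;";
                } else {
                    out += c;
                }
                break;
            default:
                out += c;  // includes '>' and '\''
                break;
        }
    }
    return out;
}
```

  Applied to the parsed attribute value of the first Test element in the
  example, this sketch escapes only <, & and ", which agrees with the
  serialized output shown in the diff. Both attributes in that output are
  written with double-quote delimiters, which is why only the double quote
  needs escaping there.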
  
  
  
