I've noticed that when I parse an XML document through JAXP with Xerces 
into a DOM document object, the resulting Document object includes all 
comments that occur in the external DTD subset as children of the 
document node. As near as I can tell, this is not correct. Crimson does 
not exhibit this problem.

Can anyone confirm or deny that this is a bug, and whether my 
understanding of the problem is correct? The basic issue is this. 
Consider this XML document:

<!DOCTYPE test SYSTEM "test.dtd">
<!-- Comment in document -->
<test>
  Hello
</test>

And here's the DTD:

<!-- comment in DTD -->
<!ELEMENT test (#PCDATA)>

If we parse the XML document into a DOM Document object, how many 
comment children should that Document object have? One or two? In 
Xerces 2.0.0 and 2.0.1 it has two, including the one from the external 
DTD subset.

Here's a simple program to demonstrate the problem:

import javax.xml.parsers.*;
import org.w3c.dom.*;


public class Test {

  public static void main(String[] args) {

    System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
     "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
   
    String input = "test.xml";
    if (args.length != 0) {
      input = args[0];
    }
   
    try {
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);  
      DocumentBuilder builder = factory.newDocumentBuilder();   
   
      Node doc = builder.parse(input);

      NodeList kids = doc.getChildNodes();
      for (int i = 0; i < kids.getLength(); i++) {
         Node node = kids.item(i);
         System.out.println(node.getNodeType() + ": "
          + node.getNodeValue());
      }

    }
    catch (Exception e) {
      // printStackTrace() already prints the exception's message first
      e.printStackTrace();
    }
 
  } // end main
 
} // end test

And finally here's the incorrect output using Xerces-J 2.0.1:

D:\xml\bug>java Test
10: null
8:  comment in DTD
8:  Comment in document
1: null

Notice that the comment from the DTD is reported as a child of the 
Document node, ahead of the comment that actually appears in the 
document. That's the problem. I have submitted this in Bugzilla, but 
I'm not 100% sure it really is a bug. Confirmation or denial would be 
appreciated.
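
For anyone who wants to try this without creating the two files, here 
is a rough self-contained variant of the same check. It is only a 
sketch: the class name CommentCount and the method documentComments 
are made up for illustration, it uses whatever JAXP parser is the 
default on your classpath rather than forcing Xerces, it assumes 
Java 8 or later for the lambda, and it feeds the DTD to the parser 
through an EntityResolver instead of reading test.dtd from disk. It 
collects only the comment children of the Document node, so the size 
of the list answers the "one or two?" question directly for the parser 
in use.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CommentCount {

  // Same document and DTD as in the message, held in memory.
  static final String XML =
      "<!DOCTYPE test SYSTEM \"test.dtd\">\n"
    + "<!-- Comment in document -->\n"
    + "<test>\n  Hello\n</test>\n";

  static final String DTD =
      "<!-- comment in DTD -->\n"
    + "<!ELEMENT test (#PCDATA)>\n";

  // Returns the text of every comment that is a direct child of the
  // Document node (hypothetical helper, not part of any library).
  static List<String> documentComments(String xml, String dtd)
      throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();

    // Assumption: serve the external DTD subset from the string above,
    // whatever system ID the parser asks for.
    builder.setEntityResolver(
        (publicId, systemId) -> new InputSource(new StringReader(dtd)));

    Document doc = builder.parse(new InputSource(new StringReader(xml)));

    List<String> comments = new ArrayList<>();
    NodeList kids = doc.getChildNodes();
    for (int i = 0; i < kids.getLength(); i++) {
      Node node = kids.item(i);
      if (node.getNodeType() == Node.COMMENT_NODE) {
        comments.add(node.getNodeValue());
      }
    }
    return comments;
  }

  public static void main(String[] args) throws Exception {
    List<String> comments = documentComments(XML, DTD);
    System.out.println(comments.size() + " comment children: " + comments);
  }
}
```

If the list contains " comment in DTD " as well as 
" Comment in document ", the parser in use is exhibiting the behavior 
described above.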

-- 
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | [EMAIL PROTECTED] | Writer/Programmer |
+-----------------------+------------------------+-------------------+ 
|           The XML Bible, 2nd Edition (IDG Books, 2001)             |
|             http://www.cafeconleche.org/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:   http://www.cafeaulait.org/     | 
|  Read Cafe con Leche for XML News:  http://www.cafeconleche.org/   |
+----------------------------------+---------------------------------+


