hey all...
so after digging some more into this whole comments/dtd problem, i think
i've hit on something interesting.  i think the correct solution for the
dtd problem is that in the 'startDTD(name,publicId,systemId)' method, we
should create a new DocumentType object and associate it with the 'doc'
that is being constructed throughout the DOMParser class.  I think this
would let us accomplish what we want... but please someone scream if i'm
wrong.

in pursuit of creating a DocumentType object, I simply tried:
        DocumentType dtd = new DocumentTypeImpl(doc,name,publicId,systemId);
thinking that the DocumentTypeImpl would be exactly as described on
http://xml.apache.org/xerces-j/apiDocs/org/apache/xerces/dom/DocumentTypeImpl.html
But after a brief compilation failure, I discovered that
DocumentTypeImpl is actually defined in
org.apache.xindice.xml.dom.DocumentTypeImpl.java.  I'm not sure that we
actually want this... because it presents something of a problem.  the only
constructors that are available for the DocumentTypeImpl class are:
        ()
        (byte[],int,int)
        (NodeImpl,byte[],int,int)
        (NodeImpl,boolean)
So i can merrily say
        DocumentType dtd = new DocumentTypeImpl();
but then I cannot set the name, publicId, or systemId.  (the DocumentType
interface is essentially read-only)
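
for comparison, here is a minimal sketch of how the standard DOM Level 2
API hands out a DocumentType that already carries name, publicId and
systemId.  it uses the JDK's built-in DOM implementation, not Xindice's, so
whether Xindice's own DOM classes support the same call is an assumption
that would need checking.  the class and method names (DoctypeSketch,
newDocumentWithDoctype) are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

// Hypothetical helper, not part of Xindice.
class DoctypeSketch {

    // Builds an empty document whose doctype carries the three values
    // that a SAX LexicalHandler receives in startDTD().
    static Document newDocumentWithDoctype(String name, String publicId, String systemId) {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
            // createDocumentType is the standard factory for DocumentType nodes
            DocumentType dtd = impl.createDocumentType(name, publicId, systemId);
            // createDocument attaches the doctype and creates the root element
            return impl.createDocument(null, name, dtd);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocumentWithDoctype("root", "-//EXAMPLE//DTD Root//EN", "root.dtd");
        System.out.println(doc.getDoctype().getName());
        System.out.println(doc.getDoctype().getPublicId());
        System.out.println(doc.getDoctype().getSystemId());
    }
}
```

note that createDocument also creates the root element up front, which is
not how a SAX-driven builder normally works, so this only shows where the
three values can live, not how DOMParser would have to be changed.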

Additionally, the DocumentTypeImpl seems rather incomplete... virtually
every get* method returns null.

i'm going to start down the path of fixing this... but i wanted to throw it
out there to see if anyone else had ideas on it...

thanks
dave viner



-----Original Message-----
From: David Viner [mailto:[EMAIL PROTECTED]
Sent: Monday, April 29, 2002 4:58 PM
To: [EMAIL PROTECTED]
Subject: dtd expansion and org.apache.xindice.xml.dom.DOMParser


hi all,
        i've been looking into the dtd expansion issue with xindice/xerces.  i
have a few questions about the org.apache.xindice.xml.dom.DOMParser class.
1. why does this class use a SAXParser?  not that it's a huge deal, but it
just seems kinda strange to implement a class called DOMParser which
actually uses a SAXParser object to handle the parsing....  is this a common
implementation strategy?

2. i've discovered a fix that will prevent comments from being printed from
a DTD.  It involves changing 3 methods in the DOMParser class.

   // assumed new boolean field on DOMParser: true between startDTD and endDTD
   private boolean inDTD = false;

   public void startDTD(String name, String publicId, String systemId)
         throws SAXException {
      this.inDTD = true;
   }

   public void endDTD() throws SAXException {
      this.inDTD = false;
   }

   public void comment(char ch[], int start, int length)
         throws SAXException {
      // only comments outside the DTD are added to the DOM tree
      if (!this.inDTD) {
         String s = new String(ch, start, length);
         context.appendChild(doc.createComment(s));
      }
   }

this will prevent comments from being appended to the DOM tree when the
parser is parsing a dtd.
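
the same flag idea can be tried outside of Xindice.  the sketch below is
only an illustration against the JDK's SAX parser: it registers a
LexicalHandler, collects comments in a list instead of appending them to a
DOM tree, and ignores anything reported between startDTD and endDTD.  the
names DtdCommentFilterSketch, Collector and bodyComments are invented for
this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical stand-alone demo, not the Xindice DOMParser.
class DtdCommentFilterSketch {

    // DefaultHandler2 implements both ContentHandler and LexicalHandler.
    static class Collector extends DefaultHandler2 {
        final List<String> kept = new ArrayList<>();
        private boolean inDTD = false;

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            inDTD = true;
        }

        @Override
        public void endDTD() {
            inDTD = false;
        }

        @Override
        public void comment(char[] ch, int start, int length) {
            // keep only comments that are outside the DTD
            if (!inDTD) {
                kept.add(new String(ch, start, length));
            }
        }
    }

    static List<String> bodyComments(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            Collector collector = new Collector();
            reader.setContentHandler(collector);
            // lexical events (comments, DTD boundaries) need this SAX2 property
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", collector);
            reader.parse(new InputSource(new StringReader(xml)));
            return collector.kept;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE root [<!-- dtd comment --><!ELEMENT root ANY>]>"
                + "<root><!-- body comment --></root>";
        System.out.println(bodyComments(xml));
    }
}
```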

However, I don't think that this actually solves the underlying problem.
Here's how I understand the goal... imagine this pseudo-code:
  String xmlPre = readFromFS("/tmp/my.xml");
  // {insert,get}Document from org.xmldatabases.xmlrpc.RPCOperations
  String id = insertDocument("/db/foo", "bar", xmlPre);
  String xmlPost = getDocument("/db/foo", "bar");

  // xmlPost should be exactly the same as xmlPre

The problem here is that anything like a DOCTYPE declaration will be parsed
and resolved by the SAX parser (i think).  So the DOCTYPE declaration that
was in xmlPre will *not* be in xmlPost, because it disappeared during the
resolution of entities when the insertDocument code called:
        Document doc = DOMParser.toDocument( content );
That 'toDocument' call actually invokes the sax parser, which will resolve
the doctype as an entity (and not insert it into the document).
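
to make the round-trip concern concrete, here is a rough sketch using only
the JDK's DOM parser and serializer (no Xindice code involved).  it parses a
document with a DOCTYPE and then copies the public and system ids from
doc.getDoctype() onto the serializer so they show up again in the output.
the class name DoctypeRoundTripSketch and the sample ids are made up, and
the sketch does not try to preserve an internal subset.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Hypothetical stand-alone demo, not Xindice's insertDocument/getDocument.
class DoctypeRoundTripSketch {

    static String roundTrip(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // do not fetch the external DTD; hand the parser an empty one instead
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            DocumentType dtd = doc.getDoctype();
            if (dtd != null) {
                // ask the serializer to write the DOCTYPE line explicitly
                if (dtd.getPublicId() != null) {
                    serializer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, dtd.getPublicId());
                }
                if (dtd.getSystemId() != null) {
                    serializer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, dtd.getSystemId());
                }
            }
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xmlPre = "<!DOCTYPE root PUBLIC \"-//EXAMPLE//DTD Root//EN\" \"my.dtd\">"
                + "<root>hi</root>";
        System.out.println(roundTrip(xmlPre));
    }
}
```

the point is only that the doctype information has to be carried on the
Document (or somewhere next to it) for any serializer to put it back; what
the serializer does when the two properties are left unset is not something
this sketch relies on.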

does this make sense to anyone else? or am i off my rocker....

thanks
dave

