hey all...
so after digging some more into this whole comments/dtd problem, i think
i've hit on something interesting. i think the correct solution for the
dtd problem is that in the 'startDTD(name,publicId,systemId)' method, we
should create a new DocumentType object and associate it with the 'doc'
that the DOMParser class is building. I think this
would let us accomplish what we want... but please someone scream if i'm
wrong.
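Here is a minimal sketch of the startDTD idea. It assumes a SAX2 parser that delivers LexicalHandler callbacks, plus the Xerces DOM bundled with the JDK. DoctypeAwareBuilder is a hypothetical stand-in for the real DOMParser and only handles elements, attributes and text. The DocumentType comes from the standard DOMImplementation.createDocumentType factory, not from Xindice's DocumentTypeImpl. The DOM spec leaves appending an ownerless doctype with doc.appendChild up to the implementation. Xerces accepts it, but treat that line as an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical stand-in for the SAX-driven builder in Xindice's DOMParser,
// reduced to what is needed to attach a DocumentType in startDTD.
class DoctypeAwareBuilder extends DefaultHandler2 {
    private final Document doc;
    private Node context;

    DoctypeAwareBuilder(Document doc) {
        this.doc = doc;
        this.context = doc;
    }

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        // DocumentType is read-only, so create it through the standard DOM factory.
        DocumentType dtd = doc.getImplementation().createDocumentType(name, publicId, systemId);
        // Assumption: the DOM implementation (Xerces, as bundled with the JDK)
        // adopts an ownerless doctype when it is appended to a document.
        doc.appendChild(dtd);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        context.appendChild(e);
        context = e;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        context = context.getParentNode();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        context.appendChild(doc.createTextNode(new String(ch, start, length)));
    }

    // Hand back an empty external DTD so the sketch runs offline.
    @Override
    public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId) {
        return new InputSource(new StringReader(""));
    }

    static Document parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        DoctypeAwareBuilder handler = new DoctypeAwareBuilder(doc);
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // startDTD is a LexicalHandler callback, so register the handler for it.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document d = parse("<!DOCTYPE greeting SYSTEM \"http://example.com/greeting.dtd\">"
                + "<greeting>hi</greeting>");
        System.out.println(d.getDoctype().getName() + " " + d.getDoctype().getSystemId());
    }
}
```

The handler has to be registered through the lexical-handler property. A plain ContentHandler never sees startDTD.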
in pursuit of creating a DocumentType object, I simply tried:
DocumentType dtd = new DocumentTypeImpl(doc,name,publicId,systemId);
thinking that the DocumentTypeImpl would be exactly as described on
http://xml.apache.org/xerces-j/apiDocs/org/apache/xerces/dom/DocumentTypeImpl.html
But after a brief compilation failure, I discovered that
DocumentTypeImpl is actually defined in
org.apache.xindice.xml.dom.DocumentTypeImpl.java. I'm not sure that we
actually want this... because it presents something of a problem. the only
constructors that are available for the DocumentTypeImpl class are:
()
(byte[],int,int)
(NodeImpl,byte[],int,int)
(NodeImpl,boolean)
So i can merrily say
DocumentType dtd = new DocumentTypeImpl();
but then I cannot set the name, publicId, or systemId. (the DocumentType
interface is essentially read-only)
Additionally, the DocumentTypeImpl seems rather incomplete... virtually
every get* method returns null.
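For comparison, the standard DOM Level 2 factory sets name, publicId and systemId when it creates the DocumentType. That fits a read-only interface: the values are fixed at construction and the getters return them. This is a short sketch against the JDK's JAXP DOM, not Xindice's classes. makeDoctype is a hypothetical helper name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.DocumentType;

class DoctypeFactoryDemo {
    // Hypothetical helper: obtain a populated DocumentType from the DOM
    // implementation's factory instead of a no-arg implementation constructor.
    static DocumentType makeDoctype(String name, String publicId, String systemId) throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .getDOMImplementation();
        // All three values are supplied here; the interface offers no setters later.
        return impl.createDocumentType(name, publicId, systemId);
    }

    public static void main(String[] args) throws Exception {
        DocumentType dt = makeDoctype("greeting", null, "greeting.dtd");
        System.out.println(dt.getName() + " " + dt.getPublicId() + " " + dt.getSystemId());
    }
}
```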
i'm going to start down the path of fixing this... but i wanted to throw it
out there to see if anyone else had ideas on it...
thanks
dave viner
-----Original Message-----
From: David Viner [mailto:[EMAIL PROTECTED]
Sent: Monday, April 29, 2002 4:58 PM
To: [EMAIL PROTECTED]
Subject: dtd expansion and org.apache.xindice.xml.dom.DOMParser
hi all,
i've been looking into the dtd expansion issue with xindice/xerces. i have
a few questions about the org.apache.xindice.xml.dom.DOMParser class.
1. why does this class use a SAXParser? not that it's a huge deal, but it
just seems kinda strange to implement a class called DOMParser which
actually uses a SAXParser object to handle the parsing.... is this a common
implementation strategy?
2. i've discovered a fix that will keep comments inside a DTD out of the
document. It involves changing 3 methods in the DOMParser class.
public void startDTD(String name, String publicId, String systemId)
        throws SAXException {
    // the parser is now inside the DOCTYPE declaration
    this.inDTD = true;
}

public void endDTD() throws SAXException {
    // the parser has left the DOCTYPE declaration
    this.inDTD = false;
}

public void comment(char[] ch, int start, int length) throws SAXException {
    // only comments outside the DTD become Comment nodes in the DOM
    if (!this.inDTD) {
        String s = new String(ch, start, length);
        context.appendChild(doc.createComment(s));
    }
}
this will prevent comments from being appended to the DOM tree when the
parser is parsing a dtd.
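To check the three-method fix outside Xindice, here is a self-contained sketch. CommentFilteringBuilder is hypothetical. It copies the inDTD flag and comment() logic above but only builds elements and comments. It also swaps a stock JAXP SAXParser and DefaultHandler2 in for Xindice's own parser setup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical, trimmed-down stand-in for Xindice's DOMParser.
class CommentFilteringBuilder extends DefaultHandler2 {
    private final Document doc;
    private Node context;
    private boolean inDTD = false;

    CommentFilteringBuilder(Document doc) {
        this.doc = doc;
        this.context = doc;
    }

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        inDTD = true;
    }

    @Override
    public void endDTD() {
        inDTD = false;
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        // Comments reported between startDTD and endDTD are dropped.
        if (!inDTD) {
            context.appendChild(doc.createComment(new String(ch, start, length)));
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Node e = doc.createElement(qName);
        context.appendChild(e);
        context = e;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        context = context.getParentNode();
    }

    static Document parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        CommentFilteringBuilder handler = new CommentFilteringBuilder(doc);
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // comment/startDTD/endDTD are LexicalHandler callbacks.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document d = parse("<!DOCTYPE r [<!-- dtd comment --><!ELEMENT r ANY>]>"
                + "<r><!-- body comment --></r>");
        System.out.println(d.getDocumentElement().getFirstChild().getNodeValue());
    }
}
```

Without the inDTD check, the DTD comment would be appended to the Document node ahead of the root element.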
However, I don't think that this actually solves the underlying problem.
Here's how I understand the goal... imagine this pseudo-code:
String xmlPre = readFromFS("/tmp/my.xml");
// {insert,get}Document from org.xmldatabases.xmlrpc.RPCOperations
String id = insertDocument("/db/foo", "bar", xmlPre);
String xmlPost = getDocument("/db/foo", "bar");
// xmlPost should be exactly the same as xmlPre
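On the getDocument side, one option is to set the DOCTYPE output keys during serialization. That assumes the DocumentType survives in the stored DOM, for example via the startDTD change. The sketch below uses the JDK's identity Transformer, which is an assumption: Xindice may serialize differently. sampleDocument is a hypothetical helper. Only the declaration's name, public ID and system ID come back. An internal subset would still be lost, so xmlPost would not be byte-for-byte identical to xmlPre.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

class DoctypeRoundTrip {
    // Hypothetical sample: a document whose DOM carries a DocumentType node.
    static Document sampleDocument() throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        DocumentType dt = impl.createDocumentType("bar", null, "bar.dtd");
        return impl.createDocument(null, "bar", dt);
    }

    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        DocumentType dt = doc.getDoctype();
        if (dt != null) {
            // Set the DOCTYPE keys explicitly instead of relying on the transformer
            // to copy the DocumentType node; doctype-public is ignored without doctype-system.
            if (dt.getPublicId() != null) {
                t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, dt.getPublicId());
            }
            if (dt.getSystemId() != null) {
                t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, dt.getSystemId());
            }
        }
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sampleDocument()));
    }
}
```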
The problem here is that anything like a DOCTYPE declaration will be parsed
and resolved by the SAX parser (i think). So the DOCTYPE declaration that was
in xmlPre will *not* be in xmlPost, because it disappears during entity
resolution when the insertDocument code calls:
Document doc = DOMParser.toDocument(content);
That 'toDocument' call actually invokes the sax parser, which resolves the
doctype (as an entity) instead of inserting it into the tree.
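A small experiment with a stock JAXP SAX parser points the same way. An entity reference declared in the DOCTYPE reaches ContentHandler.characters already expanded. The DOCTYPE itself is only visible to a registered LexicalHandler, through startDTD. EntityExpansionDemo is a hypothetical test class, not part of Xindice.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Records what a SAX consumer sees for a document with an internal entity.
class EntityExpansionDemo extends DefaultHandler2 {
    final StringBuilder text = new StringBuilder();
    String dtdName; // stays null unless registered as the LexicalHandler

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        dtdName = name;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    static EntityExpansionDemo run(String xml, boolean lexical) throws Exception {
        EntityExpansionDemo h = new EntityExpansionDemo();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        if (lexical) {
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", h);
        }
        parser.parse(new InputSource(new StringReader(xml)), h);
        return h;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE r [<!ENTITY e \"hello\">]><r>&e;</r>";
        EntityExpansionDemo plain = run(xml, false);
        EntityExpansionDemo lex = run(xml, true);
        System.out.println(plain.text + " / " + plain.dtdName + " / " + lex.dtdName);
    }
}
```

The body text arrives as "hello" either way. Only the lexical run learns the DOCTYPE name, so a builder that ignores LexicalHandler has nothing left to put back.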
does this make sense to anyone else? or am i off my rocker....
thanks
dave