Samuel Cheung wrote:
> Does anyone know whether either JTidy or NekoHTML creates an HTML DOM (defined in
> the package org.apache.html.dom) from an HTML document? In other words, for
> each <input> tag it would create an HTMLInputElementImpl object, for each <img>
> tag an HTMLImageElementImpl object, and so on.
For NekoHTML, the org.cyberneko.html.parsers.DOMParser class
uses the Xerces HTML DOM implementation (org.apache.html.dom),
so it should create the appropriate element objects based on
the element name. Come to think of it, though, I hadn't
actually verified this...
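The name-based dispatch described above (one implementation class per element name, with a generic fallback) can be sketched without any library. This is a hypothetical illustration, not NekoHTML or Xerces code: GenericElement, InputElement, ImageElement and createElement are made-up stand-ins. Xerces' HTMLDocumentImpl is believed to do something comparable internally, mapping upper-cased tag names to element classes, but check its source before relying on the details.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

public class ElementFactorySketch {
    // Hypothetical element classes standing in for HTMLInputElementImpl etc.
    static class GenericElement {
        final String tagName;
        GenericElement(String tagName) { this.tagName = tagName; }
    }
    static class InputElement extends GenericElement {
        InputElement() { super("INPUT"); }
    }
    static class ImageElement extends GenericElement {
        ImageElement() { super("IMG"); }
    }

    // Registry keyed by upper-case tag name, since HTML names are case-insensitive.
    private static final Map<String, Supplier<GenericElement>> TYPES = new HashMap<>();
    static {
        TYPES.put("INPUT", InputElement::new);
        TYPES.put("IMG", ImageElement::new);
    }

    // Return the specialised class for a known tag, else a generic element.
    static GenericElement createElement(String tagName) {
        String key = tagName.toUpperCase(Locale.ENGLISH);
        Supplier<GenericElement> ctor = TYPES.get(key);
        return ctor != null ? ctor.get() : new GenericElement(key);
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"input", "IMG", "blink"}) {
            // Prints: input -> InputElement, IMG -> ImageElement, blink -> GenericElement
            System.out.println(tag + " -> " + createElement(tag).getClass().getSimpleName());
        }
    }
}
```

Unknown names such as "blink" still get an element object, just not a specialised one, which is the behaviour the test program below is meant to make visible.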
Okay, so I wrote a simple test program that parses documents
with the NekoHTML parser and prints the class name of every
node in the tree, indented by depth, so you can see that the
actual HTML DOM objects are being created. The program is
attached.
--
Andy Clark * [EMAIL PROTECTED]
import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.*;
import org.w3c.dom.html.*; // HTML DOM interfaces, e.g. for instanceof checks

public class TestHTMLDOM {

    // Parse each file name or URL given on the command line
    // and dump the implementation class of every node.
    public static void main(String[] argv) throws Exception {
        DOMParser parser = new DOMParser();
        for (int i = 0; i < argv.length; i++) {
            parser.parse(argv[i]);
            Document document = parser.getDocument();
            printClassNames(document, "");
        }
    }

    // Print one class name per line, depth-first,
    // indenting each level of children by one extra space.
    public static void printClassNames(Node node, String indent) {
        if (node == null) {
            return;
        }
        System.out.print(indent);
        System.out.println(node.getClass().getName());
        Node child = node.getFirstChild();
        while (child != null) {
            printClassNames(child, indent + " ");
            child = child.getNextSibling();
        }
    }
}
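For contrast, the same walk can be run without NekoHTML, using only the JDK's built-in XML parser (javax.xml.parsers.DocumentBuilderFactory). This is a sketch and not part of the attached program: the class GenericDomClassNames and its methods are made up for illustration, and the input has to be well-formed XML because this is not an HTML parser. With a generic XML DOM, every element comes back as the same implementation class, which is exactly what the NekoHTML test is checking does not happen with the HTML DOM.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class GenericDomClassNames {
    // Same depth-first walk as printClassNames, but collects lines instead of printing.
    static void collectClassNames(Node node, String indent, List<String> out) {
        if (node == null) {
            return;
        }
        out.add(indent + node.getClass().getName());
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectClassNames(child, indent + " ", out);
        }
    }

    // Parse an XML string with the JDK's default DOM parser and list node classes.
    static List<String> classNamesFor(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> lines = new ArrayList<>();
            collectClassNames(doc, "", lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        // No whitespace between tags, so there are no text nodes: 1 document + 4 elements.
        for (String line : classNamesFor("<html><body><input/><img/></body></html>")) {
            System.out.println(line);
        }
    }
}
```

The exact class names printed depend on the JDK, so only the shape of the output (five lines, with html, body, input and img all sharing one element class) should be relied on.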