I've ported the OpenXML HTML DOM to work on top of the Xerces DOM. This
implementation is currently available in the OpenXML release under the
package name org.apache.html.dom. If Xerces is available in the
classpath, the OpenXML HTML parser will now use the Xerces HTML DOM
rather than the OpenXML HTML DOM.
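
For illustration, one way to do this kind of fallback is a simple classpath probe. This is only a sketch, not the actual OpenXML code: DomSelector, pick() and both class names passed to it in main() are hypothetical and used here only to show the idea.

```java
// Sketch only: DomSelector and pick() are hypothetical names,
// not part of the OpenXML or Xerces APIs.
class DomSelector {

    // Returns 'preferred' if a class with that name can be loaded,
    // otherwise returns 'fallback'.
    static String pick(String preferred, String fallback) {
        try {
            // initialize=false: only test for presence, do not run
            // the class's static initializers.
            Class.forName(preferred, false, DomSelector.class.getClassLoader());
            return preferred;
        } catch (ClassNotFoundException | LinkageError e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Both class names below are assumptions made for this example.
        String impl = pick("org.apache.html.dom.HTMLDocumentImpl",
                           "org.openxml.dom.html.HTMLDocumentImpl");
        System.out.println("HTML DOM implementation: " + impl);
    }
}
```

The probe passes initialize=false so that checking for the class has no side effects; the chosen implementation would only be instantiated later, when a document is actually built.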

I've identified a conflict in Xerces: ElementImpl defines a public
method getValue() that conflicts with HTMLLIElement, which defines the
same method with a different return type. I could not find any use for
ElementImpl.getValue(), and commenting it out did not break the build. I
would like confirmation that this fix will not break anything before
committing it.
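
To illustrate the kind of clash: Java does not let a class inherit a method from its superclass and also implement an interface method with the same name and parameters but an incompatible return type. Below is a minimal sketch with stand-in types. LIElement, BaseElement and LIElementImpl are hypothetical, not the real DOM or Xerces classes. The int return type follows HTMLLIElement.getValue() in the DOM Level 1 HTML Java binding; the String return type in the commented-out method is only an assumption.

```java
// Stand-in for org.w3c.dom.html.HTMLLIElement (hypothetical, reduced to one method).
interface LIElement {
    int getValue();
}

// Stand-in for the Xerces ElementImpl base class (hypothetical).
class BaseElement {
    // Uncommenting this method makes LIElementImpl fail to compile:
    // its getValue() would have the same signature but an incompatible
    // return type (int vs. the assumed String).
    // public String getValue() { return null; }
}

// An HTML element implementation built on top of the base element class.
class LIElementImpl extends BaseElement implements LIElement {
    private final int value;

    LIElementImpl(int value) {
        this.value = value;
    }

    @Override
    public int getValue() {
        return value;
    }
}

class ReturnTypeConflictDemo {
    public static void main(String[] args) {
        LIElement li = new LIElementImpl(3);
        System.out.println(li.getValue());
    }
}
```

With the base-class method commented out, as in the proposed fix, the sketch compiles and runs; with it restored, the compiler rejects LIElementImpl.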

Providing HTML functionality to Xerces requires adding three packages:
the HTML DOM (org.w3c.dom.html), a Xerces HTML implementation, and the
parser (a separate code base from the XML parser). This will increase the
JAR size by an additional 100KB; it might (or might not) be smart to ship
it as a separate add-on JAR.

Currently I'm using the package name org.apache.html.dom. Is this in
line with the proposed org.apache.xml, and should the parser reside in
org.apache.html.parser?

arkin

-- 
____________________________________________________________
Assaf Arkin                               [EMAIL PROTECTED]
CTO                                  http://www.exoffice.com
Exoffice, The ExoLab Company             tel: (650) 259-9796
