On Apr 16, 2008, at 12:58, Paul Libbrecht wrote:
> In fact, the reason why the proportion of Web pages that get parsed
> as XML is negligible is that the XML approach totally failed to
> plug into the existing text/html network effects[...]
> My hypothesis here is that this problem is mostly a parsing problem
> and not a model problem. HTML5 mixes the two.
For backwards compatibility in scripted browser environments, the HTML
DOM can't behave exactly like the XHTML5 DOM. For non-scripted non-
browser environments, using an XML data model (XML DOM, XOM, JDOM,
dom4j, SAX, ElementTree, lxml, etc., etc.) works fine.
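To illustrate the point that the XML data model is sufficient for non-scripted, non-browser processing: a minimal sketch using only the JDK's standard DOM API to parse an XHTML5 document and query it by namespace. (The document string and class name here are made up for the example.)

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XhtmlDomDemo {
    public static void main(String[] args) throws Exception {
        // A minimal XHTML5 document; a plain XML data model
        // handles it like any other namespaced XML.
        String xhtml =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>Demo</title></head>"
            + "<body><p>Hello</p></body></html>";

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XHTML elements live in a namespace
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));

        Element root = doc.getDocumentElement();
        System.out.println(root.getNamespaceURI());
        System.out.println(doc.getElementsByTagNameNS(
            "http://www.w3.org/1999/xhtml", "p").item(0).getTextContent());
    }
}
```

Nothing in the consuming code needs to know whether the tree was produced by an XML parser or by an HTML5 parser that builds the same namespaced tree from text/html.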
> There are tools that convert quite a lot of text/html pages (whose
> compliance is user-defined to be "it works in my browser") to an XML
> stream today; NekoHTML is one of them. The goal would be to
> formalize this parsing, and just this parsing.
Like NekoHTML and TagSoup, the Validator.nu HTML parser turns text/
html input into Java XML models. The difference is that the
Validator.nu HTML parser implements the HTML5 algorithm instead of
something the authors of NekoHTML and TagSoup figured out on their
own. So if you are asking for a NekoHTML-like product for HTML5, it
already exists and supports three popular Java XML APIs (SAX, DOM and
XOM). It doesn't support XNI at the moment, nor the recent MathML
addition, *yet*.
http://about.validator.nu/htmlparser/
> Currently HTML5 defines at the same time parsing and the model and
> this is what can cause us to expect that XML is getting weaker. I
> believe that the whole model-definition work of XML is rich, has
> many libraries, has empowered a lot of great developments and it
> is a bad idea to drop it instead of enriching it.
The dominant design of non-browser HTML5 parsing libraries is
exposing the document tree using an XML parser API. The non-browser
HTML5 libraries, therefore, plug into the network of XML libraries.
For example, Validator.nu's internals operate on SAX events that
look like SAX events for an XHTML5 document. This allows
Validator.nu to use libraries written for XML, such as oNVDL and
Saxon.
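A minimal sketch of what "operating on SAX events" means in practice: a consumer written purely against the SAX ContentHandler interface. Here the JDK's own XML parser drives it over an XHTML string; an HTML5 parser that emits XHTML5-shaped SAX events (as the Validator.nu parser does) could drive the same handler from text/html input instead. The class names and document string are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxConsumerDemo {
    // A downstream consumer written purely against SAX: it never
    // knows whether the events came from an XML parser or from an
    // HTML5 parser emitting equivalent events.
    static class ElementCounter extends DefaultHandler {
        int elements = 0;
        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<body><p>one</p><p>two</p></body></html>";
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        ElementCounter counter = new ElementCounter();
        factory.newSAXParser()
            .parse(new InputSource(new StringReader(xhtml)), counter);
        // html, body, p, p
        System.out.println(counter.elements);
    }
}
```

This is exactly how such a parser plugs into the network of XML libraries: anything that accepts SAX events (validators, XSLT engines, serializers) works unchanged.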
> So, except for needing yet another XHTML version to accommodate all
> wishes, I think it would be much saner that browsers'
> implementations and related specifications rely on an XML-based
> model of HTML (as the DOM is) instead of a coupled parsing-and-
> modelling specification which has different interpretations at
> different places.
HTML5 already specifies parsing in terms of DOM output. However, when
the DOM is in the HTML mode, it has to behave slightly differently.
--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/