Re: [whatwg] Supporting MathML and SVG in text/html, and related topics
On Apr 16, 2008, at 10:47, Paul Libbrecht wrote:

> I would like to put a grain of salt here and would love HTML5 enthusiasts to answer: why is the whole HTML5 effort not a movement towards a really enhanced parser instead of trying to fully redefine HTML's successors?

text/html has immense network effects, both from the deployed base of text/html content and from the deployed base of software that deals with text/html. Failing to plug into this existing network would be an extremely bad strategy.

In fact, the reason why the proportion of Web pages that get parsed as XML is negligible is that the XML approach totally failed to plug into the existing text/html network effects (except for Appendix C, which lacks a migration strategy to actual XML and amounts to the emperor's new clothes).

> Being an enhanced parser (that would use a lot of context info to be really hand-author supportive) it would define how to better parse an XHTML 3 page, but also MathML and SVG as it does currently... It has the ability to specify very readable encodings of these pages. It could serve as a model for many other situations where XML parsing is useful but its strictness bites some.

Anne has been working on XML5, but being able to parse any well-formed stream to the same infoset as an XML 1.0 parser and being able to parse existing text/html content in a backwards-compatible way are mutually conflicting requirements. Hence, XML5 parsing won't be suitable for text/html.

> Currently HTML5 defines at the same time parsing and the model and this is what can cause us to expect that XML is getting weaker. I believe that the whole model-definition work of XML is rich, has many libraries, has empowered a lot of great developments and it is a bad idea to drop it instead of enriching it.

The dominant design of non-browser HTML5 parsing libraries is exposing the document tree using an XML parser API. The non-browser HTML5 libraries, therefore, plug into the network of XML libraries. For example, Validator.nu's internals operate on SAX events that look like SAX events for an XHTML5 document. This allows Validator.nu to use libraries written for XML, such as oNVDL and Saxon.

-- 
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
Re: [whatwg] Supporting MathML and SVG in text/html, and related topics
On Apr 16, 2008, at 12:58, Paul Libbrecht wrote:

>> In fact, the reason why the proportion of Web pages that get parsed as XML is negligible is that the XML approach totally failed to plug into the existing text/html network effects[...]
>
> My hypothesis here is that this problem is mostly a parsing problem and not a model problem. HTML5 mixes the two.

For backwards compatibility in scripted browser environments, the HTML DOM can't behave exactly like the XHTML5 DOM. For non-scripted non-browser environments, using an XML data model (XML DOM, XOM, JDOM, dom4j, SAX, ElementTree, lxml, etc.) works fine.

> There are tools that convert quite a lot of text/html pages (whose compliance is user-defined to be "it works in my browser") to an XML stream today; NekoHTML is one of them. The goal would be to formalize this parsing, and just this parsing.

Like NekoHTML and TagSoup, the Validator.nu HTML parser turns text/html input into Java XML models. The difference is that the Validator.nu HTML parser implements the HTML5 algorithm instead of something the authors of NekoHTML and TagSoup figured out on their own.

So if you are asking for a NekoHTML-like product for HTML5, it already exists and supports three popular Java XML APIs (SAX, DOM and XOM). Not XNI, though, at the moment. (It doesn't support the recent MathML addition, *yet*, though.)
http://about.validator.nu/htmlparser/

>>> Currently HTML5 defines at the same time parsing and the model and this is what can cause us to expect that XML is getting weaker. I believe that the whole model-definition work of XML is rich, has many libraries, has empowered a lot of great developments and it is a bad idea to drop it instead of enriching it.
>>
>> The dominant design of non-browser HTML5 parsing libraries is exposing the document tree using an XML parser API. The non-browser HTML5 libraries, therefore, plug into the network of XML libraries. For example, Validator.nu's internals operate on SAX events that look like SAX events for an XHTML5 document. This allows Validator.nu to use libraries written for XML, such as oNVDL and Saxon.
>
> So, except for needing yet another XHTML version to accommodate all wishes, I think it would be much saner that browsers' implementations and related specifications rely on an XML-based model of HTML (as the DOM is) instead of a coupled parsing-and-modelling specification which has different interpretations at different places.

HTML5 already specifies parsing in terms of DOM output. However, when the DOM is in the HTML mode, it has to be slightly different.

-- 
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
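
[Illustration] The design described in this message, a text/html parser that reports what it sees through an XML tree API, can be sketched with Python's standard library alone. This is only a rough sketch under stated assumptions: html.parser does NOT implement the HTML5 parsing algorithm (it performs no tree fix-up), and the class and function names below (HtmlToXmlTree, parse_html_as_xml) are hypothetical, not part of the Validator.nu parser or any other library.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take an end tag in HTML (illustrative subset).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HtmlToXmlTree(HTMLParser):
    """Forward text/html parse events to an XML tree builder (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []  # stack of elements that are still open

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "disabled") arrive with value None.
        xml_attrs = {k: (v if v is not None else "") for k, v in attrs}
        self.builder.start(tag, xml_attrs)
        if tag in VOID:
            self.builder.end(tag)  # close void elements immediately
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Close everything up to and including the matching element.
        while self.open_tags:
            top = self.open_tags.pop()
            self.builder.end(top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.open_tags:  # drop text outside the root element
            self.builder.data(data)

    def close_tree(self):
        self.close()
        while self.open_tags:  # close whatever the author left open
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


def parse_html_as_xml(text):
    parser = HtmlToXmlTree()
    parser.feed(text)
    return parser.close_tree()


SAMPLE = "<!DOCTYPE html><html><body><p>Hello<br>world<p>Second</body></html>"
root = parse_html_as_xml(SAMPLE)
# The result is an ordinary ElementTree element, so XML tooling applies.
print(ET.tostring(root, encoding="unicode"))
```

Because the result is a plain XML tree, it can be serialized, queried, or handed to other XML tooling. Note that the second p ends up nested inside the first here; an HTML5-conforming parser would make them siblings, which is exactly the kind of difference the HTML5 algorithm pins down.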
Re: [whatwg] Supporting MathML and SVG in text/html, and related topics
On Wed, 16 Apr 2008 18:36:49 +0200, William F Hammond [EMAIL PROTECTED] wrote:

> About 7 years ago there was argument in these circles about whether correct xhtml+mathml could be served as text/html. As we all know, a clear boundary was drawn, presumably because it was onerous for browsers to sniff incoming content and then decide how to parse.

Actually, it was not the browsers:
http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html

> As things have evolved, we now know that browsers do, in fact, perform a lot of triage. See, for example, Mozilla's DOCTYPE sniffing,
> http://developer.mozilla.org/en/docs/Mozilla's_DOCTYPE_sniffing

That's a very limited set of differences which mostly affect page layout.

> Especially since we are speaking about dual serialization of the same DOM and since there is relatively little use of application/xhtml+xml (and some significant user agents do not support it), might it not be worthwhile to re-examine the question of serving standards-compliant xhtml or xhtml+(mathml|svg) serialized document instances as either text/html or application/xhtml+xml?
>
> In other words, why not be able to serve both serializations as text/html? What obstacles to this exist?

The Web.

-- 
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/
Re: [whatwg] Supporting MathML and SVG in text/html, and related topics
On Apr 16, 2008, at 9:36 AM, William F Hammond wrote:

> About 7 years ago there was argument in these circles about whether correct xhtml+mathml could be served as text/html. As we all know, a clear boundary was drawn, presumably because it was onerous for browsers to sniff incoming content and then decide how to parse.
>
> As things have evolved, we now know that browsers do, in fact, perform a lot of triage. See, for example, Mozilla's DOCTYPE sniffing,
> http://developer.mozilla.org/en/docs/Mozilla's_DOCTYPE_sniffing
>
> Especially since we are speaking about dual serialization of the same DOM and since there is relatively little use of application/xhtml+xml (and some significant user agents do not support it), might it not be worthwhile to re-examine the question of serving standards-compliant xhtml or xhtml+(mathml|svg) serialized document instances as either text/html or application/xhtml+xml?
>
> In other words, why not be able to serve both serializations as text/html? What obstacles to this exist?

It's not entirely clear what your proposal is, but I assume you are suggesting that content served as text/html with an XHTML doctype declaration should be parsed as XML.

The obstacle to this is that much text/html content has an XHTML doctype declaration but depends on being parsed and otherwise processed as HTML, not XML, as current user agents do it. Such content is fairly widespread due to the legacy of Appendix C. It is preferable to let the MIME type continue to be the switch rather than making the doctype serve this role.

An additional obstacle in the case of HTML5 is that the XML serialization does not have a distinct doctype (it may use the common HTML5 doctype or no doctype at all; a document with no doctype, when parsed as text/html, would be treated as an HTML document in quirks mode).

Regards,
Maciej
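
[Illustration] The rule argued for in this message, letting the MIME type rather than the doctype select the parser, can be sketched as follows. This is a minimal sketch, not any browser's actual logic; the function name choose_parser and its return values are hypothetical, and the list of XML media types is an assumption made for illustration.

```python
def choose_parser(content_type):
    """Pick a parser from the Content-Type header value alone (hypothetical helper).

    The document body, including any doctype, is deliberately never inspected.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    # Assumed set of XML types for this sketch: the generic XML types
    # plus anything using the "+xml" structured-syntax suffix.
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    return "unknown"


# An Appendix C style page: XHTML doctype, but served as text/html.
print(choose_parser("text/html; charset=utf-8"))
print(choose_parser("application/xhtml+xml"))
```

Under this rule a page with an XHTML doctype that is served as text/html still goes to the HTML parser, so legacy content that depends on HTML processing keeps working.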
Re: [whatwg] Supporting MathML and SVG in text/html, and related topics
On Thu, 10-04-2008, at 09:51 +, Ian Hickson wrote:

> On Sat, 4 Nov 2006, Paul Topping wrote:
>
>> Elements whose namespaces aren't known should be handled like any other unknown HTML element. I believe the common way for user agents to handle an unknown element is basically to ignore the tag and its attributes and treat any text between start and end tags as if the tags weren't there. Namespaces do not present any new challenge in this area. Bogus namespaces are no more of a security risk than bogus HTML tags. It is only the ones that ARE processed by the user agent that represent potential security risks.
>
> The problem is legacy content like:
>
>    <html>
>     <foo xmlns="bogus namespace">
>      ...rest of HTML document...
>
> We don't want to make the whole document get ignored.

An example of such a tag is the Microsoft HTML application indicator, which is empty by design. But how does Paul’s recipe amount to ignoring the whole document?

> If anyone is actually reading this 3363 line e-mail, I'm impressed. Please do let me know that you read this.

I do not do bungee jumping though.
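
[Illustration] Paul Topping's recipe as quoted above (ignore an unknown tag and its attributes, but keep the text between its start and end tags) can be sketched with Python's html.parser. This is only a sketch of that description, not of what any user agent actually does; the KNOWN set, the class DropUnknownTags and the function strip_unknown are hypothetical names introduced for the example.

```python
from html import escape
from html.parser import HTMLParser

# Tiny illustrative whitelist; a real user agent knows far more elements.
KNOWN = {"html", "head", "body", "p", "em", "strong"}


class DropUnknownTags(HTMLParser):
    """Drop unknown tags but keep the text they enclose (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN:
            self.out.append("<%s>" % tag)  # attributes omitted for brevity

    def handle_endtag(self, tag):
        if tag in KNOWN:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text survives whether or not its enclosing tag was known.
        self.out.append(escape(data, quote=False))


def strip_unknown(text):
    parser = DropUnknownTags()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


legacy = '<html><foo xmlns="bogus namespace"><p>rest of document</p></html>'
print(strip_unknown(legacy))
```

In this sketch the never-closed foo start tag simply disappears and the rest of the document comes through, which is the behaviour the reply above is asking about.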
Re: [whatwg] Supporting MathML and SVG in text/html, and related topics
On Wed, 16 Apr 2008 22:01:49 +0200, William F Hammond [EMAIL PROTECTED] wrote:

> Anne van Kesteren [EMAIL PROTECTED] writes:
>
>> The Web.
>
> Really!?!

Yes, see for instance:
http://lists.w3.org/Archives/Public/public-html/2007Aug/1248.html

> It's time for user agents to stop supporting bogus document preambles.

Please keep the discussion realistic.

-- 
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/
Re: [whatwg] Supporting MathML and SVG in text/html, and related topics
On Tue, 15 Apr 2008, Chris Chiasson wrote:

> So, have the HTML 5 people already made up their minds, where the discussion that continues today has no chance of maintaining the XML serialization?

Nothing is yet set in stone for HTML5 [1]. The XHTML variant of HTML isn't going away, though. The HTML5 spec defines both a text/html serialisation and an XHTML serialisation, as well as the processing requirements and conformance requirements for both.

Does that answer your question?

([1] Well, sort-of nothing. As more and more of HTML5 gets implemented, more and more of our constraint to not break backwards compatibility marches forward to include stuff that several years ago counted as new in HTML5.)

-- 
Ian Hickson               U+1047E                )\._.,--,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Supporting MathML and SVG in text/html, and related topics
On Thursday 10th April 2008, Ian Hickson wrote:

> SVG radicals aren't typographically acceptable either. You really want to use fonts for this.

Current browsers are clearly better at rendering TrueType and PostScript fonts at small sizes than equivalent shapes expressed as SVG paths. (This may or may not or may only in part be related to hinting as I never tested this using hinted and unhinted versions of the same font, but I suspect that hinting does not account for everything.) Poor or even abysmal on-screen rendering made me abandon this approach last time I considered maths-to-SVG conversion.

This particular problem would however be less of an issue with bigger and/or more geometrical shapes, and I would consider TeX's construction of, e.g., vincula (horizontal lines) by overstriking of tiny bits from a font to be an artifact of not being able to intermix text and graphics freely rather than to result from intrinsic aesthetic superiority of the `everything-from-fonts' approach.

Now that Safari for Mac supports custom fonts using @font-face and other browsers will follow suit, using fonts for text and operators and SVG graphics for big delimiters and geometric symbols would seem to be a reasonable approach, and I would be interested to know what might make SVG radicals `typographically unacceptable'. (Obviously, fonts and SVG elements must be coordinated.)

http://coq.no/musica/it illustrates the concept for musical notation (SVG lines to draw the staves combined with a font for the clefs and accidentals), and I think SVG would also be appropriate for ties in musical notation, bonds in chemical 2D formulae, &c. to achieve high-quality, typographically sound rendering.

Am I naïvely overlooking an inherent problem with SVG?

-- 
Øistein E. Andersen
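
[Illustration] The mixed approach suggested in this message, font glyphs for the text and SVG geometry for the radical sign and its vinculum, might look roughly like the sketch below. It is only an illustration under loud assumptions: the function name radical_svg is hypothetical, the width estimate and all proportions are arbitrary guesses standing in for real font metrics, and nothing here addresses the rendering-quality question raised above.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def radical_svg(radicand, font_family="serif", font_size=24.0):
    """Return SVG markup: radicand set in a font, radical drawn as a path."""
    # Crude width estimate (assumption); real layout needs font metrics.
    text_width = 0.6 * font_size * len(radicand)
    hook = 0.6 * font_size            # horizontal room for the radical's hook
    top = 2.0                         # y position of the vinculum
    baseline = top + 1.1 * font_size  # baseline of the radicand text
    width = hook + text_width + 4.0
    height = baseline + 0.3 * font_size

    ET.register_namespace("", SVG_NS)  # serialize SVG as the default namespace
    svg = ET.Element("{%s}svg" % SVG_NS, {
        "width": "%g" % width,
        "height": "%g" % height,
        "viewBox": "0 0 %g %g" % (width, height),
    })
    # Geometric part: hook and vinculum as a single stroked path.
    ET.SubElement(svg, "{%s}path" % SVG_NS, {
        "d": "M 0 %g L %g %g L %g %g H %g" % (
            0.6 * baseline, 0.4 * hook, baseline, hook, top, width),
        "fill": "none",
        "stroke": "black",
        "stroke-width": "1.5",
    })
    # Typographic part: the radicand comes from a font.
    text = ET.SubElement(svg, "{%s}text" % SVG_NS, {
        "x": "%g" % (hook + 2.0),
        "y": "%g" % baseline,
        "font-family": font_family,
        "font-size": "%g" % font_size,
    })
    text.text = radicand
    return ET.tostring(svg, encoding="unicode")


print(radical_svg("x+1"))
```

The point of the split is the coordination noted in the message: the path has to be sized from the same metrics as the font, which is why a real implementation would replace the guessed constants with measurements of the chosen font.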