Re: [webkit-dev] HTML5 Parsing & MathML
On Tue, Nov 2, 2010 at 7:55 AM, David Carlisle d.p.carli...@gmail.com wrote:

> Personally I agree with you that this desire to make html elements forcibly close the surrounding math elements is entirely bogus, and it causes all sorts of problems in annotation-xml (where you really want nested html) but we failed to convince the html WG (or the html editor) of that and so ended up with a special case workaround for annotation-xml
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=9887#c16
> sometimes you have to take what you can get:-)

I will take a look.

> However I don't agree that using the token elements as extension points is only necessary because of html parser strangeness, I think it leads to a cleaner design, and better fallback behaviour for systems that do not understand the foreign elements, in any case.

Uncle! This will take some work to get working correctly with the implementation in WebKit.

Right now, in XHTML documents with MathML, we get non-token XHTML for free. Within MathML token elements, this won't necessarily be the case. For example, the 'mo' element renderer as currently implemented won't preserve child rendering objects. We'll need to detect these situations and decide what to do.

It would have been nice if MathML 3 had a foreign token element or indication via attribute typing so that we'd know that there is some kind of non-MathML child content that should be rendered according to the host language.

We'll now have to have some kind of de-facto default set of rules that say that mixed content within MathML is identified and handled slightly differently (especially if it contains things like SVG).

That is, we'll need to detect things like:

<math><mo> random text <svg> ... </svg> more random text</mo></math>

While this example is rather pathological, it is still possible and should render as a stack of line boxes wrapped in the inline-block for the 'mo'.

Also, this:

<math><mtext> <div> ...</div> </mtext></math>

should be equivalent to the XHTML chunk:

<math xmlns='http://www.w3.org/1998/Math/MathML'><div xmlns='http://www.w3.org/1999/xhtml'>...</div></math>

Both of the above examples should work today but once we implement the renderers for mtext/mi/mn etc. we'll need to take this foreign element rendering into account.

--
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the inflexions, i.e. to the degree of analysis effected by the language considered."
Bertrand Russell in a footnote of Principles of Mathematics
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
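As a rough illustration of the XHTML case described above, the sketch below parses the namespaced chunk with Python's standard XML parser and reports the namespace and local name of each node. The text "some text" stands in for the elided div content, and the helper name child_names is hypothetical; this only shows that an XML parser keeps the XHTML div as a child of the MathML math element, it says nothing about how WebKit renders it.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# The XHTML chunk from the message, with placeholder text inside the div.
xhtml_chunk = (
    f"<math xmlns='{MATHML_NS}'>"
    f"<div xmlns='{XHTML_NS}'>some text</div>"
    "</math>"
)


def child_names(xml_text):
    """Return the (namespace, local name) of the root and of its direct children."""
    root = ET.fromstring(xml_text)

    def split(tag):
        # ElementTree writes qualified names as "{namespace}local".
        namespace, _, local = tag[1:].partition("}")
        return namespace, local

    return split(root.tag), [split(child.tag) for child in root]


root_name, children = child_names(xhtml_chunk)
print(root_name)
print(children)
```

In XML the nesting is taken at face value, so the div stays inside math and keeps the XHTML namespace.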
Re: [webkit-dev] HTML5 Parsing & MathML
Alex,

> Uncle! This will take some work to get working correctly with the implementation in WebKit.

Sorry about that.

> Right now, in XHTML documents with MathML, we get non-token XHTML for free. Within MathML token elements, this won't necessarily be the case. For example, the 'mo' element renderer as currently implemented won't preserve child rendering objects. We'll need to detect these situations and decide what to do.

Hmm, the mathml3 spec particularly recommends mtext as the extension point, although I think it made sense to specify all the token elements for the parser to switch to html rendering, as it's much easier for validation or convention to restrict the document type than to extend the parser later.

> It would have been nice if MathML 3 had a foreign token element or indication via attribute typing so that we'd know that there is some kind of non-MathML child content that should be rendered according to the host language.

But elsewhere you argue that such an element isn't needed and you should just be able to drop in other namespaced elements anywhere. In fact MathML3-in-(x)html does specify such an element, namely the content of mo, mi and mtext is specified as being html.

> We'll now have to have some kind of de-facto default set of rules that say that mixed content within MathML is identified and handled slightly differently (especially if it contains things like SVG).

Differently to what? Sorry, I'm not sure I understand what you mean here. Can't you just always view the content of mtext as inline html: it basically has the same content model as the content of an html span. SVG is allowed there just because it's allowed in any inline html.

Clearly if you are looking up the content of mo in an operator dictionary, that will only succeed if the mo only contains character data. But even if the mo does contain only character data, the dictionary lookup will fail in general if you have a finite dictionary and an arbitrary string as the content of the mo, so having it fail on mixed content isn't (in the abstract) any different, although I accept that an implementation may have different concerns.

> That is, we'll need to detect things like:
>
> <math><mo> random text <svg> ... </svg> more random text</mo></math>

As above, I don't see why you need to detect such things any more than you need to detect

<span> random text <svg> ... </svg> more random text</span>

In fact your original proposal was that <math><span>.<svg> should just work, and so what is to stop mtext being treated exactly like span?

David
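The operator-dictionary point above can be sketched in a few lines. This is a minimal illustration only: OPERATOR_DICTIONARY and lookup_operator are hypothetical names, the three entries are made up for the example, and namespaces are omitted for brevity. It shows that a lookup misses both for mixed content and for an arbitrary string that is not in a finite dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny operator dictionary (a real one is far larger).
OPERATOR_DICTIONARY = {"+": "infix", "(": "prefix", ")": "postfix"}


def lookup_operator(mo_xml):
    """Return the dictionary entry for an mo element, or None when the lookup fails."""
    mo = ET.fromstring(mo_xml)
    if len(mo) > 0:
        # Element children mean mixed content: there is no single string to look up.
        return None
    text = (mo.text or "").strip()
    # Plain character data can still miss: the dictionary is finite.
    return OPERATOR_DICTIONARY.get(text)


print(lookup_operator("<mo>+</mo>"))
print(lookup_operator("<mo>random text</mo>"))
print(lookup_operator("<mo>random text <svg/> more random text</mo>"))
```

Both failure cases return the same value here, which mirrors the argument that a miss on mixed content is not different in kind from a miss on an unknown string.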
Re: [webkit-dev] HTML5 Parsing & MathML
On Wed, Nov 3, 2010 at 7:49 AM, David Carlisle d.p.carli...@gmail.com wrote:

>> It would have been nice if MathML 3 had a foreign token element or indication via attribute typing so that we'd know that there is some kind of non-MathML child content that should be rendered according to the host language.
>
> But elsewhere you argue that such an element isn't needed and you should just be able to drop in other namespaced elements anywhere. in fact MathML3-in-(x)html does specify such an element, namely content of mo mi mtext are specified as being html.

Sure. ...didn't win that one! :)

We have these token categories:

* identifier (mi)
* number (mn)
* operator (mo)
* text (mtext)
* space (mspace)
* string (ms)

What if our use of some chunk of HTML doesn't fit into the categorization above? It would have been nice to have an ability to annotate foreign markup as some kind of layout element implemented in, say, HTML, and then potentially use embedded additional MathML for inner constructs. That way, things like accessibility would know that the foreign markup isn't a terminal structure of the Mathematics and might know (e.g. via ARIA) the role of the layout.

...so, that's what I meant. Just an idea ...

>> We'll now have to have some kind of de-facto default set of rules that say that mixed content within MathML is identified and handled slightly differently (especially if it contains things like SVG).
>
> differently to what? Sorry I'm not sure I understand what you mean here, can't you just always view the content of mtext as inline html: it basically has the same content model as the content of an html span. SVG is allowed there just because it's allowed in any inline html.

Right. That's not different from what we'd expect. In section 3.2.1, it says:

"Token elements (other than mspace) should be rendered as their content, if any, (i.e. in the visual case, as a closely-spaced horizontal row of standard glyphs for the characters or images for the mglyphs in their content)."

Introduce a few SVG and HTML elements and then you have to assume that the children are rendered according to the normal rules (plus mglyph) so that this works:

<mi> xyzzy <div> </div> </mi>

Without any CSS, that 'div' will be a block whose rendering will cause a new block to be stacked within the inline. That's a consequence of my choice of using inline blocks and allowing the rendering of the 'div' to default to the current internal style within WebKit. I think that's the right choice but there might be other interpretations. For example, one could say that divs inside MathML have a display property of inline-block by default.

That choice is covered by neither MathML3 nor HTML5. I'm not sure it should be.

>> That is, we'll need to detect things like:
>>
>> <math><mo> random text <svg> ... </svg> more random text</mo></math>
>
> as above i don't see why you need to detect such things any more than you need to detect

Well, that's a consequence of building the rendering tree. Right now we don't have a special rendering object for token elements other than for 'mo'. In the case of operators, this becomes complicated due to operator stretching. It may work out to be straightforward but those feel like famous last words.

That's all I meant.

> In fact your original proposal was that <math><span>.<svg> should just work, and so what is to stop mtext being treated exactly like span?

Not much and hopefully it stays that way. At this point I'm not raising any issue except that I know that our 'mo' implementation is currently broken in this regard.

--
--Alex Milowski
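To make the "detect these situations" step concrete, here is a small sketch that walks a parsed tree and reports token elements holding element children other than mglyph. It is an assumption-laden illustration rather than anything from WebKit: the function name tokens_with_element_children is hypothetical, namespaces are left out to keep the input short, and the token list simply mirrors the categories listed in the message.

```python
import xml.etree.ElementTree as ET

# Token elements as listed in the message above.
TOKEN_ELEMENTS = {"mi", "mn", "mo", "mtext", "mspace", "ms"}


def tokens_with_element_children(xml_text):
    """List (token name, child element names) for tokens that contain markup."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        if element.tag in TOKEN_ELEMENTS:
            # mglyph is ordinary token content, so it does not count as foreign.
            child_tags = [child.tag for child in element if child.tag != "mglyph"]
            if child_tags:
                found.append((element.tag, child_tags))
    return found


sample = "<math><mi> xyzzy <div> </div> </mi><mn>1</mn></math>"
print(tokens_with_element_children(sample))
```

A renderer could use a check of this kind to decide whether a token needs the host language's layout rules instead of plain glyph layout.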
Re: [webkit-dev] HTML5 Parsing & MathML
Alex Milowski alex at milowski.org writes:

sorry for late reply, I'm not subscribed, just saw this in the archives.

> On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth abarth at webkit.org wrote:
>> Our parser follows the spec (modulo late-breaking spec changes that we

Actually most mathml in the wild will be mis-parsed by the webkit html5 parser because of
https://bugs.webkit.org/show_bug.cgi?id=48105
but that's hopefully a temporary glitch.

>> haven't picked up yet). The different namespaces can only be nested in certain ways, unlike in XML where arbitrary nesting is possible.
>
> ...
>
> <p> ... <math> <mfenced open='[' close=']'> <div> ... random stuff </div> </mfenced> </math> </p>
>
> It would then pop the open stack back to the parent p element and the div element would be a child of the paragraph and not of the fencing.

Personally I agree with you that this desire to make html elements forcibly close the surrounding math elements is entirely bogus, and it causes all sorts of problems in annotation-xml (where you really want nested html) but we failed to convince the html WG (or the html editor) of that and so ended up with a special case workaround for annotation-xml
http://www.w3.org/Bugs/Public/show_bug.cgi?id=9887#c16
sometimes you have to take what you can get:-)

However I don't agree that using the token elements as extension points is only necessary because of html parser strangeness, I think it leads to a cleaner design, and better fallback behaviour for systems that do not understand the foreign elements, in any case.

> In XHTML, assuming there are appropriate uses of namespaces, everything would work fine and you'd get a div element fenced with stretching square brackets.

It would probably render OK but wouldn't be valid according to the published schemas. As with most polyglot requirements, assuming xml and html validity goes a long way to ensuring that you get the same dom.

> So, if you cut-n-pasted the same content with the 'xmlns' attributes, you'd get two very different results. That really feels fixable but I'm going to need to think a bit more about what adjustments there would need to be to the rules. I wonder what the intersection of local names is between MathML and HTML ...

By design there is no intersection, although it turns out that browsers implemented (and html5 acknowledges) image as a synonym for img, which is therefore the one clash with a mathml name.

> This is, of course, an HTML5 issue and not really a WebKit issue except for the question of difficulty of implementation.

yep.

David
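The "pop the open stack" behaviour discussed above can be modelled with a toy function. This is a simplified sketch loosely based on the HTML parser's "in foreign content" rules, not the spec algorithm or WebKit's code: BREAKOUT_TAGS is an assumed subset of the start tags involved, the set of token elements treated as integration points is an assumption taken from this thread's description of mi, mo, mn, ms and mtext as extension points, and parent_for_html_start_tag is a hypothetical name. The spec text at the time of this thread may have differed in detail.

```python
HTML = "html"
MATHML = "mathml"

# Assumed subset of HTML start tags that force a break out of foreign content.
BREAKOUT_TAGS = {"div", "p", "span", "table", "ul", "ol", "pre"}
# Token elements treated as places where HTML content may be embedded.
TEXT_INTEGRATION_POINTS = {"mi", "mo", "mn", "ms", "mtext"}


def is_text_integration_point(node):
    namespace, name = node
    return namespace == MATHML and name in TEXT_INTEGRATION_POINTS


def parent_for_html_start_tag(open_elements, tag):
    """Return the (namespace, name) node that would become the new element's parent."""
    stack = list(open_elements)  # copy so the caller's stack is untouched
    current = stack[-1]
    in_foreign = current[0] != HTML and not is_text_integration_point(current)
    if in_foreign and tag in BREAKOUT_TAGS:
        # Pop foreign elements until an HTML element or an integration point is current.
        while len(stack) > 1 and stack[-1][0] != HTML and not is_text_integration_point(stack[-1]):
            stack.pop()
    return stack[-1]


fenced = [(HTML, "p"), (MATHML, "math"), (MATHML, "mfenced")]
wrapped = [(HTML, "p"), (MATHML, "math"), (MATHML, "mtext")]
print(parent_for_html_start_tag(fenced, "div"))
print(parent_for_html_start_tag(wrapped, "div"))
```

Under these assumptions the div inside mfenced ends up attached to the p, while the div inside mtext stays where the author put it, which is the contrast the thread is about.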
Re: [webkit-dev] HTML5 Parsing & MathML
On Tue, Nov 2, 2010 at 7:55 AM, David Carlisle d.p.carli...@gmail.com wrote:

> Alex Milowski alex at milowski.org writes:
>
> sorry for late reply, I'm not subscribed, just saw this in the archives.
>
>> On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth abarth at webkit.org wrote:
>>> Our parser follows the spec (modulo late-breaking spec changes that we
>
> Actually most mathml in the wild will be mis-parsed by the webkit html5 parser because of
> https://bugs.webkit.org/show_bug.cgi?id=48105
> but that's hopefully a temporary glitch.

Is this a bug in the HTML5 specification or a bug in our implementation of the spec? If it's the former, you might want to file a bug with the HTML working group to resolve the issue.

Adam
Re: [webkit-dev] HTML5 Parsing & MathML
It seems to be the latter. This is indeed a regression, but I don't know how to detect when it appeared. From memory, it was OK a few months ago.

François

On 2 Nov 2010, at 17:26, Adam Barth wrote:

> On Tue, Nov 2, 2010 at 7:55 AM, David Carlisle d.p.carli...@gmail.com wrote:
>> Alex Milowski alex at milowski.org writes:
>>
>> sorry for late reply, I'm not subscribed, just saw this in the archives.
>>
>>> On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth abarth at webkit.org wrote:
>>>> Our parser follows the spec (modulo late-breaking spec changes that we
>>
>> Actually most mathml in the wild will be mis-parsed by the webkit html5 parser because of
>> https://bugs.webkit.org/show_bug.cgi?id=48105
>> but that's hopefully a temporary glitch.
>
> Is this a bug in the HTML5 specification or a bug in our implementation of the spec? If it's the former, you might want to file a bug with the HTML working group to resolve the issue.
>
> Adam
Re: [webkit-dev] HTML5 Parsing & MathML
On Tue, Nov 2, 2010 at 9:52 AM, David Carlisle d.p.carli...@gmail.com wrote:

> On 2 November 2010 16:26, Adam Barth aba...@webkit.org wrote:
>> On Tue, Nov 2, 2010 at 7:55 AM, David Carlisle d.p.carli...@gmail.com wrote:
>>> Alex Milowski alex at milowski.org writes:
>>>
>>> sorry for late reply, I'm not subscribed, just saw this in the archives.
>>>
>>>> On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth abarth at webkit.org wrote:
>>>>> Our parser follows the spec (modulo late-breaking spec changes that we
>>>
>>> Actually most mathml in the wild will be mis-parsed by the webkit html5 parser because of
>>> https://bugs.webkit.org/show_bug.cgi?id=48105
>>> but that's hopefully a temporary glitch.
>>
>> Is this a bug in the HTML5 specification or a bug in our implementation of the spec? If it's the former, you might want to file a bug with the HTML working group to resolve the issue.
>>
>> Adam
>
> I'm pretty sure that it is an implementation issue (firefox 4 doesn't have this problem for example). Certainly I can't see anything that would specify parsing something as simple as
>
> <math>
>   <mrow>
>     <mrow><mn>1</mn></mrow>
>     <mi>a</mi>
>   </mrow>
> </math>
>
> as a completely different tree:
>
> <math>
>   <mrow>
>     <mrow><mn>1</mn></mrow></mrow>
>   <mi>a</mi>
> </math>
>
> It makes mathml and svg pretty unusable of course as it's common (very common in mathml case) to have elements nested within an element of the same name.

Okiedokes. I've CCed Eric and myself on the bug since we're the most likely folks to fix the issue. We'd certainly welcome a patch from you, if you're interested in fixing the issue.

Adam
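The difference between the two trees in the message above is easy to miss in flat markup, so the sketch below reduces each one to a nested (name, children) structure and compares them. Both inputs are parsed here as XML purely to inspect their shape; the second string is written out by hand to match the mis-parsed tree David describes, it is not produced by any HTML parser. The helper name shape is hypothetical.

```python
import xml.etree.ElementTree as ET


def shape(element):
    """Reduce an element to (tag, [child shapes]) so that trees can be compared."""
    return (element.tag, [shape(child) for child in element])


# Tree the author wrote: the outer mrow holds both the inner mrow and the mi.
intended = "<math><mrow><mrow><mn>1</mn></mrow><mi>a</mi></mrow></math>"
# Tree reported for the buggy parse: the outer mrow closes early, mi moves up to math.
misparsed = "<math><mrow><mrow><mn>1</mn></mrow></mrow><mi>a</mi></math>"

intended_shape = shape(ET.fromstring(intended))
misparsed_shape = shape(ET.fromstring(misparsed))

print(intended_shape)
print(misparsed_shape)
print(intended_shape == misparsed_shape)
```

A comparison like this could serve as the core of a regression check for same-name nesting, given some way to obtain the tree the HTML parser actually built.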
Re: [webkit-dev] HTML5 Parsing & MathML
On Tue, Nov 2, 2010 at 10:17 AM, Adam Barth aba...@webkit.org wrote:

> On Tue, Nov 2, 2010 at 9:52 AM, David Carlisle d.p.carli...@gmail.com wrote:
>> I'm pretty sure that it is an implementation issue (firefox 4 doesn't have this problem for example). Certainly I can't see anything that would specify parsing something as simple as
>>
>> <math>
>>   <mrow>
>>     <mrow><mn>1</mn></mrow>
>>     <mi>a</mi>
>>   </mrow>
>> </math>
>>
>> as a completely different tree:
>>
>> <math>
>>   <mrow>
>>     <mrow><mn>1</mn></mrow></mrow>
>>   <mi>a</mi>
>> </math>
>>
>> It makes mathml and svg pretty unusable of course as it's common (very common in mathml case) to have elements nested within an element of the same name.
>
> Okiedokes. I've CCed Eric and myself on the bug since we're the most likely folks to fix the issue. We'd certainly welcome a patch from you, if you're interested in fixing the issue.

This is my bug from the new handling of foreign content mode. I'll upload a patch shortly.

James
Re: [webkit-dev] HTML5 Parsing & MathML
On Sat, Oct 2, 2010 at 3:48 PM, David Carlisle d.p.carli...@gmail.com wrote:

> Alex Milowski alex at milowski.org writes:
>
>> From reading the section on 'in foreign content' [1], it would seem that it should assign the 'svg' elements to the MathML namespace when they are embedded as above.
>>
>> That means cutting and pasting the same content fragment gives two very different interpretations--which is more of a problem for the HTML5 spec than webkit.
>
> As (since?) confirmed elsewhere on another list, but mentioned here for the record, the example becomes valid (and parse-able by html5 parser) if you wrap the svg in mi elements.
>
> the presentation mathml token elements, mi, mtext, etc are specified as being the extension points where you can embed html (and thus svg).

That presents a challenge to the way the MathML implementation is currently organized. In the current implementation, token elements are not supposed to contain element content. We'll need to completely re-architect the token elements to handle this in all situations, as we won't get it by default in several cases. For example, if the SVG is embedded in an 'mo' element, the SVG will be ignored.

That also raises the question of what should be done in cases like:

<mo> random text <svg> ... </svg> </mo>

I still stand by my position that wrapping foreign elements in token elements in MathML is completely unnecessary for SVG, HTML, or other vocabularies that have rendering semantics that translate into some sequence of inline or block boxes.

--
--Alex Milowski
Re: [webkit-dev] HTML5 Parsing & MathML
Alex Milowski alex at milowski.org writes:

> From reading the section on 'in foreign content' [1], it would seem that it should assign the 'svg' elements to the MathML namespace when they are embedded as above.
>
> That means cutting and pasting the same content fragment gives two very different interpretations--which is more of a problem for the HTML5 spec than webkit.

As (since?) confirmed elsewhere on another list, but mentioned here for the record, the example becomes valid (and parse-able by html5 parser) if you wrap the svg in mi elements.

The presentation mathml token elements, mi, mtext, etc. are specified as being the extension points where you can embed html (and thus svg).

David