Re: [webkit-dev] HTML5 Parsing & MathML
On Wed, Nov 3, 2010 at 7:49 AM, David Carlisle wrote:
>
>> It would have been nice if MathML 3 had a "foreign token" element or
>> indication via attribute typing so that we'd know that there is some
>> kind of non-MathML content children that should be rendered according
>> to the host language.
>
> But elsewhere you argue that such an element isn't needed and you should just be
> able to drop in other namespaced elements anywhere. In fact MathML3-in-(x)html
> does specify such an element, namely the content of mo, mi and mtext is specified as
> being html.
>

Sure. ...didn't win that one! :)

We have these token categories:

 * identifier (mi)
 * number (mn)
 * operator (mo)
 * text (mtext)
 * space (mspace)
 * string (ms)

What if our use of some chunk of HTML doesn't fit into any of the above categories? It would have been nice to have the ability to annotate "foreign markup" as some kind of layout element implemented in, say, HTML, and then potentially use additional embedded MathML for inner constructs. That way, things like accessibility tools would know that the foreign markup isn't a terminal structure of the mathematics and might know (e.g. via ARIA) the role of the layout.

...so, that's what I meant. Just an idea ...

>> We'll now have to have some kind of de-facto
>> default set of rules that say that mixed content within a MathML is
>> identified and handled slightly differently (especially if it contains
>> things like SVG).
>
> Differently to what? Sorry, I'm not sure I understand what you mean here. Can't
> you just always view the content of mtext as inline html? It basically has the
> same content model as the content of an html span. SVG is allowed there just
> because it's allowed in any inline html.

Right. That's not different from what we'd expect. In section 3.2.1, it says:

"Token elements (other than mspace) should be rendered as their content, if any, (i.e. in the visual case, as a closely-spaced horizontal row of standard glyphs for the characters or images for the mglyphs in their content)."

Introduce a few SVG and HTML elements and then you have to make assumptions about the children that are being rendered according to the normal rules (plus mglyph) so that this works:

xyzzy

Without any CSS, that 'div' will be a block whose rendering will cause a new block to be stacked within the inline. That's a consequence of my choice of using inline blocks and allowing the rendering of the 'div' to default to the current internal style within WebKit. I think that's the right choice, but there might be other interpretations. For example, one could say that divs inside MathML have a display property of inline-block by default. That choice isn't covered by either MathML3 or HTML5. I'm not sure it should be.

>> That is, we'll need to detect things like:
>
>> random text ... more random text
>
> As above, I don't see why you need to "detect" such things any more than you need
> to "detect"

Well, that's a consequence of building the rendering tree. Right now we don't have a special rendering object for token elements other than for 'mo'. In the case of operators, this becomes complicated due to operator stretching. It may work out to be straightforward, but those feel like famous last words. That's all I meant.

>
> In fact your original proposal was that . should just
> work, and so what is to stop mtext being treated exactly like span?
>

Not much, and hopefully it stays that way. At this point I'm not raising any issue except that I know that our 'mo' implementation is currently broken in this regard.

--
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the inflexions, i.e. to the degree of analysis effected by the language considered."

Bertrand Russell in a footnote of Principles of Mathematics
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
Re: [webkit-dev] HTML5 Parsing & MathML
Alex,

> Uncle! This will take some work to get working correctly with the
> implementation in WebKit.

Sorry about that.

> Right now, in XHTML documents with MathML,
> we get non-token XHTML for free. Within MathML token elements, this
> won't necessarily be the case. For example, the 'mo' element renderer
> as currently implemented won't preserve child rendering objects.
> We'll need to detect these situations and decide what to do.

Hmm, the MathML3 spec particularly recommends mtext as the extension point, although I think it made sense to specify all the token elements as points where the parser switches to html, as it's much easier for validation or convention to restrict the document type than to extend the parser later.

> It would have been nice if MathML 3 had a "foreign token" element or
> indication via attribute typing so that we'd know that there is some
> kind of non-MathML content children that should be rendered according
> to the host language.

But elsewhere you argue that such an element isn't needed and you should just be able to drop in other namespaced elements anywhere. In fact MathML3-in-(x)html does specify such an element, namely the content of mo, mi and mtext is specified as being html.

> We'll now have to have some kind of de-facto
> default set of rules that say that mixed content within a MathML is
> identified and handled slightly differently (especially if it contains
> things like SVG).

Differently to what? Sorry, I'm not sure I understand what you mean here. Can't you just always view the content of mtext as inline html? It basically has the same content model as the content of an html span. SVG is allowed there just because it's allowed in any inline html.

Clearly, if you are looking up the content of mo in an operator dictionary, that will only succeed if the mo contains only character data. But even if the mo does contain only character data, the dictionary lookup will fail in general, since you have a finite dictionary and an arbitrary string as the content of the mo. So having the lookup fail on mixed content isn't (in the abstract) any different, although I accept that an implementation may have different concerns.

> That is, we'll need to detect things like:
> random text ... more random text

As above, I don't see why you need to "detect" such things any more than you need to "detect"

random text ... more random text

In fact your original proposal was that . should just work, and so what is to stop mtext being treated exactly like span?

David
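A minimal Python sketch of David's operator-dictionary argument. The `Element` and `OperatorEntry` types, the `lookup_operator` and `operator_properties` helpers, and the dictionary values are hypothetical illustrations, not WebKit's renderer code or the actual MathML 3 operator dictionary. The point it shows is that an `mo` holding mixed content and an `mo` holding an unknown string end up in the same "not found, use the defaults" branch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class Element:
    """Stand-in for an element child of <mo> (e.g. embedded HTML or SVG)."""
    name: str


# Character data is modelled as plain strings.
Child = Union[str, Element]


@dataclass(frozen=True)
class OperatorEntry:
    stretchy: bool
    lspace_em: float
    rspace_em: float


# A tiny, illustrative slice of an operator dictionary (values are examples).
OPERATOR_DICTIONARY = {
    "+": OperatorEntry(stretchy=False, lspace_em=0.222, rspace_em=0.222),
    "(": OperatorEntry(stretchy=True, lspace_em=0.0, rspace_em=0.0),
    ")": OperatorEntry(stretchy=True, lspace_em=0.0, rspace_em=0.0),
}

# Fallback used whenever the lookup fails, whatever the reason.
DEFAULT_ENTRY = OperatorEntry(stretchy=False, lspace_em=0.278, rspace_em=0.278)


def lookup_operator(children: Sequence[Child]) -> Optional[OperatorEntry]:
    """Return the dictionary entry for an <mo>'s content, or None."""
    if any(isinstance(child, Element) for child in children):
        return None  # mixed content: there is no single string to look up
    text = "".join(children).strip()
    return OPERATOR_DICTIONARY.get(text)  # most arbitrary strings miss too


def operator_properties(children: Sequence[Child]) -> OperatorEntry:
    """Both kinds of miss take the same fallback path."""
    entry = lookup_operator(children)
    return entry if entry is not None else DEFAULT_ENTRY


print(operator_properties(["a", Element("svg"), "b"]) == operator_properties(["foo"]))
# True
```

In this model the renderer never needs a separate "mixed content" code path for the lookup itself; it only needs a sensible default, which it must have anyway for strings missing from a finite dictionary.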
Re: [webkit-dev] HTML5 Parsing & MathML
On Tue, Nov 2, 2010 at 7:55 AM, David Carlisle wrote:
>
> Personally I agree with you that this desire to make html elements forcibly
> close the surrounding math elements is entirely bogus, and it causes all sorts
> of problems in annotation-xml (where you really want nested html) but we failed
> to convince the html WG (or the html editor) of that and so ended up with a
> special case workaround for annotation-xml
>
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=9887#c16
>
> sometimes you have to take what you can get:-)

I will take a look.

>
> However I don't agree that using the token elements as extension points is only
> necessary because of html parser strangeness, I think it leads to a cleaner
> design, and better fallback behaviour for systems that do not understand the
> foreign elements, in any case.
>

Uncle! This will take some work to get working correctly with the implementation in WebKit.

Right now, in XHTML documents with MathML, we get non-token XHTML for free. Within MathML token elements, this won't necessarily be the case. For example, the 'mo' element renderer as currently implemented won't preserve child rendering objects. We'll need to detect these situations and decide what to do.

It would have been nice if MathML 3 had a "foreign token" element or indication via attribute typing so that we'd know that there is some kind of non-MathML content children that should be rendered according to the host language. We'll now have to have some kind of de-facto default set of rules that say that mixed content within a MathML is identified and handled slightly differently (especially if it contains things like SVG).

That is, we'll need to detect things like:

random text ... more random text

While this example is rather pathological, it is still possible and should render as a stack of line boxes wrapped in the inline-block for the 'mo'.

Also, this:

...

should be equivalent to the XHTML chunk:

...
Both of the above examples should work today, but once we implement the renderers for mtext/mi/mn etc., we'll need to take this "foreign element" rendering into account.

--
--Alex Milowski
Re: [webkit-dev] HTML5 Parsing & MathML
On Tue, Nov 2, 2010 at 10:17 AM, Adam Barth wrote:
> On Tue, Nov 2, 2010 at 9:52 AM, David Carlisle wrote:
> > On 2 November 2010 16:26, Adam Barth wrote:
> >> On Tue, Nov 2, 2010 at 7:55 AM, David Carlisle wrote:
> >>> Alex Milowski milowski.org> writes:
> >>>
> >>> sorry for late reply, I'm not subscribed, just saw this in the archives.
> >>>
> >>>> On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth webkit.org> wrote:
> >>>>> Our parser follows the spec (modulo late-breaking spec changes that we
> >>>
> >>> Actually most mathml in the wild will be mis-parsed by the webkit html5 parser
> >>> because of
> >>>
> >>> https://bugs.webkit.org/show_bug.cgi?id=48105
> >>>
> >>> but that's hopefully a temporary glitch.
> >>
> >> Is this a bug in the HTML5 specification or a bug in our
> >> implementation of the spec? If it's the former, you might want to file
> >> a bug with the HTML working group to resolve the issue.
> >>
> >> Adam
> >
> > I'm pretty sure that it is an implementation issue (firefox 4 doesn't
> > have this problem for example). Certainly I can't see anything that
> > would specify parsing something as simple as
> >
> > 1
> > a
> >
> > as a completely different tree:
> >
> > 1 a
> >
> > It makes mathml and svg pretty unusable of course as it's common (very
> > common in mathml case) to have elements nested within an element of
> > the same name.
>
> Okiedokes. I've CCed Eric and myself on the bug since we're the
> most likely folks to fix the issue. We'd certainly welcome a patch
> from you, if you're interested in fixing the issue.

This is my bug from the new handling of foreign content mode. I'll upload a patch shortly.

James
Re: [webkit-dev] HTML5 Parsing & MathML
On Tue, Nov 2, 2010 at 9:52 AM, David Carlisle wrote:
> On 2 November 2010 16:26, Adam Barth wrote:
>> On Tue, Nov 2, 2010 at 7:55 AM, David Carlisle
>> wrote:
>>> Alex Milowski milowski.org> writes:
>>>
>>> sorry for late reply, I'm not subscribed, just saw this in the archives.
>>>
>>>> On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth webkit.org> wrote:
>>>>> Our parser follows the spec (modulo late-breaking spec changes that we
>>>
>>> Actually most mathml in the wild will be mis-parsed by the webkit html5
>>> parser because of
>>>
>>> https://bugs.webkit.org/show_bug.cgi?id=48105
>>>
>>> but that's hopefully a temporary glitch.
>>
>> Is this a bug in the HTML5 specification or a bug in our
>> implementation of the spec? If it's the former, you might want to file
>> a bug with the HTML working group to resolve the issue.
>>
>> Adam
>
> I'm pretty sure that it is an implementation issue (firefox 4 doesn't
> have this problem for example). Certainly I can't see anything that
> would specify parsing something as simple as
>
> 1
> a
>
> as a completely different tree:
>
> 1 a
>
> It makes mathml and svg pretty unusable of course as it's common (very
> common in mathml case) to have elements nested within an element of
> the same name.

Okiedokes. I've CCed Eric and myself on the bug since we're the most likely folks to fix the issue. We'd certainly welcome a patch from you, if you're interested in fixing the issue.

Adam
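The tree David expects can be illustrated with a toy tree builder on top of Python's standard `html.parser`. The key step mirrors how the HTML5 "in foreign content" rules treat an end tag: walk from the current node toward the root and pop up to and including the nearest open element with the same name, so an `mrow` nested inside an `mrow` stays nested. The input is an assumed snippet of that shape, not necessarily David's exact markup, and `NestingTreeBuilder` is a hypothetical helper that ignores everything else the real algorithm does (namespaces, breakout tags, integration points).

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

    def outline(self):
        """Compact rendering of the subtree, e.g. mrow(mn(1),mi(a))."""
        parts = [c.outline() if isinstance(c, Node) else c for c in self.children]
        return f"{self.name}({','.join(parts)})" if parts else self.name


class NestingTreeBuilder(HTMLParser):
    """Toy tree builder: an end tag closes the nearest open element with the
    same name, so same-named elements nest instead of being merged."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Walk from the current node toward the root (skipping the document
        # node) and pop everything up to and including the first match.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth].name == tag:
                del self.stack[depth:]
                return
        # No matching open element: ignore the stray end tag.

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(data.strip())


def parse(markup):
    builder = NestingTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root.children[0].outline()


print(parse("<math><mrow><mrow><mn>1</mn><mi>a</mi></mrow></mrow></math>"))
# math(mrow(mrow(mn(1),mi(a))))
```

A builder that instead let the inner start tag or end tag match the outer `mrow` would flatten the two rows, which is the kind of "completely different tree" described in the bug report.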
Re: [webkit-dev] HTML5 Parsing & MathML
It seems to be the latter. This is indeed a regression, but I don't know how to pin down when it appeared. From memory, it was OK a few months ago.

François

On 2 Nov 2010, at 17:26, Adam Barth wrote:

> On Tue, Nov 2, 2010 at 7:55 AM, David Carlisle wrote:
>> Alex Milowski milowski.org> writes:
>>
>> sorry for late reply, I'm not subscribed, just saw this in the archives.
>>
>>>
>>> On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth webkit.org> wrote:
>>>> Our parser follows the spec (modulo late-breaking spec changes that we
>>
>> Actually most mathml in the wild will be mis-parsed by the webkit html5
>> parser because of
>>
>> https://bugs.webkit.org/show_bug.cgi?id=48105
>>
>> but that's hopefully a temporary glitch.
>
> Is this a bug in the HTML5 specification or a bug in our
> implementation of the spec? If it's the former, you might want to file
> a bug with the HTML working group to resolve the issue.
>
> Adam
Re: [webkit-dev] HTML5 Parsing & MathML
On Tue, Nov 2, 2010 at 7:55 AM, David Carlisle wrote:
> Alex Milowski milowski.org> writes:
>
> sorry for late reply, I'm not subscribed, just saw this in the archives.
>
>>
>> On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth webkit.org> wrote:
>> > Our parser follows the spec (modulo late-breaking spec changes that we
>
> Actually most mathml in the wild will be mis-parsed by the webkit html5 parser
> because of
>
> https://bugs.webkit.org/show_bug.cgi?id=48105
>
> but that's hopefully a temporary glitch.

Is this a bug in the HTML5 specification or a bug in our implementation of the spec? If it's the former, you might want to file a bug with the HTML working group to resolve the issue.

Adam

>> > haven't picked up yet). The different namespaces can only be nested
>> > in certain ways, unlike in XML where arbitrary nesting is possible.
>>
>> ...
>>
>> ...
>>
>> only
> necessary because of html parser strangeness, I think it leads to a cleaner
> design, and better fallback behaviour for systems that do not understand the
> foreign elements, in any case.
>
>>
>> In XHTML, assuming there are appropriate uses of
>> namespaces, everything would work fine and you'd get a "div"
>> element fenced with stretching square brackets.
>
> It would probably render OK but wouldn't be valid according to the published
> schemas. As with most "polyglot" requirements, assuming xml and html validity
> goes a long way to ensuring that you get the same dom.
>>
>> So, if you cut-n-pasted the same content with the 'xmlns'
>> attributes, you'd get two very different results.
>>
>> That really feels "fixable" but I'm going to need to think a bit
>> more about what adjustments there would need to be
>> to the rules.
>>
>> I wonder what the intersection of local names is between
>> MathML and HTML ...
>
> By design there is no intersection, although it turns out that browsers
> implemented (and html5 acknowledges) image as a synonym for img, which is
> therefore the one clash with a mathml name.
>
>>
>> This is, of course, an HTML5 issue and not really a WebKit
>> issue except for the question of difficulty of implementation.
>>
>
> yep.
>
> David
Re: [webkit-dev] HTML5 Parsing & MathML
Alex Milowski milowski.org> writes:

sorry for late reply, I'm not subscribed, just saw this in the archives.

>
> On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth webkit.org> wrote:
> > Our parser follows the spec (modulo late-breaking spec changes that we

Actually most mathml in the wild will be mis-parsed by the webkit html5 parser because of

https://bugs.webkit.org/show_bug.cgi?id=48105

but that's hopefully a temporary glitch.

> > haven't picked up yet). The different namespaces can only be nested
> > in certain ways, unlike in XML where arbitrary nesting is possible.
>
> ...
>
> ...
>
>
> In XHTML, assuming there are appropriate uses of
> namespaces, everything would work fine and you'd get a "div"
> element fenced with stretching square brackets.

It would probably render OK but wouldn't be valid according to the published schemas. As with most "polyglot" requirements, assuming xml and html validity goes a long way to ensuring that you get the same dom.

>
> So, if you cut-n-pasted the same content with the 'xmlns'
> attributes, you'd get two very different results.
>
> That really feels "fixable" but I'm going to need to think a bit
> more about what adjustments there would need to be
> to the rules.
>
> I wonder what the intersection of local names is between
> MathML and HTML ...

By design there is no intersection, although it turns out that browsers implemented (and html5 acknowledges) image as a synonym for img, which is therefore the one clash with a mathml name.

>
> This is, of course, an HTML5 issue and not really a WebKit
> issue except for the question of difficulty of implementation.
>

yep.

David
Re: [webkit-dev] HTML5 Parsing & MathML
On Mon, Oct 4, 2010 at 10:03 AM, Alex Milowski wrote:
> On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth wrote:
>> Our parser follows the spec (modulo late-breaking spec changes that we
>> haven't picked up yet). The different namespaces can only be nested
>> in certain ways, unlike in XML where arbitrary nesting is possible.
>
> Actually, I don't think a MathML annotation-xml with an SVG child element
> is going to work properly in WebKit the way the current code is
> set up. I'll test that.
>
> This is a place where I think the parsing rules for HTML5 need to be
> adjusted so we get the same results for HTML or SVG embedded
> in MathML regardless of HTML or XHTML syntax. Digging deeper
> into what the HTML5 specification says for "foreign content", the
> HTML "div" element would generate a parse error:
>
> ...
>
> element fenced with stretching square brackets.
>
> So, if you cut-n-pasted the same content with the 'xmlns'
> attributes, you'd get two very different results.
>
> That really feels "fixable" but I'm going to need to think a bit
> more about what adjustments there would need to be
> to the rules.
>
> I wonder what the intersection of local names is between
> MathML and HTML ...
>
> This is, of course, an HTML5 issue and not really a WebKit
> issue except for the question of difficulty of implementation.

If you have feedback for the HTML5 working group, it's probably a good idea to convey it sooner rather than later. The HTML5 specification is close to Last Call, after which it will be more difficult to make changes.

Adam
Re: [webkit-dev] HTML5 Parsing & MathML
On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth wrote:
> Our parser follows the spec (modulo late-breaking spec changes that we
> haven't picked up yet). The different namespaces can only be nested
> in certain ways, unlike in XML where arbitrary nesting is possible.

Actually, I don't think a MathML annotation-xml with an SVG child element is going to work properly in WebKit the way the current code is set up. I'll test that.

This is a place where I think the parsing rules for HTML5 need to be adjusted so we get the same results for HTML or SVG embedded in MathML regardless of HTML or XHTML syntax. Digging deeper into what the HTML5 specification says for "foreign content", the HTML "div" element would generate a parse error:

...
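Why a `div` placed directly inside `math` fails can be sketched as a set-membership test. The set below is the list of start tags that, under the "in foreign content" rules of the current WHATWG HTML Standard, are a parse error and pop the parser back out of MathML or SVG; the 2010 draft discussed in this thread may have differed in detail, and `breaks_out_of_foreign_content` is a hypothetical helper name. The rule only applies when the parent is an ordinary MathML or SVG element: inside `mi`, `mo`, `mn`, `ms` or `mtext` (or another integration point), start tags are processed with the normal HTML rules instead.

```python
from typing import Iterable, Optional, Tuple

# Start tags that break out of foreign content (per the current WHATWG
# HTML Standard; treat this list as an assumption for the 2010 draft).
BREAKOUT_TAGS = frozenset({
    "b", "big", "blockquote", "body", "br", "center", "code", "dd", "div",
    "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5", "h6", "head",
    "hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol", "p",
    "pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup",
    "table", "tt", "u", "ul", "var",
})

# <font> only breaks out when it carries one of these attributes.
FONT_BREAKOUT_ATTRIBUTES = frozenset({"color", "face", "size"})


def breaks_out_of_foreign_content(
    tag: str, attrs: Iterable[Tuple[str, Optional[str]]] = ()
) -> bool:
    """True if this start tag, seen under a plain MathML/SVG parent,
    is a parse error that closes the foreign elements."""
    name = tag.lower()
    if name in BREAKOUT_TAGS:
        return True
    return name == "font" and any(
        attr.lower() in FONT_BREAKOUT_ATTRIBUTES for attr, _ in attrs
    )


print(breaks_out_of_foreign_content("div"))    # True
print(breaks_out_of_foreign_content("image"))  # False
```

Note that `img` breaks out while `image` does not, so a MathML `image` element survives inside `math` even though HTML treats `<image>` as a synonym for `<img>`.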
Re: [webkit-dev] HTML5 Parsing & MathML
On Sat, Oct 2, 2010 at 3:48 PM, David Carlisle wrote:
> Alex Milowski milowski.org> writes:
>
>> From reading the section on "in foreign content" [1], it would seem that it
>> should assign the 'svg' elements to the MathML namespace when they
>> are embedded as above. That means cutting and pasting the same
>> content fragment gives two very different interpretations--which is more
>> of a problem for the HTML5 spec than webkit.
>>
>
> As (since?) confirmed elsewhere on another list, but mentioned here for the
> record, the example becomes valid (and parse-able by an html5 parser) if you wrap
> the svg in mi elements.
> The presentation mathml token elements (mi, mtext, etc.) are specified as being
> the extension points where you can embed html (and thus svg).

That presents a challenge to the way the MathML implementation is currently organized. In the current implementation, token elements are not supposed to contain element content. We'll need to completely re-architect the token elements to handle this in all situations, as we won't get it by default in several cases. For example, if the SVG is embedded in an 'mo' element, the SVG will be ignored.

That also raises the question of what should be done in cases like:

random text

I still stand by my position that wrapping "foreign" elements in token elements in MathML is completely unnecessary for SVG, HTML, or other vocabularies that have rendering semantics that translate into some sequence of inline or block boxes.

--
--Alex Milowski
Re: [webkit-dev] HTML5 Parsing & MathML
Alex Milowski milowski.org> writes:

> From reading the section on "in foreign content" [1], it would seem that it
> should assign the 'svg' elements to the MathML namespace when they
> are embedded as above. That means cutting and pasting the same
> content fragment gives two very different interpretations--which is more
> of a problem for the HTML5 spec than webkit.
>

As (since?) confirmed elsewhere on another list, but mentioned here for the record, the example becomes valid (and parse-able by an html5 parser) if you wrap the svg in mi elements. The presentation mathml token elements (mi, mtext, etc.) are specified as being the extension points where you can embed html (and thus svg).

David
Re: [webkit-dev] HTML5 Parsing & MathML
Our parser follows the spec (modulo late-breaking spec changes that we haven't picked up yet). The different namespaces can only be nested in certain ways, unlike in XML where arbitrary nesting is possible.

Adam

On Fri, Oct 1, 2010 at 12:46 PM, Alex Milowski wrote:
> I'm curious as to what the current HTML5 parser does when MathML
> and SVG are mixed. In a recent review of MathML 3 I made the comment
> that this kind of markup, in XHTML, works just fine:
>
> happy smilely..
> happy smilely..
> +
> unhappy smilely..
>
> Yet, it does not seem to be blessed by MathML3 as valid or allowed. I'm
> still debating that restriction with others. Note that the svg elements would
> need to have the SVG namespace declared and used somehow.
>
> In HTML5, the namespace declarations and use for MathML or
> SVG wouldn't be required (or really allowed).
>
> happy smilely..
> happy smilely..
> +
> unhappy smilely..
>
> So, my question is: What does webkit do?
>
> From reading the section on "in foreign content" [1], it would seem that it
> should assign the 'svg' elements to the MathML namespace when they
> are embedded as above. That means cutting and pasting the same
> content fragment gives two very different interpretations--which is more
> of a problem for the HTML5 spec than webkit.
>
> I've looked at HTMLTreeBuilder.cpp and it looks like it will assign
> all child elements to the MathML namespace. That's not 100%
> correct as the 'annotation-xml' has a special case for SVG
> annotations.
>
> From a browser perspective, when you are using the XHTML syntax,
> you can compose elements from different namespaces--embedding
> them as above--and it does work. In fact, the above kind of embedding
> of SVG as direct actors in rendering a fraction works right now in
> WebKit. That is, of course, an unsanctioned behavior.
>
> [1] http://dev.w3.org/html5/spec/Overview.html#parsing-main-inforeign
>
> --
> --Alex Milowski
[webkit-dev] HTML5 Parsing & MathML
I'm curious as to what the current HTML5 parser does when MathML and SVG are mixed. In a recent review of MathML 3 I made the comment that this kind of markup, in XHTML, works just fine:

happy smilely..
happy smilely..
+
unhappy smilely..

Yet, it does not seem to be blessed by MathML3 as valid or allowed. I'm still debating that restriction with others. Note that the svg elements would need to have the SVG namespace declared and used somehow.

In HTML5, the namespace declarations and use for MathML or SVG wouldn't be required (or really allowed).

happy smilely..
happy smilely..
+
unhappy smilely..

So, my question is: What does webkit do?

From reading the section on "in foreign content" [1], it would seem that it should assign the 'svg' elements to the MathML namespace when they are embedded as above. That means cutting and pasting the same content fragment gives two very different interpretations--which is more of a problem for the HTML5 spec than webkit.

I've looked at HTMLTreeBuilder.cpp and it looks like it will assign all child elements to the MathML namespace. That's not 100% correct, as 'annotation-xml' has a special case for SVG annotations.

From a browser perspective, when you are using the XHTML syntax, you can compose elements from different namespaces--embedding them as above--and it does work. In fact, the above kind of embedding of SVG as direct actors in rendering a fraction works right now in WebKit. That is, of course, an unsanctioned behavior.

[1] http://dev.w3.org/html5/spec/Overview.html#parsing-main-inforeign

--
--Alex Milowski
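The namespace assignment asked about here can be sketched as follows. This is a simplified model, assuming the tree-construction dispatcher of the current WHATWG HTML Standard (which may differ in detail from the 2010 draft linked above): given the parent element (the "adjusted current node") and a start tag, it returns the namespace and local name the new element would get. `place_start_tag` and the constant names are hypothetical; breakout tags such as `div` directly under a plain MathML or SVG parent, SVG tag-name case adjustment, and attributes other than `encoding` are out of scope.

```python
from typing import Optional, Tuple

HTML, MATHML, SVG = "html", "mathml", "svg"

# MathML token elements are "text integration points": start tags inside
# them (except mglyph and malignmark) are handled by the ordinary HTML rules.
MATHML_TEXT_INTEGRATION_POINTS = frozenset({"mi", "mo", "mn", "ms", "mtext"})
# SVG elements that are HTML integration points.
SVG_HTML_INTEGRATION_POINTS = frozenset({"foreignObject", "desc", "title"})
# annotation-xml encodings that make it an HTML integration point.
HTML_ANNOTATION_ENCODINGS = frozenset({"text/html", "application/xhtml+xml"})


def _html_rules(tag: str) -> Tuple[str, str]:
    """Start-tag handling in HTML content ("in body"), reduced to namespaces."""
    if tag == "math":
        return MATHML, "math"
    if tag == "svg":
        return SVG, "svg"
    if tag == "image":
        return HTML, "img"  # parse error; <image> is treated as <img>
    return HTML, tag


def place_start_tag(parent_ns: str, parent_name: str, tag: str,
                    encoding: Optional[str] = None) -> Tuple[str, str]:
    """Return (namespace, local name) for a start tag under the given parent."""
    if parent_ns == HTML:
        return _html_rules(tag)
    if parent_ns == MATHML:
        if (parent_name in MATHML_TEXT_INTEGRATION_POINTS
                and tag not in {"mglyph", "malignmark"}):
            return _html_rules(tag)
        if parent_name == "annotation-xml":
            if tag == "svg":
                return SVG, "svg"  # the special case for SVG annotations
            if (encoding or "").lower() in HTML_ANNOTATION_ENCODINGS:
                return _html_rules(tag)
    if parent_ns == SVG and parent_name in SVG_HTML_INTEGRATION_POINTS:
        return _html_rules(tag)
    # Otherwise the "in foreign content" rules apply: inherit the namespace.
    return parent_ns, tag


# <svg> directly inside <math> stays in the MathML namespace ...
print(place_start_tag(MATHML, "math", "svg"))  # ('mathml', 'svg')
# ... but wrapped in <mi> it becomes a real SVG element.
print(place_start_tag(MATHML, "mi", "svg"))    # ('svg', 'svg')
```

The two calls at the end illustrate the asymmetry discussed in this thread: the same `svg` markup becomes a MathML-namespace element when placed directly in `math`, and a genuine SVG element once it is wrapped in a token element such as `mi`.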