Re: [webkit-dev] HTML5 Parsing & MathML

2010-11-03 Thread Alex Milowski
On Tue, Nov 2, 2010 at 7:55 AM, David Carlisle d.p.carli...@gmail.com wrote:

 Personally I agree with you that this desire to make html elements forcibly
 close the surrounding math elements is entirely bogus, and it causes all sorts
 of problems in annotation-xml (where you really want nested html) but we failed
 to convince the html WG (or the html editor) of that and so ended up with a
 special case workaround for annotation-xml

 http://www.w3.org/Bugs/Public/show_bug.cgi?id=9887#c16

 sometimes you have to take what you can get:-)

I will take a look.


 However I don't agree that using the token elements as extension points is only
 necessary because of html parser strangeness, I think it leads to a cleaner
 design, and better fallback behaviour for systems that do not understand the
 foreign elements, in any case.


Uncle!  This will take some work to get working correctly with the
implementation in WebKit.  Right now, in XHTML documents with MathML,
we get non-token XHTML for free.  Within MathML token elements, this
won't necessarily be the case.  For example, the 'mo' element renderer
as currently implemented won't preserve child rendering objects.
We'll need to detect these situations and decide what to do.

It would have been nice if MathML 3 had a foreign token element or
an indication via attribute typing so that we'd know that there are
non-MathML content children that should be rendered according
to the host language.  We'll now need some kind of de facto
default set of rules saying that mixed content within MathML is
identified and handled slightly differently (especially if it contains
things like SVG).

That is, we'll need to detect things like:

<math><mo> random text <svg> ... </svg>  more random text</mo></math>

While this example is rather pathological, it is still possible and
should render as a stack of line boxes wrapped in the inline-block for
the 'mo'.
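
A rough way to picture that detection, as a minimal sketch in Python rather than WebKit's C++: walk a namespaced tree and report token elements that hold element children from outside the MathML namespace. The function name foreign_children, the TOKENS set, and the sample document are assumptions made for illustration only; MathML children such as mglyph are deliberately not flagged.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
SVG = "http://www.w3.org/2000/svg"
# Presentation MathML token elements that carry text content.
TOKENS = {"mi", "mn", "mo", "mtext", "ms"}

def foreign_children(root):
    """Return (token local name, child tag) pairs for non-MathML children."""
    found = []
    for elem in root.iter():
        # ElementTree spells namespaced tags as "{namespace}local".
        ns, _, local = elem.tag[1:].partition("}")
        if ns != MATHML or local not in TOKENS:
            continue
        for child in elem:
            if not child.tag.startswith("{" + MATHML + "}"):
                found.append((local, child.tag))
    return found

doc = (
    f"<math xmlns='{MATHML}'><mo>random text "
    f"<svg xmlns='{SVG}'/> more random text</mo></math>"
)
result = foreign_children(ET.fromstring(doc))
print(result)
```

What a renderer then does with each reported pair (wrap it in line boxes, ignore it, and so on) is the open question in this thread and is not modelled here.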

Also, this:

<math><mtext> <div> ...</div> </mtext></math>

should be equivalent to the XHTML chunk:

<math xmlns='http://www.w3.org/1998/Math/MathML'><div
xmlns='http://www.w3.org/1999/xhtml'>...</div></math>

Both of the above examples should work today but once we implement the
renderers for mtext/mi/mn etc. we'll need to take this foreign
element rendering into account.
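
For illustration only, here is a minimal Python sketch that feeds the XHTML chunk to a namespace-aware XML parser (the standard library's xml.etree.ElementTree, standing in for any XML parser). It assumes the MathML namespace URI http://www.w3.org/1998/Math/MathML and shows that, in XML, the div stays a child of math while living in the XHTML namespace. It says nothing about what the HTML5 parser does with the same markup.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
XHTML = "http://www.w3.org/1999/xhtml"

chunk = f"<math xmlns='{MATHML}'><div xmlns='{XHTML}'>...</div></math>"
math = ET.fromstring(chunk)
child = math[0]

# ElementTree reports tags as "{namespace}local".
print(math.tag)
print(child.tag)
```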

-- 
--Alex Milowski
The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered.

Bertrand Russell in a footnote of Principles of Mathematics
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] HTML5 Parsing & MathML

2010-11-03 Thread David Carlisle


Alex,

 Uncle!  This will take some work to get working correctly with the
 implementation in WebKit.


Sorry about that.

   Right now, in XHTML documents with MathML,
 we get non-token XHTML for free.  Within MathML token elements, this
 won't necessarily be the case.  For example, the 'mo' element renderer
 as currently implemented won't preserve child rendering objects.
 We'll need to detect these situations and decide what to do.

Hmm, the MathML3 spec particularly recommends mtext as the extension point,
although I think it made sense to specify all the token elements for the parser
to switch to html rendering, as it's much easier for validation or convention to
restrict the document type than to extend the parser later.

 It would have been nice if MathML 3 had a foreign token element or
 indication via attribute typing so that we'd know that there is some
 kind of non-MathML content children that should be rendering according
 to the host language.

But elsewhere you argue that such an element isn't needed and you should just be
able to drop in other namespaced elements anywhere. In fact MathML3-in-(x)html
does specify such an element: the content of mo, mi and mtext is specified as
being html.

 We'll now have to have some kind of de-facto
 default set of rules that say that mixed content within a MathML is
 identified and handled slightly differently (especially if it contains
 things like SVG).

Differently to what? Sorry, I'm not sure I understand what you mean here. Can't
you just always view the content of mtext as inline html? It basically has the
same content model as the content of an html span. SVG is allowed there just
because it's allowed in any inline html.  Clearly, if you are looking up the
content of mo in an operator dictionary, that will only succeed if the mo only
contains character data. But even if the mo does contain only character data, the
dictionary lookup will fail in general if you have a finite dictionary and an
arbitrary string as the content of the mo, so having it fail on mixed content
isn't (in the abstract) any different, although I accept that an implementation
may have different concerns.
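
The dictionary argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: OPERATOR_DICTIONARY is a made-up three-entry stand-in for a real operator dictionary, and lookup_operator is a hypothetical helper, not anything from WebKit or the MathML spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical, tiny stand-in for an operator dictionary.
OPERATOR_DICTIONARY = {"+": "infix", "=": "infix", "(": "prefix"}

def lookup_operator(mo):
    """Return the dictionary entry for an mo element, or None."""
    if len(mo) > 0:
        # Element children mean mixed content, so there is no plain
        # string to look up.
        return None
    return OPERATOR_DICTIONARY.get((mo.text or "").strip())

plain = ET.fromstring("<mo>+</mo>")
unknown = ET.fromstring("<mo>random text</mo>")
mixed = ET.fromstring("<mo>random text <svg/> more random text</mo>")

results = [lookup_operator(m) for m in (plain, unknown, mixed)]
print(results)
```

From the caller's side, an unknown string and mixed content fail the same way, which is the "in the abstract" point; a renderer may still need to treat the two cases differently afterwards.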

 That is, we'll need to detect things like:

 <math><mo> random text <svg> ... </svg>  more random text</mo></math>

As above, I don't see why you need to detect such things any more than you need
to detect


<span> random text <svg> ... </svg>  more random text</span>


In fact your original proposal was that <math><span>.<svg> should just
work, and so what is to stop mtext being treated exactly like span?

David







Re: [webkit-dev] HTML5 Parsing & MathML

2010-11-03 Thread Alex Milowski
On Wed, Nov 3, 2010 at 7:49 AM, David Carlisle d.p.carli...@gmail.com wrote:

 It would have been nice if MathML 3 had a foreign token element or
 indication via attribute typing so that we'd know that there is some
 kind of non-MathML content children that should be rendering according
 to the host language.

 But elsewhere you argue that such an element isn't needed and you should just be
 able to drop in other namespaced elements anywhere. in fact MathML3-in-(x)html
 does specify such an element, namely content of mo mi mtext are specified as
 being html.


Sure. ...didn't win that one!  :)

We have these token categories:

   * identifier (mi)
   * number (mn)
   * operator (mo)
   * text (mtext)
   * space (mspace)
   * string (ms)

What if our use of some chunk of HTML doesn't fit in the
categorization of the above?  It would have been nice to have an
ability to annotate foreign markup as some kind of layout element
implemented in, say, HTML, and then potentially use embedded
additional MathML for inner constructs.  That way, things like
accessibility would know that the foreign markup isn't a terminal
structure of the Mathematics and might know (e.g. via ARIA) the role
of the layout.

...so, that's what I meant.   Just an idea ...

 We'll now have to have some kind of de-facto
 default set of rules that say that mixed content within a MathML is
 identified and handled slightly differently (especially if it contains
 things like SVG).

 differently to what? Sorry I'm not sure I understand what you mean here, can't
 you just always view the content of mtext as inline html: it basically has the
 same content model as the content of an html span. SVG is allowed there just
 because it's allowed in any inline html.

Right.  That's not different from what we'd expect.  In section 3.2.1, it says:

Token elements (other than mspace) should be rendered as their
content, if any, (i.e. in the visual case, as a closely-spaced
horizontal row of standard glyphs for the characters or images for the
mglyphs in their content).

Introduce a few SVG and HTML elements and then you have to make the
assumptions about the children that are being rendered according to
the normal rules (plus mglyph) so that this works:

<mi> xyzzy <div>  </div> </mi>

Without any CSS, that 'div' will be a block whose rendering will cause
a new block to be stacked within the inline.  That's a consequence of
my choice of using inline blocks and allowing the rendering of the
'div' to default to the current internal style within WebKit.  I think
that's the right choice but there might be other interpretations.  For
example, one could say that divs inside MathML have a display property
of inline-block by default.

That choice isn't covered by either MathML3 or HTML5.  I'm not sure
it should be.

 That is, we'll need to detect things like:

 <math><mo> random text <svg> ... </svg>  more random text</mo></math>

 as above i don't see why you need to detect such things any more than you need
 to detect

Well, that's a consequence of building the rendering tree.  Right now
we don't have a special rendering object for token elements other than
for 'mo'.  In the case of operators, this becomes complicated due to
operator stretching.  It may work out to be straightforward but those
feel like famous last words.  That's all I meant.


 In fact your original proposal was that <math><span>.<svg> should just
 work, and so what is to stop mtext being treated exactly like span?


Not much, and hopefully it stays that way.

At this point I'm not raising any issue except that I know that our
'mo' implementation is currently broken in this regard.

-- 
--Alex Milowski


Re: [webkit-dev] HTML5 Parsing & MathML

2010-11-02 Thread David Carlisle
Alex Milowski alex at milowski.org writes:

Sorry for the late reply; I'm not subscribed and just saw this in the archives.

 
 On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth abarth at webkit.org wrote:
  Our parser follows the spec (modulo late-breaking spec changes that we

Actually most mathml in the wild will be mis-parsed by the webkit html5 parser
because of

https://bugs.webkit.org/show_bug.cgi?id=48105 

but that's hopefully a temporary glitch.

  haven't picked up yet).  The different namespaces can only be nested
  in certain ways, unlike in XML where arbitrary nesting is possible.
 
 ...
 
 <p> ...
 <math>
 <mfenced open='[' close=']'>
 <div> ... random stuff </div>
 </mfenced>
 </math>
 </p>
 
 It would then pop the open stack back to the parent p element
 and the div element would be a child of the paragraph and not
 of the fencing.
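
The popping described in the quoted text can be pictured with a toy model in Python. This is a sketch loosely based on the HTML parsing rules for foreign content, not WebKit's tree builder: the names break_out and accepts_html are hypothetical, BREAKOUT_TAGS is only a partial list, and elements are reduced to (namespace, name) pairs. It models the popping step alone; whatever the parser does with the start tag afterwards is left out.

```python
HTML, MATHML = "html", "mathml"
# Partial list of HTML start tags that break out of foreign content.
BREAKOUT_TAGS = {"div", "p", "span", "table", "ul"}
# MathML token elements inside which HTML content is allowed.
TEXT_INTEGRATION_POINTS = {"mi", "mo", "mn", "ms", "mtext"}

def accepts_html(node):
    """True if an HTML start tag may be inserted under this open element."""
    namespace, name = node
    if namespace == HTML:
        return True
    return namespace == MATHML and name in TEXT_INTEGRATION_POINTS

def break_out(stack, start_tag):
    """Return the stack of open elements after the breakout step."""
    stack = list(stack)
    if start_tag in BREAKOUT_TAGS:
        while stack and not accepts_html(stack[-1]):
            stack.pop()
    return stack

fenced = [(HTML, "p"), (MATHML, "math"), (MATHML, "mfenced")]
wrapped = [(HTML, "p"), (MATHML, "math"), (MATHML, "mtext")]

after_fenced = break_out(fenced, "div")
after_wrapped = break_out(wrapped, "div")
print(after_fenced)
print(after_wrapped)
```

With mfenced as the current node the math subtree is popped and only the p remains open, while wrapping the content in mtext leaves the stack untouched, which is why the token elements work as extension points.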

Personally I agree with you that this desire to make html elements forcibly
close the surrounding math elements is entirely bogus, and it causes all sorts
of problems in annotation-xml (where you really want nested html) but we failed
to convince the html WG (or the html editor) of that and so ended up with a
special case workaround for annotation-xml

http://www.w3.org/Bugs/Public/show_bug.cgi?id=9887#c16

sometimes you have to take what you can get:-)

However I don't agree that using the token elements as extension points is only
necessary because of html parser strangeness, I think it leads to a cleaner
design, and better fallback behaviour for systems that do not understand the
foreign elements, in any case.

 
 In XHTML, assuming there are appropriate uses of
 namespaces, everything would work fine and you'd get a div
 element fenced with stretching square brackets.

It would probably render OK but wouldn't be valid according to the published
schemas. As with most polyglot requirements, assuming xml and html validity
goes a long way to ensuring that you get the same dom.
 
 So, if you cut-n-pasted the same content with the 'xmlns'
 attributes, you'd get two very different results.
 
 That really feels fixable but I'm going to need to think a bit
 more about what adjustments there would need to be
 to the rules.
 
 I wonder what the intersection of local names is between
 MathML and HTML ...

By design there is no intersection, although it turns out that browsers
implemented (and html5 acknowledges) image as a synonym for img which is
therefore the one clash with a mathml name.

 
 This is, of course, an HTML5 issue and not really a WebKit
 issue except for the question of difficulty of implementation.
 

yep.

David






Re: [webkit-dev] HTML5 Parsing & MathML

2010-11-02 Thread Adam Barth
On Tue, Nov 2, 2010 at 7:55 AM, David Carlisle d.p.carli...@gmail.com wrote:
 Alex Milowski alex at milowski.org writes:

 sorry for late reply, I'm not subscribed, just saw this in the archives.


 On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth abarth at webkit.org wrote:
  Our parser follows the spec (modulo late-breaking spec changes that we

 Actually most mathml in the wild will be mis-parsed by the webkit html5 parser
 because of

 https://bugs.webkit.org/show_bug.cgi?id=48105

 but that's hopefully a temporary glitch.

Is this a bug in the HTML5 specification or a bug in our
implementation of the spec?  If it's the former, you might want to file
a bug with the HTML working group to resolve the issue.

Adam





Re: [webkit-dev] HTML5 Parsing & MathML

2010-11-02 Thread François Sausset
It seems to be the latter.
This is indeed a regression, but I don't know how to determine when it appeared.
As far as I remember, it was OK a few months ago.

François


On 2 Nov 2010, at 17:26, Adam Barth wrote:

 On Tue, Nov 2, 2010 at 7:55 AM, David Carlisle d.p.carli...@gmail.com wrote:
 Alex Milowski alex at milowski.org writes:
 
 sorry for late reply, I'm not subscribed, just saw this in the archives.
 
 
 On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth abarth at webkit.org wrote:
 Our parser follows the spec (modulo late-breaking spec changes that we
 
 Actually most mathml in the wild will be mis-parsed by the webkit html5 parser
 because of
 
 https://bugs.webkit.org/show_bug.cgi?id=48105
 
 but that's hopefully a temporary glitch.
 
 Is this a bug in the HTML5 specification or a bug in our
 implementation of the spec?  If its the former, you might want to file
 a bug with the HTML working group to resolve the issue.
 
 Adam



Re: [webkit-dev] HTML5 Parsing & MathML

2010-11-02 Thread Adam Barth
On Tue, Nov 2, 2010 at 9:52 AM, David Carlisle d.p.carli...@gmail.com wrote:
 On 2 November 2010 16:26, Adam Barth aba...@webkit.org wrote:
 On Tue, Nov 2, 2010 at 7:55 AM, David Carlisle d.p.carli...@gmail.com wrote:
 Alex Milowski alex at milowski.org writes:

 sorry for late reply, I'm not subscribed, just saw this in the archives.


 On Fri, Oct 1, 2010 at 12:52 PM, Adam Barth abarth at webkit.org wrote:
  Our parser follows the spec (modulo late-breaking spec changes that we

 Actually most mathml in the wild will be mis-parsed by the webkit html5 parser
 because of

 https://bugs.webkit.org/show_bug.cgi?id=48105

 but that's hopefully a temporary glitch.

 Is this a bug in the HTML5 specification or a bug in our
 implementation of the spec?  If its the former, you might want to file
 a bug with the HTML working group to resolve the issue.

 Adam

 I'm pretty sure that it is an implementation issue (firefox 4 doesn't
 have this problem for example). Certainly I can't see anything that
 would specify parsing something as simple as


 <math>
 <mrow>
 <mrow><mn>1</mn></mrow>
 <mi>a</mi>
 </mrow>
 </math>

 as a completely different tree:

 <math> <mrow> <mrow><mn>1</mn></mrow></mrow> <mi>a</mi> </math>

 It makes mathml and svg pretty unusable of course as it's common (very
 common in mathml case) to have elements nested within an element of
 the same name.
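
To make the difference between the two trees easier to see, here is a small Python sketch that builds both shapes with the standard library's XML parser and prints them as nested tuples. It only compares the intended tree with the reported one; it does not reproduce the WebKit bug, and the helper name outline is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

def outline(elem):
    """Render an element tree as nested (tag, children) tuples."""
    return (elem.tag, [outline(child) for child in elem])

intended = ET.fromstring(
    "<math><mrow><mrow><mn>1</mn></mrow><mi>a</mi></mrow></math>"
)
misparsed = ET.fromstring(
    "<math><mrow><mrow><mn>1</mn></mrow></mrow><mi>a</mi></math>"
)

# In the intended tree, mi is a child of the outer mrow.
print(outline(intended))
# In the reported tree, mi has become a sibling of the outer mrow.
print(outline(misparsed))
```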

Okiedokes.  I've CCed Eric and myself on the bug since we're the
most likely folks to fix the issue.  We'd certainly welcome a patch
from you, if you're interested in fixing the issue.

Adam


Re: [webkit-dev] HTML5 Parsing & MathML

2010-11-02 Thread James Simonsen
On Tue, Nov 2, 2010 at 10:17 AM, Adam Barth aba...@webkit.org wrote:


 Okiedokes.  I've CCed Eric and myself on the bug since we're the
 mostly likely folks to fix the issue.  We'd certainly welcome a patch
 from you, if you're interested in fixing the issue.


This is my bug from the new handling of foreign content mode. I'll upload a
patch shortly.

James


Re: [webkit-dev] HTML5 Parsing & MathML

2010-10-04 Thread Alex Milowski
On Sat, Oct 2, 2010 at 3:48 PM, David Carlisle d.p.carli...@gmail.com wrote:
 Alex Milowski alex at milowski.org writes:

 From reading the section on 'in foreign content' [1], it would seem that it
 should assign the 'svg' elements to the MathML namespace when they
 are embedded as above.  That means cutting and pasting the same
 content fragment gives two very different interpretations--which is more
 of a problem for the HTML5 spec than webkit.


 As (since?) confirmed elsewhere on another list, but mentioned here for the
 record, the example becomes valid (and parse-able by html5 parser) if you wrap
 the svg in mi elements.
 the presentation mathml token elements, mi, mtext, etc are specified as being
 the extension points where you can embed html (and thus svg).

That presents a challenge to the way the MathML implementation
is currently organized.  In the current implementation, token elements
are not supposed to contain element content.  We'll need to
completely re-architect the token elements to handle this in
all situations, as we won't get it by default in several cases.  For
example, if the SVG is embedded in an 'mo' element, the
SVG will be ignored.

That also raises the question of what should be done in cases like:

<mo> random text <svg> ... </svg> </mo>

I still stand by my position that wrapping foreign elements
in token elements in MathML is completely unnecessary
for SVG, HTML, or other vocabularies that have rendering semantics
that translate into some sequence of inline or block boxes.

-- 
--Alex Milowski


Re: [webkit-dev] HTML5 Parsing & MathML

2010-10-02 Thread David Carlisle
Alex Milowski alex at milowski.org writes:

 From reading the section on 'in foreign content' [1], it would seem that it
 should assign the 'svg' elements to the MathML namespace when they
 are embedded as above.  That means cutting and pasting the same
 content fragment gives two very different interpretations--which is more
 of a problem for the HTML5 spec than webkit.
 

As (since?) confirmed elsewhere on another list, but mentioned here for the
record, the example becomes valid (and parse-able by an html5 parser) if you wrap
the svg in mi elements.
The presentation mathml token elements (mi, mtext, etc.) are specified as being
the extension points where you can embed html (and thus svg).

David

