Re: [whatwg] Supporting MathML and SVG in text/html, and related topics

2008-04-16 Thread Henri Sivonen

On Apr 16, 2008, at 10:47, Paul Libbrecht wrote:
I would like to add a grain of salt here and would love HTML5  
enthusiasts to answer:


why is the whole HTML5 effort not a movement towards a really  
enhanced parser instead of an attempt to fully redefine HTML's successors?


text/html has immense network effects both from the deployed base of  
text/html content and the deployed base of software that deals with  
text/html. Failing to plug into this existing network would be an  
extremely bad strategy. In fact, the reason why the proportion of Web  
pages that get parsed as XML is negligible is that the XML approach  
totally failed to plug into the existing text/html network effects  
(except for Appendix C, which lacks a migration strategy to actual XML  
and amounts to the emperor's new clothes).


Being an enhanced parser (that would use a lot of context info to be  
really hand-author supportive), it would define how to better parse  
an XHTML 3 page, as well as MathML and SVG as it currently does... It  
would be able to specify very readable encodings of these pages.


It could serve as a model for many other situations where XML  
parsing is useful but its strictness bites some.


Anne has been working on XML5, but being able to parse any well-formed  
stream to the same infoset as an XML 1.0 parser and being able to  
parse existing text/html content in a backwards-compatible way are  
mutually conflicting requirements. Hence, XML5 parsing won't be  
suitable for text/html.
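To make the conflict concrete, consider an empty-element tag on an element that is not void in HTML. The sketch below only computes the XML side, using Python's standard xml.etree.ElementTree; the HTML5 outcome is stated in a comment rather than computed, since the Python standard library has no HTML5 parser.

```python
import xml.etree.ElementTree as ET

# One input, read under XML 1.0 rules by Python's standard XML parser.
source = "<div><p/>text</div>"
div = ET.fromstring(source)
p = div.find("p")

# XML: <p/> is a complete, empty element, and "text" follows it as a sibling
# (ElementTree stores it as p.tail rather than p.text).
print(repr(p.text), repr(p.tail))

# HTML5 text/html parsing instead ignores the slash on a non-void element,
# so <p/> opens a paragraph and "text" becomes its content. No single parser
# can return both trees for this input.
```

A parser that must agree with XML 1.0 on every well-formed stream has to produce the first tree; a backwards-compatible text/html parser has to produce the second.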


Currently HTML5 defines at the same time parsing and the model and  
this is what can cause us to expect that XML is getting weaker. I  
believe that the whole model-definition work of XML is rich, has  
many libraries, has empowered a lot of great developments and it is  
a bad idea to drop it instead of enriching it.


The dominant design of non-browser HTML5 parsing libraries is exposing  
the document tree using an XML parser API. The non-browser HTML5  
libraries, therefore, plug into the network of XML libraries. For  
example, Validator.nu's internals operate on SAX events that look like  
SAX events for an XHTML5 document. This allows Validator.nu to use  
libraries written for XML, such as oNVDL and Saxon.
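A minimal sketch of that design, assuming Python and only its standard library: html.parser.HTMLParser callbacks are forwarded as SAX events to an xml.sax ContentHandler, here xml.sax.saxutils.XMLGenerator, which writes well-formed XML that any XML tool can consume. This is not Validator.nu's code, and html.parser does not implement the HTML5 parsing algorithm; the class HtmlToSax, the synthetic html root and the "pop until a matching open element" recovery rule are hypothetical simplifications.

```python
from html.parser import HTMLParser
from io import StringIO
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Elements that never have content in HTML, so they are closed immediately.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class HtmlToSax(HTMLParser):
    """Forward HTML tokenizer callbacks as SAX events to an XML ContentHandler.

    Hypothetical sketch: this is NOT the HTML5 tree-construction algorithm.
    """

    def __init__(self, handler: ContentHandler):
        super().__init__(convert_charrefs=True)
        self._handler = handler
        self._open = []  # names of currently open elements, innermost last
        handler.startDocument()
        # Synthetic root, so the output always has exactly one document element.
        handler.startElement("html", AttributesImpl({"xmlns": XHTML_NS}))

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            return  # the root was already emitted in __init__
        # Valueless attributes (e.g. <input disabled>) arrive with value None;
        # give them the XHTML-style value disabled="disabled".
        values = {name: (name if value is None else value) for name, value in attrs}
        self._handler.startElement(tag, AttributesImpl(values))
        if tag in VOID:
            self._handler.endElement(tag)
        else:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag not in self._open:
            return  # stray end tag such as </html> or </br>: ignore it
        # Naive recovery: close everything up to the matching open element.
        while self._open:
            name = self._open.pop()
            self._handler.endElement(name)
            if name == tag:
                break

    def handle_data(self, data):
        self._handler.characters(data)

    def close(self):
        super().close()  # flush any buffered text first
        while self._open:
            self._handler.endElement(self._open.pop())
        self._handler.endElement("html")
        self._handler.endDocument()


def html_to_xml(source: str) -> str:
    """Parse HTML-ish input and return it serialized as XML."""
    out = StringIO()
    parser = HtmlToSax(XMLGenerator(out, encoding="utf-8"))
    parser.feed(source)
    parser.close()
    return out.getvalue()


print(html_to_xml("<!DOCTYPE html><p>a<br>b<p class=x>c &amp; d"))
```

With that input, this sketch nests the second p inside the first, whereas the HTML5 tree builder closes the open paragraph when it sees another p start tag. Getting such rules exactly right is the reason to implement the specified algorithm rather than an ad hoc one.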


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] Supporting MathML and SVG in text/html, and related topics

2008-04-16 Thread Henri Sivonen

On Apr 16, 2008, at 12:58, Paul Libbrecht wrote:
In fact, the reason why the proportion of Web pages that get parsed  
as XML is negligible is that the XML approach totally failed to  
plug into the existing text/html network effects[...]


My hypothesis here is that this problem is mostly a parsing problem  
and not a model problem. HTML5 mixes the two.


For backwards compatibility in scripted browser environments, the HTML  
DOM can't behave exactly like the XHTML5 DOM. For non-scripted  
non-browser environments, using an XML data model (XML DOM, XOM, JDOM,  
dom4j, SAX, ElementTree, lxml, etc., etc.) works fine.


There are tools that convert quite a lot of text/html pages (whose  
compliance is user-defined to be "it works in my browser") to an XML  
stream today; NekoHTML is one of them. The goal would be to  
formalize this parsing, and just this parsing.


Like NekoHTML and TagSoup, the Validator.nu HTML parser turns text/html  
input into Java XML models. The difference is that the  
Validator.nu HTML parser implements the HTML5 algorithm instead of  
something the authors of NekoHTML and TagSoup figured out on their  
own. So if you are asking for a NekoHTML-like product for HTML5, it  
already exists and supports three popular Java XML APIs (SAX, DOM and  
XOM). Not XNI, though, at the moment. (It doesn't support the recent  
MathML addition, *yet*, though.)


http://about.validator.nu/htmlparser/

Currently HTML5 defines at the same time parsing and the model and  
this is what can cause us to expect that XML is getting weaker. I  
believe that the whole model-definition work of XML is rich, has  
many libraries, has empowered a lot of great developments and it  
is a bad idea to drop it instead of enriching it.


The dominant design of non-browser HTML5 parsing libraries is  
exposing the document tree using an XML parser API. The non-browser  
HTML5 libraries, therefore, plug into the network of XML libraries.  
For example, Validator.nu's internals operate on SAX events that  
look like SAX events for an XHTML5 document. This allows  
Validator.nu to use libraries written for XML, such as oNVDL and  
Saxon.


So, except for needing yet another XHTML version to accommodate all  
wishes, I think it would be much saner for browsers'  
implementations and related specifications to rely on an XML-based  
model of HTML (as the DOM is) instead of a coupled parsing-and-modelling  
specification which has different interpretations at  
different places.


HTML5 already specifies parsing in terms of DOM output. However, when  
the DOM is in the HTML mode, it has to be slightly different.


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] Supporting MathML and SVG in text/html, and related topics

2008-04-16 Thread Anne van Kesteren
On Wed, 16 Apr 2008 18:36:49 +0200, William F Hammond  
[EMAIL PROTECTED] wrote:

About 7 years ago there was argument in these circles about whether
correct xhtml+mathml could be served as text/html.

As we all know, a clear boundary was drawn, presumably because it
was onerous for browsers to sniff incoming content and then decide
how to parse.


Actually, it was not the browsers:

  http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html



As things have evolved, we now know that browsers do, in fact, perform
a lot of triage.  See, for example, Mozilla's DOCTYPE sniffing,
http://developer.mozilla.org/en/docs/Mozilla's_DOCTYPE_sniffing


That's a very limited set of differences which mostly affect page layout.



Especially since we are speaking about dual serialization of the same
DOM and since there is relatively little use of
application/xhtml+xml (and some significant user agents do not
support it), might it not be worthwhile to re-examine the question of
serving standards-compliant xhtml or xhtml+(mathml|svg) serialized
document instances as either text/html or application/xhtml+xml?

In other words, why not be able to serve both serializations
as text/html?

What obstacles to this exist?


The Web.


--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/


Re: [whatwg] Supporting MathML and SVG in text/html, and related topics

2008-04-16 Thread Maciej Stachowiak


On Apr 16, 2008, at 9:36 AM, William F Hammond wrote:




About 7 years ago there was argument in these circles about whether
correct xhtml+mathml could be served as text/html.

As we all know, a clear boundary was drawn, presumably because it
was onerous for browsers to sniff incoming content and then decide
how to parse.

As things have evolved, we now know that browsers do, in fact, perform
a lot of triage.  See, for example, Mozilla's DOCTYPE sniffing,
http://developer.mozilla.org/en/docs/Mozilla's_DOCTYPE_sniffing

Especially since we are speaking about dual serialization of the same
DOM and since there is relatively little use of
application/xhtml+xml (and some significant user agents do not
support it), might it not be worthwhile to re-examine the question of
serving standards-compliant xhtml or xhtml+(mathml|svg) serialized
document instances as either text/html or application/xhtml+xml?

In other words, why not be able to serve both serializations
as text/html?

What obstacles to this exist?


It's not entirely clear what your proposal is, but I assume you are  
suggesting that content served as text/html with an XHTML doctype  
declaration should be parsed as XML. The obstacle to this is that much  
text/html content has an XHTML doctype declaration but depends on  
being parsed and otherwise processed as HTML, not XML, as current user  
agents do it. Such content is fairly widespread due to the legacy of  
Appendix C. It is preferable to let the MIME type continue to be the  
switch rather than making the doctype serve this role.


An additional obstacle in the case of HTML5 is that the XML  
serialization does not have a distinct doctype (XHTML5 documents may use  
the common HTML5 doctype or no doctype at all; the latter, when parsed as  
text/html, would be treated as an HTML document in quirks mode).
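A rough sketch of the two switches described above, assuming Python: the MIME type alone picks the parser, and inside the HTML parser the doctype only picks a mode. The helpers parser_for and html_mode are hypothetical; html_mode models just two cases and assumes the doctype, if any, is the first non-whitespace thing in the document (the real rules also match many legacy public and system identifiers, reported here as "unknown").

```python
def parser_for(content_type: str) -> str:
    """Choose a parser from the Content-Type header alone (hypothetical helper).

    The doctype plays no part here: an XHTML 1.0 page served as text/html
    still goes to the HTML parser, which Appendix C content relies on.
    """
    essence = content_type.split(";", 1)[0].strip().lower()
    if essence == "text/html":
        return "html"
    if essence in ("text/xml", "application/xml") or essence.endswith("+xml"):
        return "xml"
    return "other"


def html_mode(source: str) -> str:
    """Very partial sketch of how the HTML parser uses the doctype."""
    start = source.lstrip().lower()
    if not start.startswith("<!doctype"):
        return "quirks"      # no doctype at all
    if start.startswith("<!doctype html>"):
        return "no-quirks"   # the common HTML5 doctype
    return "unknown"         # legacy identifiers: not modelled in this sketch


xhtml5 = '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head></html>'
print(parser_for("text/html; charset=utf-8"), html_mode(xhtml5))
```

Under this sketch, an XHTML5 document without a doctype that is served as text/html goes to the HTML parser and lands in quirks mode, which is the situation described above.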


Regards,
Maciej



Re: [whatwg] Supporting MathML and SVG in text/html, and related topics

2008-04-16 Thread Krzysztof Żelechowski

On 10-04-2008, Thu at 09:51 +, Ian Hickson writes:

 On Sat, 4 Nov 2006, Paul Topping wrote:
 
  Elements whose namespaces aren't known should be handled like any other 
  unknown HTML element. I believe the common way for user agents to handle 
  an unknown element is basically to ignore the tag and its attributes and 
  treat any text between start and end tags as if the tags weren't there. 
  Namespaces do not present any new challenge in this area. Bogus 
  namespaces are no more of a security risk than bogus HTML tags. It is 
  only the ones that ARE processed by the user agent that represent 
  potential security risks.
 
 The problem is legacy content like:
 
   <html>
    <foo xmlns="bogus namespace">
     ...rest of HTML document...
 
 We don't want to make the whole document get ignored.

An example of such a tag is the Microsoft HTML Application indicator,
which is empty by design.
But how does Paul's recipe amount to ignoring the whole document?


 If anyone is actually reading this 3363 line e-mail, I'm
 impressed. Please do let me know that you read this.

I do not do bungee jumping though.  
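One way to see the mechanism behind the legacy-content concern, under the assumption that text/html adopted XML-style namespace processing: a default xmlns declaration is inherited by every descendant. The sketch parses the legacy pattern with Python's xml.etree.ElementTree, using a hypothetical urn:example:bogus URI in place of "bogus namespace". Everything after the foo start tag, including the ordinary p, lands in the bogus namespace, so a user agent treating unknown-namespace elements as unknown would lose their HTML semantics for the rest of the document. This illustrates the mechanism only, not any browser's behaviour.

```python
import xml.etree.ElementTree as ET

# The legacy pattern, made well-formed so an XML parser accepts it.
legacy = """<html>
 <foo xmlns="urn:example:bogus">
  <p>rest of the document</p>
 </foo>
</html>"""

root = ET.fromstring(legacy)

# The default namespace declared on <foo> is inherited by <p> ...
p = root.find(".//{urn:example:bogus}p")
print(p.tag)

# ... so there is no plain (HTML) <p> anywhere in the tree.
print(root.find(".//p"))
```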





Re: [whatwg] Supporting MathML and SVG in text/html, and related topics

2008-04-16 Thread Anne van Kesteren
On Wed, 16 Apr 2008 22:01:49 +0200, William F Hammond  
[EMAIL PROTECTED] wrote:

Anne van Kesteren [EMAIL PROTECTED] writes:

The Web.


Really!?!


Yes, see for instance:

  http://lists.w3.org/Archives/Public/public-html/2007Aug/1248.html



It's time for user agents to stop supporting bogus document preambles.


Please keep the discussion realistic.


--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/


Re: [whatwg] Supporting MathML and SVG in text/html, and related topics

2008-04-15 Thread Ian Hickson
On Tue, 15 Apr 2008, Chris Chiasson wrote:

 So, have the HTML 5 people already made up their minds, where the 
 discussion that continues today has no chance of maintaining the XML 
 serialization?

Nothing is yet set in stone for HTML5 [1].

The XHTML variant of HTML isn't going away, though. The HTML5 spec defines 
both a text/html serialisation and an XHTML serialisation, as well as the 
processing requirements and conformance requirements for both.
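As a loose analogy for one tree with two serialisations, Python's xml.etree.ElementTree can write the same element tree with method="xml" or method="html". ElementTree's HTML writer is not the HTML5 serialisation algorithm, but it shows typical differences: void elements and unescaped script text.

```python
import xml.etree.ElementTree as ET

# One small tree: <body><p>a<br/>b</p><script>if (a < b) go();</script></body>
body = ET.Element("body")
p = ET.SubElement(body, "p")
p.text = "a"
br = ET.SubElement(p, "br")
br.tail = "b"
script = ET.SubElement(body, "script")
script.text = "if (a < b) go();"

# Same tree, two serialisations.
as_xml = ET.tostring(body, encoding="unicode", method="xml")
as_html = ET.tostring(body, encoding="unicode", method="html")
print(as_xml)
print(as_html)
```

The XML output self-closes br and escapes the less-than sign inside script; the HTML output writes br as a bare start tag and leaves the script text raw.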

Does that answer your question?


([1] Well, sort of nothing. As more and more of HTML5 gets implemented, 
more and more of our constraint to not break backwards compatibility 
marches forward to include stuff that several years ago counted as new in 
HTML5.)

-- 
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.


Re: [whatwg] Supporting MathML and SVG in text/html, and related topics

2008-04-11 Thread Øistein E . Andersen
On Thursday 10th April 2008, Ian Hickson wrote:

 SVG radicals aren't typographically acceptable either.
 You really want to use fonts for this.

Current browsers are clearly better at rendering TrueType
and PostScript fonts at small sizes than equivalent shapes
expressed as SVG paths.  (This may or may not be related,
wholly or in part, to hinting; I never tested this using
hinted and unhinted versions of the same font, but I
suspect that hinting does not account for everything.)
Poor or even abysmal on-screen rendering made me abandon this
approach last time I considered maths-to-SVG conversion.

This particular problem would however be less of an issue with
bigger and/or more geometrical shapes, and I would consider
TeX's construction of, e.g., vincula (horizontal lines) by
overstriking of tiny bits from a font to be an artifact of
not being able to intermix text and graphics freely rather
than the result of an intrinsic aesthetic superiority of the
`everything-from-fonts' approach.

Now that Safari for Mac supports custom fonts using
@font-face and other browsers will follow suit, using
fonts for text and operators and SVG graphics for big
delimiters and geometric symbols would seem to be a
reasonable approach, and I would be interested to know
what might make SVG radicals `typographically unacceptable'.
(Obviously, fonts and SVG elements must be coordinated.)
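A rough sketch of the hybrid approach described above, assuming Python's standard library: the radicand is left to the font as an SVG text element, while the radical sign and vinculum are a single stroked path sized from the text width. The function radical_svg, the avg_advance width estimate and all proportions are made up for illustration; the estimate stands in for the font/SVG coordination step, and nothing here says whether the result is typographically acceptable.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG_NS as the default xmlns


def radical_svg(radicand: str, font_size: float = 16.0,
                avg_advance: float = 0.6) -> str:
    """Radicand drawn as font text; radical sign and vinculum as an SVG path.

    avg_advance is an ASSUMED average glyph width in ems; a real renderer
    would measure the radicand with the actual font instead.
    """
    s = font_size
    hook = 0.6 * s                     # width reserved for the radical sign
    text_w = len(radicand) * avg_advance * s
    width = hook + text_w + 0.2 * s
    height = 1.4 * s
    top, baseline = 0.1 * s, 1.1 * s

    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "width": f"{width:g}", "height": f"{height:g}",
        "viewBox": f"0 0 {width:g} {height:g}"})
    # Radical sign: short rise, long descent, tall rise, then the vinculum.
    d = (f"M 0 {0.75 * height:g} L {0.15 * hook:g} {0.65 * height:g} "
         f"L {0.45 * hook:g} {height - 0.05 * s:g} L {hook:g} {top:g} "
         f"H {width:g}")
    ET.SubElement(svg, f"{{{SVG_NS}}}path", {
        "d": d, "fill": "none", "stroke": "black",
        "stroke-width": f"{0.06 * s:g}"})
    # The radicand itself stays text, so the font's own rendering is used.
    text = ET.SubElement(svg, f"{{{SVG_NS}}}text", {
        "x": f"{hook + 0.1 * s:g}", "y": f"{baseline:g}",
        "font-size": f"{s:g}"})
    text.text = radicand
    return ET.tostring(svg, encoding="unicode")


print(radical_svg("x+1"))
```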

http://coq.no/musica/it illustrates the concept for
musical notation (SVG lines to draw the staves combined with
a font for the clefs and accidentals), and I think SVG
would also be appropriate for ties in musical
notation, bonds in chemical 2D formulae, etc., to
achieve high-quality, typographically sound rendering.
Am I naïvely overlooking an inherent problem with SVG?

-- 
Øistein E. Andersen