Re: [whatwg] text/html for html and xhtml (Was: Supporting MathML and SVG in text/html, and related topics)

2008-04-17 Thread liorean
On 17/04/2008, William F Hammond wrote:
  Previously:

   Yes, but the point is, once a user agent begins to sniff, there's no
   rational excuse for it not to recognize compliant xhtml+(mathml|svg).

Yes there is. Live content relies on even perfectly well-formed XHTML
to have the HTML behaviours of CSS and the DOM. It also relies on all
elements having #PCDATA content. Thus scripts and style sheets would
be given an incompatible parsing that changes the meaning of '<', '&'
and XML comments within scripts, just to take one example. That is, a
script which is well formed and valid XML and which is XML well
formedness-compatible and proper HTML may have entirely textual
content. (The subset of live XHTML content that uses embedded scripts
which are also XML well formed without using explicit CDATA wrapping
is very small, though.)

What obstacles to this exist?
   
The Web.

   Really!?!

Really.

  And then:

   The Web.
  
   Really!?!
  
   Yes, see for instance:
  
  http://lists.w3.org/Archives/Public/public-html/2007Aug/1248.html

  Taylor's comment is mainly about what happens when a user agent
  confuses tag soup with good xhtml.

  It is a different question how a user agent decides what it is looking
  at.

  Whether there is one mimetype or two, erroneous content will need
  handling.  The experiment begun around 2001 of punishing bad
  documents in application/xhtml+xml seems to have led to that mime type
  not being much used.

We don't know how big a factor the draconianness of XML parsing really
is. The fact is, the single biggest consumer of those documents has
not begun supporting XHTML yet. Internet Explorer supports HTML and
XML but not the XHTML namespace in XML, nor the XHTML content type.
This alone makes everybody reluctant to serve application/xhtml+xml.
Sure, there are other complications from the XML draconianness than
this, but my point is that these are all compounded, so it's hard to
tell how effectively they have been put to the test. If you could run
the test again with Internet Explorer's non-support taken out of the
equation, then you would be able to say something about it. As it is
currently, you can't know either way.

  So user agents need to learn how to recognize the good and the bad
  in both mimetypes.

  Otherwise you have Gresham's Law: the bad documents will drive out the
  good.

  The logical way to go might be this:

  If it has a preamble beginning with ^<?xml or a sensible
  xhtml DOCTYPE declaration or a first element <html xmlns=...>,
  then handle it as xhtml unless and until it proves to be non-compliant
  xhtml (e.g., not well-formed xml, unquoted attributes, munged handling
  of xml namespaces, ...).  At the point it proves to be bad xhtml reload
  it and treat it as regular html.

Doesn't work. We need DOM and CSS treatment as in HTML, not as in
XHTML, to be compatible with live content for those circumstances too.
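
For concreteness, the sniff-then-fall-back rule quoted above could be
sketched roughly as follows. This is only an illustration in Python: the
regular expressions, the 1024-character window and the function names are
assumptions made for this sketch, and the standard library's XML parser
stands in for a browser's. It models only the choice of parser; it says
nothing about the DOM and CSS differences raised in the reply above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns for the three signals named in the proposal.
XML_DECL = re.compile(r"^<\?xml\b")
XHTML_DOCTYPE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD XHTML',
                           re.IGNORECASE)
XHTML_ROOT = re.compile(r"<html\s[^>]*xmlns\s*=", re.IGNORECASE)


def looks_like_xhtml(text: str) -> bool:
    """Sniff the start of the document for any of the three signals."""
    head = text[:1024]  # assumed sniffing window
    return bool(XML_DECL.search(head)
                or XHTML_DOCTYPE.search(head)
                or XHTML_ROOT.search(head))


def choose_mode(text: str) -> str:
    """Return 'xhtml' if the document sniffs as XHTML and is well-formed,
    otherwise 'html' (the "reload and treat as regular html" step)."""
    if not looks_like_xhtml(text):
        return "html"
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return "html"
    return "xhtml"
```

Note that the fall-back can only be decided after the whole document has
been parsed once, which is where the extra delay in the proposal comes from.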

  So most bogus xhtml will then be 1 or 2 seconds slower than good xhtml.
  Astute content providers will notice that and then do something about it.
  It provides a feedback mechanism for making the web become better.

So, you argue that a document with an XHTML structure as text/html
should change semantics in ways that will affect functionality,
behaviour and presentation because of e.g. a single unescaped
ampersand in a URI or a single character that breaks because of
encoding?
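
To make that failure mode concrete, here is a minimal Python sketch. It
assumes the standard library's xml.etree.ElementTree and html.parser as
stand-ins for a browser's XML and HTML parsers (no browser actually uses
them), and the markup string is a made-up example. A single unescaped
ampersand in a URI is enough for the XML parser to reject the whole
fragment, while the HTML parser still reports the link.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment: XHTML-style markup with one unescaped '&' in a URI.
FRAGMENT = '<p><a href="search?a=1&b=2">results</a></p>'


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class LinkCollector(HTMLParser):
    """Collects href values from <a> start tags as the HTML parser sees them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if tag == "a" and name == "href":
                self.hrefs.append(value)


def hrefs_as_html(markup: str) -> list:
    """Feed the markup to the forgiving HTML parser and return the links found."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.hrefs
```

Calling parses_as_xml(FRAGMENT) and hrefs_as_html(FRAGMENT) on the same
string shows the two outcomes side by side; escaping the ampersand as
&amp; makes the fragment acceptable to both parsers.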




My opinion:
Any feedback mechanism that directly hurts the user and only
indirectly hurts the publisher, as opposed to a feedback mechanism
that directly notifies the publisher, is totally backwards. Fail
early. Compile time is better than run time because that's instantly
obvious to the programmer - the build isn't compiling, so
there's no working but buggy build to give users. The analogy for web
content is that you should fail at publishing time instead of viewing
time if possible, because then you HAVE to correct your documents
before you can serve them to the user.

If you want to serve XML to users on the web, you should make sure
your tools cannot possibly serve malformed XML, by making absolutely
certain that the content has correct encoding (any defaulting must
confirm that the content actually conforms to the default encoding),
has a specified content type (defaulting is acceptable for fragments
here, but e.g. uploading raw files should require specifying the type)
and is a well formed fragment or document at publishing time, loudly
rejecting any content that is malformed. (And by publishing I
include all sources: design templates, content producers, information
from the database, advertisements, comments, trackbacks etc.)
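
A minimal sketch of such a publishing-time gate for fragments, in Python.
Everything named here (check_fragment, PublishError, the UTF-8 default, the
dummy wrapper element) is an assumption made for illustration, not part of
any existing tool. It checks the three things listed above: a content type
was specified, the bytes really are in the stated encoding, and the
fragment is well-formed XML. Anything else is rejected loudly.

```python
import xml.etree.ElementTree as ET
from typing import Optional


class PublishError(ValueError):
    """Raised when content must not be published."""


def check_fragment(raw: bytes, content_type: Optional[str],
                   encoding: str = "utf-8") -> str:
    """Validate an XML fragment before publishing; return its decoded text."""
    if not content_type:
        raise PublishError("no content type specified")

    # Strict decoding: confirm the bytes actually conform to the encoding
    # instead of silently trusting the default.
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError as exc:
        raise PublishError(f"content is not valid {encoding}: {exc}") from exc

    # Wrap in a dummy root so a fragment with several top-level elements
    # (or plain text) can be checked for well-formedness. This is for
    # fragments only; a full document with an XML declaration or DOCTYPE
    # would need to be parsed without the wrapper.
    try:
        ET.fromstring(f"<fragment>{text}</fragment>")
    except ET.ParseError as exc:
        raise PublishError(f"content is not well-formed XML: {exc}") from exc

    return text
```

A comment form, trackback handler or template compiler would call
check_fragment before storing anything, so malformed input never reaches
the point where it is served.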
-- 
David liorean Andersson


Re: [whatwg] text/html for html and xhtml (Was: Supporting MathML and SVG in text/html, and related topics)

2008-04-16 Thread Boris Zbarsky

William F Hammond wrote:

The experiment begun around 2001 of punishing bad
documents in application/xhtml+xml seems to have led to that mime type
not being much used.


That has more to do with the fact that it wasn't supported in browsers 
used by 90+% of users for a number of years.



So user agents need to learn how to recognize the good and the bad
in both mimetypes.


Recognize and do what with it?


Otherwise you have Gresham's Law: the bad documents will drive out the
good.


Perhaps you should clearly state your definitions of bad and good in 
this case?  I'd also like to know, given those definitions, why it's bad 
for the bad documents to drive out the good, and how you think your 
proposal will prevent that from happening.



If it has a preamble beginning with ^<?xml or a sensible
xhtml DOCTYPE declaration or a first element <html xmlns=...>,
then handle it as xhtml unless and until it proves to be non-compliant
xhtml (e.g., not well-formed xml, unquoted attributes, munged handling
of xml namespaces, ...).  At the point it proves to be bad xhtml reload
it and treat it as regular html.


What's the benefit?  This seems to give the worst of both worlds, as 
well as a poor user experience.



So most bogus xhtml will then be 1 or 2 seconds slower than good xhtml.
Astute content providers will notice that and then do something about it.
It provides a feedback mechanism for making the web become better.


In the meantime, it punishes the users for things outside their control 
by degrading their user experience.  It also provides a competitive 
advantage to UAs who ignore your proposal.


Sounds like an unstable equilibrium to me, even if attainable.

-Boris