Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Benjamin Hawkes-Lewis
On Tue, 2006-11-28 at 16:20 -0500, Sam Ruby wrote: 

 I believe that I could modify my weblog to be simultaneously both
 HTML5 and XHTML5 compliant, modulo the embedded SVG content, something
 that would need to be discussed separately.

I think having /two/ different serializations of Web Forms 2.0/Web
Applications 1.0 is bad enough. To try and cater to what's effectively a
third serialization compatible with both parsing methods is to reinvent
the XHTML 1.0 as text/html mess. Serializing to multiple formats from
a single source is, I think, a better model. Especially as embedded
content may need different treatment too.

 Lachlan's observations [...] on what it would take to 
 change the popular WordPress application to produce HTML5 compliant
 output

As blogging software goes, WordPress is pretty good. But then blogging
software is generally atrocious when it comes to markup. Trying to
design an (X)HTML spec for a group of PHP developers who think it's
persuasive to bang on about their dedication to web standards while
serving their project's non-validating XHTML 1.1 homepage as text/html
is doomed to failure.

Adapting WordPress to be an efficient creator of good HTML (4 or 5)
would be a big job, probably entailing a total rethink at some levels.
But hacking WordPress to output valid HTML 4.01 is by no means
impossible. A method for removing the trailing slashes you're worrying
about was posted to the WordPress support forum:

http://wordpress.org/support/topic/76431

--
Benjamin Hawkes-Lewis



[whatwg] SVG and significant inline

2006-11-29 Thread Henri Sivonen
Since the img element in the XHTML namespace counts as significant  
inline content, the svg element in the SVG namespace should probably,  
too.


--
Henri Sivonen
http://hsivonen.iki.fi/




Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Sam Ruby

Benjamin Hawkes-Lewis wrote:
On Tue, 2006-11-28 at 16:20 -0500, Sam Ruby wrote: 


I believe that I could modify my weblog to be simultaneously both
HTML5 and XHTML5 compliant, modulo the embedded SVG content, something
that would need to be discussed separately.


I think having /two/ different serializations of Web Forms 2.0/Web
Applications 1.0 is bad enough. To try and cater to what's effectively a
third serialization compatible with both parsing methods is to reinvent
the XHTML 1.0 as text/html mess. Serializing to multiple formats from
a single source is, I think, a better model. Especially as embedded
content may need different treatment too.


That was not the intent of my suggestion.  I am suggesting that HTML5 
standardize on *one* format.  One that comes as close as humanly 
possible to capturing the web as it is practiced in all of its glorious 
and often quite messy detail.  Those that wish to serialize the DOM in 
other formats are certainly free to do so, but those formats aren't HTML5.


I do have an opinion on how embedded content should be handled, but I am 
trying to focus on one issue at a time.  If you would like a preview, 
take a peek at:


http://planet.intertwingly.net/
http://planet.intertwingly.net/top100/
http://golem.ph.utexas.edu/~distler/planet/

Those three planets take input from a number of frankly grungy input 
sources and consistently produce well formed XML that often contain 
embedded MathML or SVG content.


You are, of course, free to explore those pages and others; but, for 
now, I would like to focus on one question:


If HTML5 were changed so that these elements -- and these elements
alone -- permitted an optional trailing slash character, what
percentage of the web would be parsed differently?  Can you cite
three independent examples of existing websites where the parsing
would diverge?

Lachlan's observations [...] on what it would take to
change the popular WordPress application to produce HTML5 compliant
output


As blogging software goes, WordPress is pretty good. But then blogging
software is generally atrocious when it comes to markup. Trying to
design an (X)HTML spec for a group of PHP developers who think it's
persuasive to bang on about their dedication to web standards while
serving their project's non-validating XHTML 1.1 homepage as text/html
is doomed to failure.


I'm pretty sure that the Mozilla home page was not created with 
WordPress, and I'm absolutely sure that the Microsoft home page was not.


Conversely, if the major browser vendors have to choose between the web
as it is commonly practiced, and a spec that doesn't reflect that
reality, which one do you think they will choose?


I'll argue that the choices aren't as black and white as either the 
question you posed above, or even the one that I did.


No matter what the WHATWG spec says, each vendor will independently make 
a cost/benefit analysis as to how they should treat trailing slashes in 
elements like img.


But before they do, this work group certainly can anticipate that 
question.  What is the cost of accepting trailing slashes on elements 
which are always defined with a content model of empty, except when 
found in Attribute value (unquoted) state?  What sites would be parsed 
differently based on this change?  Are those differences in line with 
how existing browsers actually behave, or at odds with this behavior?


- Sam Ruby


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Stewart Brodie
Robert Sayre [EMAIL PROTECTED] wrote:

 On 11/29/06, Lachlan Hunt [EMAIL PROTECTED] wrote:
 
  I do not think it's a good idea to make the trailing slash conforming.
  Although it is harmless, it provides no additional benefit at all and it
  creates the false impression that the syntax actually does something.
 
 It does do something, in systems that think they are using XML
 (whether they actually are is another matter). It's possible it will
 prevent  many information-free validation errors, and give the HTML5
 more credibility as a result. Warning people about <img /> in the
 validator is a waste of their time.
 
  It's not a
  good idea to confuse them any more by giving the impression that it
  works for some elements but not others.  It's better to just say it
  doesn't work at all and forbid it in all cases.
 
 
 Better? This is an opinion, and it's not backed up by data. So far, it
 looks like Sam has the data on his side. People do it, and it tends to
 work interoperably.

Except when it doesn't.

For example, here's a fragment of hotmail.com's signup page, served as
text/html.  It's the only example I've come across to date:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr">
...
<select id="iRegion" name="pff010004" />
  <script>...</script>
</select>
...


The script just document.write's loads of option tags (it's the country
menu).  It's hard to know what the author thought was going on.  Did they
think it was XHTML and just got stymied by the server configuration?

I'm still in favour of permitting the trailing slash, personally.
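
To make the distinction concrete, here is a minimal sketch, assuming Python's standard html.parser as a stand-in for a tag-soup tokenizer (it is not an HTML5 parser). The VOID set is copied from the list of empty elements quoted in this thread; the class and function names are hypothetical. It treats a trailing slash as a no-op everywhere, so the slash is harmless on img but does not close select:

```python
from html.parser import HTMLParser

# Elements with an empty content model, as listed in this thread.
VOID = {"area", "base", "br", "col", "command", "embed",
        "hr", "img", "link", "meta", "param"}


class SlashIgnoringOutline(HTMLParser):
    """Hypothetical helper: records (depth, tag) for every start tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of currently open elements
        self.events = []  # (nesting depth, tag name) per start tag

    def handle_starttag(self, tag, attrs):
        self.events.append((len(self.stack), tag))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Tag-soup behaviour: the trailing slash closes nothing by itself.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close the nearest matching open element, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def outline(markup):
    parser = SlashIgnoringOutline()
    parser.feed(markup)
    parser.close()
    return parser.events


# The option ends up nested inside the select despite the "/".
print(outline('<select id="iRegion" name="pff010004" />'
              '<option>UK</option></select>'))
```

Under these assumptions the slash makes no difference to the resulting structure for img, while for select it shows why the hotmail.com markup only works when the slash is ignored.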


-- 
Stewart Brodie
Software Engineer
ANT Software Limited


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Henri Sivonen

On Nov 28, 2006, at 23:20, Sam Ruby wrote:

In HTML5, there are a number of elements with a content model of  
empty: area, base, br, col, command, embed, hr, img, link, meta,  
and param.


If HTML5 were changed so that these elements -- and these elements  
alone -- permitted an optional trailing slash character, what  
percentage of the web would be parsed differently?


Obviously, 0% with parsers that opt to implement the HTML5 parsing  
algorithm with error recovery as opposed to Draconian error handling-- 
except for the detail whether error-reporting parsers report an error  
or not. (In theory, this is an issue for non-browser UAs that opt to  
implement Draconian error handling. In practice, even my mostly  
Draconian parser treats this particular error as non-fatal, because  
it is so common and so easily recoverable.)


The basis for my question is the observation that the web browsers  
that I am familiar with apparently already operate in this fashion,  
this usage seems to have crept into quite a number of diverse  
places, and all this is coupled with Lachlan's observations[3] on  
what it would take to change the popular WordPress application to  
produce HTML5 compliant output.


WordPress is a soup-in-soup-out system that shouldn't be trying to  
produce the XML syntax in the first place. But now that WP is using  
it, the question becomes: which is more costly: asking the WP  
developers to change their system or to adjust the definition of  
conformance so that WP looks conforming more easily.


Anyway, as Lachlan already pointed out, whether or not the useless  
slash should be allowed on elements whose content model is empty is  
not an issue of technical damage to parsing interoperability but  
about damage to the mental model of confused authors. So the cost to  
consider is the cost of the confusion.


As a side benefit of this change, I believe that I could modify my  
weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo  
the embedded SVG content, something that would need to be
discussed separately.


I am against blurring the distinction between the XML serialization  
and the HTML serialization. The infamous Appendix C didn't bring  
about good things.


Having a text/html serialization that is also parseable as XML  
doesn't work from the UA point of view, because reality requires UAs  
to parse text/html using an HTML parser. Now, since UAs can't use an  
XML parser for parsing text/html anyway, it becomes useless for  
content providers to ensure that their text/html content is XML- 
parseable.


Restricting the XML syntactic sugar, such as the use of CDATA  
sections or <foo/> vs. <foo></foo> on the application/xhtml+xml side
would be wrong in principle, because it is wrong for a higher-layer  
spec to micromanage lower-layer syntactic sugar or, worse, give  
differences in syntactic sugar a difference in meaning. In practice,  
limiting XML details of the application/xhtml+xml serialization would  
be useless, because it is processed using XML processors which are  
required to support full syntactic sugar anyway.


I think that your blog system is a special case. Considering that I  
have seen the Yellow Screen of Death on your blog, it appears that  
you aren't using an isolated serializer that could be swapped.  
However, the reason why your site works is that it is built vastly  
more competently than other systems that don't use an isolated  
serializer *and* because you are both the developer and the deployer  
and you care about these issues, you can and do fix bugs quickly.  
That just doesn't work with systems that aren't constantly managed by  
the developer.


So no offense intended, but I think that what would work for you (or  
Jacques Distler) isn't generalizable. Rather, a warning to the effect  
of "professional driver on closed road" would be appropriate. :-)


--
Henri Sivonen
http://hsivonen.iki.fi/




Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Sam Ruby

Lachlan Hunt wrote:

Sam Ruby wrote:
In HTML5, there are a number of elements with a content model of 
empty: area, base, br, col, command, embed, hr, img, link, meta, and 
param.


If HTML5 were changed so that these elements -- and these elements 
alone -- permitted an optional trailing slash character, what 
percentage of the web would be parsed differently?  Can you cite three 
independent examples of existing websites where the parsing would 
diverge?


If it's only allowed on empty elements (now known as singleton 
elements in the spec) then this isn't about changing the handling, it's 
just about defining what is and is not conforming.


Exactly.

I do not think it's a good idea to make the trailing slash conforming. 
Although it is harmless, it provides no additional benefit at all and it 
creates the false impression that the syntax actually does something.


The fact is that authors already try things like <div/>, <p/> and even
<a/>.  I've seen all of those examples in the wild.  See, for instance,
the source of the XML 1.0 spec (and many others) which claim to be XHTML
as text/html, littered with plenty of <a/> tags all throughout.


If these are common, and implemented interoperably, then what is the 
harm?  An example of something that is NOT implemented interoperably is 
<script src=.../>.


In my book, a document that states that it always is a parse error to do 
something despite abundant evidence to the contrary is not as useful as 
one that says here are the places where it works, and here are the 
places where it does not.


I've even come across various authors either thinking that does work, or 
(when they find out the truth) wondering why it doesn't.  It's not a 
good idea to confuse them any more by giving the impression that it 
works for some elements but not others.  It's better to just say it 
doesn't work at all and forbid it in all cases.


That's a slippery slope.  At the extreme, it leads to XHTML 2.0, where 
features that are thought to be problematic are removed.  Think of the 
children.


By contrast, in HTML5, I see a document that attempts to be considerably 
less judgemental, and considerably more resilient.  Inside the comments 
in the HTML 5 document I see statistics lovingly cited.  Example:


<!-- As of
2005-12, studies showed that around 0.2% of pages used the
<image> element. -->

What percentage of pages use <img/> constructs?

and all this is coupled with Lachlan's observations[3] on what it 
would take to change the popular WordPress application to produce 
HTML5 compliant output.


That just illustrates a fundamental flaw in the way WordPress has been 
built.  It is a perfect example of a CMS built by a bunch of bozos [1] 
and cannot be used as an excuse for allowing the syntax.


Be careful when you patronize.

Is there really any excuse for allowing <b><i>OMG!</b></i>?  No, but
HTML5 is willing to pinch its nose with thumb and forefinger and look 
the other way.  It literally is not a battle worth fighting.


As a side benefit of this change, I believe that I could modify my 
weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo 
the embedded SVG content, something that would need to be discussed
separately.


No you couldn't, and how would that be a benefit if you could?  XHTML 5 
requires xmlns, HTML 5 forbids it.  HTML 5 requires <!DOCTYPE html>,
XHTML 5 doesn't (though it's still well-formed, so you could get away 
with it).


The last I saw, HTML 5 is a working draft.  Did I miss a memo?

With Venus, I translate all content into a canonical well formed XML
format.  This lets people who author filters worry about far fewer
random edge cases.  I've already seen a lot of
inventiveness when people find that they can apply off the shelf XML 
tools like XPath and XSLT.


I'd gladly put a <!DOCTYPE html> in my page; the question is: would
the WHATWG be willing to meet me half way and allow xmlns attributes in 
a very select and carefully prescribed set of locations?


By the way, my experience is that these types of conversations always 
start off bumpy not merely due to the well known limitation of email for 
conveying human emotion.  The problem is deeper than that: there 
literally is no good place to start.  The only way I know how to deal 
with that is to pose, and repeat, concrete and simple questions.  And 
the one that I am posing with this thread is as follows:


If HTML5 were changed so that these elements -- and these elements
alone -- permitted an optional trailing slash character, what
percentage of the web would be parsed differently?  Can you cite
three independent examples of existing websites where the parsing
would diverge?


[1] http://hsivonen.iki.fi/producing-xml/


- Sam Ruby


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Steve Runyon

To me, '/>' or '</' mean the tag's done.  Therefore,
'<select />...</select>' (or anything similar) is just plain wrong -- that
would be a select list with nothing in it, then some options that are
hanging out somewhere on their own, then an unmatched closing select.  This
shouldn't validate, serializers shouldn't allow it, and deserializers should
simply ignore the options and '</select>' (or maybe dump the options' text
to the output and just ignore the '</select>').

Now this, '<img src=... />' -- which is what I thought this discussion was
about initially -- is perfectly valid; it's nothing more than a tag without
content.


On 11/29/06, Stewart Brodie [EMAIL PROTECTED] wrote:


Robert Sayre [EMAIL PROTECTED] wrote:

 On 11/29/06, Lachlan Hunt [EMAIL PROTECTED] wrote:
 
  I do not think it's a good idea to make the trailing slash conforming.
  Although it is harmless, it provides no additional benefit at all and it
  creates the false impression that the syntax actually does something.

 It does do something, in systems that think they are using XML
 (whether they actually are is another matter). It's possible it will
 prevent  many information-free validation errors, and give the HTML5
 more credibility as a result. Warning people about <img /> in the
 validator is a waste of their time.

  It's not a
  good idea to confuse them any more by giving the impression that it
  works for some elements but not others.  It's better to just say it
  doesn't work at all and forbid it in all cases.
 

 Better? This is an opinion, and it's not backed up by data. So far, it
 looks like Sam has the data on his side. People do it, and it tends to
 work interoperably.

Except when it doesn't.

For example, here's a fragment of hotmail.com's signup page, served as
text/html.  It's the only example I've come across to date:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr">
...
<select id="iRegion" name="pff010004" />
  <script>...</script>
</select>
...


The script just document.write's loads of option tags (it's the country
menu).  It's hard to know what the author thought was going on.  Did they
think it was XHTML and just got stymied by the server configuration?

I'm still in favour of permitting the trailing slash, personally.


--
Stewart Brodie
Software Engineer
ANT Software Limited



Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Anne van Kesteren
On Wed, 29 Nov 2006 17:15:53 +0100, Sam Ruby [EMAIL PROTECTED]  
wrote:
I do not think it's a good idea to make the trailing slash conforming.  
Although it is harmless, it provides no additional benefit at all and  
it creates the false impression that the syntax actually does something.
 The fact is that authors already try things like <div/>, <p/> and even
<a/>.  I've seen all of those examples in the wild.  See, for instance,
the source of the XML 1.0 spec (and many others) which claim to be
XHTML as text/html, littered with plenty of <a/> tags all throughout.


If these are common, and implemented interoperably, then what is the  
harm?  An example of something that is NOT implemented interoperably is  
<script src=.../>.


What do you mean by "implemented interoperably"? They are all treated as
if they are just a start tag. (So they are actually treated identically to
the <script src=""/> case, except for some versions of Safari and Opera
and maybe Firefox which do what some people might expect for
<script src="" /> ...)



--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Anne van Kesteren
On Wed, 29 Nov 2006 17:15:53 +0100, Sam Ruby [EMAIL PROTECTED]  
wrote:
Is there really any excuse for allowing <b><i>OMG!</b></i>?  No, but
HTML5 is willing to pinch its nose with thumb and forefinger and look  
the other way.  It literally is not a battle worth fighting.


Just like <b />, that causes a parse error (perhaps several). Nothing
special is done about it in HTML5.



--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Mihai Sucan
Le Wed, 29 Nov 2006 17:00:46 +0200, Robert Sayre [EMAIL PROTECTED] a  
écrit:



On 11/29/06, Lachlan Hunt [EMAIL PROTECTED] wrote:


I do not think it's a good idea to make the trailing slash conforming.
Although it is harmless, it provides no additional benefit at all and it
creates the false impression that the syntax actually does something.


It does do something, in systems that think they are using XML
(whether they actually are is another matter). It's possible it will
prevent  many information-free validation errors, and give the HTML5
more credibility as a result. Warning people about <img /> in the
validator is a waste of their time.


It's not a
good idea to confuse them any more by giving the impression that it
works for some elements but not others.  It's better to just say it
doesn't work at all and forbid it in all cases.



Better? This is an opinion, and it's not backed up by data. So far, it
looks like Sam has the data on his side. People do it, and it tends to
work interoperably.



I want to show support to Sam's proposal. I agree with him.

I see HTML 5 as a specification that tries to be tailored to the current
needs of web developers, trying to cope with all the bad markup and tag
soup on the web. It also defines complex algorithms for error recovery,
everything supposedly leading one day to UAs with HTML 5 implementations
that will render all tag soup, and proper markup, the same
(interoperability: a somewhat utopian dream, but nonetheless we must not
give up). Of course, the algorithms described won't work as wanted with all
tag soup, but they try to be well balanced, a compromise between the bad,
the ugly and the good.


All this leads me to say that Sam's proposal is a good one. One cannot  
expect that WordPress, all content management systems, all web developers,  
etc, will start working with pure HTML 5, or pure XHTML 5, in a single  
project, or even in a single page.


XML parsers break if the code has no trailing slashes where needed, the  
majority of HTML parsers do not break if the author uses trailing slashes.


Some web developers also make use, on the server, of XHTML and XML
documents, which end up being sent to the UA - parts of, or entirely. Why
do trailing slashes need to break conformance? If trailing slashes are not
accepted into HTML 5, then many other bad things should be banned. From
the start, error recovery should be eliminated, and errors treated as XML
parsers treat them: stop on error, with no recovery.


Web developers want to be able to share code between XHTML and HTML  
projects.


The trailing slash issue should be a non-issue. Today many sites use this
trailing slash in HTML pages. Even if those pages do not validate today, I
think they should validate, as long as they validate without the
trailing slashes.


Take for example PHP, which is used by many confused web developers. PHP
provides the nl2br() function, which searches for new lines and adds
<br />. Using that, they automatically invalidate their site. And that's
only a very simple function.


Very few web developers are not bozos [1]. :)

[1] http://hsivonen.iki.fi/producing-xml/


--
http://www.robodesign.ro
ROBO Design - We bring you the future


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Robert Sayre

On 11/29/06, Anne van Kesteren [EMAIL PROTECTED] wrote:

On Wed, 29 Nov 2006 17:10:10 +0100, Robert Sayre [EMAIL PROTECTED] wrote:
 Perhaps it would be better to prove that the current rules result in
 easy explanations. What would the text of a bug filed on WordPress
 look like? Let's assume you actually want them to fix it, not just
 make a point.

The bug would request that Wordpress doesn't try to output XML for the
text/html media type. That seems to be the problem here.



Ok, so what would the text be? What problem would you tell them you were fixing?

--

Robert Sayre


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Sam Ruby

Anne van Kesteren wrote:

On Wed, 29 Nov 2006 17:10:10 +0100, Robert Sayre [EMAIL PROTECTED] wrote:

Perhaps it would be better to prove that the current rules result in
easy explanations. What would the text of a bug filed on WordPress
look like? Let's assume you actually want them to fix it, not just
make a point.


The bug would request that Wordpress doesn't try to output XML for the 
text/html media type. That seems to be the problem here.


If the code for Wordpress fit on a page, that suggestion would be easy 
to implement.


As it stands now, it appears that several hundred lines of code would
need to change.  And in each case, the code would need to be aware of 
the content type in effect.  In some cases, that information may not be 
available.  In fact, that may not have been determined yet.


One way cross-cutting concerns such as this one are often handled is to
simply capture the output and post-process it.  Lachlan opted to do so
with the WHATWG Blog.  The first pass for things like this generally
takes the form of simple pattern matching and regular expressions.
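
A first pass of that kind might look like the following minimal sketch. It is an illustration only, not the code used on the WHATWG Blog; the function name is hypothetical and the element list is the one quoted earlier in this thread. It assumes attribute values are quoted and contain no ">" character, which is exactly the sort of edge case that makes regular expressions a stopgap here:

```python
import re

# Elements with an empty content model, as listed in this thread.
VOID = "area|base|br|col|command|embed|hr|img|link|meta|param"

# Matches e.g. <br />, <img src="a.png"/>; leaves <div/> and <p/> alone.
_TRAILING_SLASH = re.compile(
    r"<(" + VOID + r")\b([^<>]*?)\s*/>", re.IGNORECASE
)


def strip_void_slashes(markup):
    """Remove the trailing slash from always-empty elements only."""
    return _TRAILING_SLASH.sub(r"<\1\2>", markup)


print(strip_void_slashes('<br /><img src="a.png" alt="x"/><div/>'))
```

Unquoted attribute values that end in a slash, or a ">" inside an attribute value, would be mishandled, which is one reason this approach tends to evolve into a real parser.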


Often this evolves.  What would be better is something that could take 
that string and produce a DOM, from which a correct serialization can 
take place.


Now, what type of parser would you use?  HTML5's rules come 
tantalizingly close to handling this situation, except for a few cases 
involving tags that are self-closing...


- Sam Ruby


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Anne van Kesteren

On Wed, 29 Nov 2006 17:29:42 +0100, Robert Sayre [EMAIL PROTECTED] wrote:

The bug would request that Wordpress doesn't try to output XML for the
text/html media type. That seems to be the problem here.


Ok, so what would the text be? What problem would you tell them you were  
fixing?


I won't be fixing anything on Wordpress for the foreseeable future.  
Anyway, the bug report would point to http://www.hixie.ch/advocacy/xhtml  
and try to talk them into switching to HTML4 or something so they can
more easily switch to HTML5 later.



--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Robert Sayre

On 11/29/06, Anne van Kesteren [EMAIL PROTECTED] wrote:

On Wed, 29 Nov 2006 17:29:42 +0100, Robert Sayre [EMAIL PROTECTED] wrote:
 The bug would request that Wordpress doesn't try to output XML for the
 text/html media type. That seems to be the problem here.

 Ok, so what would the text be? What problem would you tell them you were
 fixing?

I won't be fixing anything on Wordpress for the foreseeable future.
Anyway, the bug report would point to http://www.hixie.ch/advocacy/xhtml
and try to talk them into switching to HTML4 or something so they can
more easily switch to HTML5 later.



Hmm, while that's an interesting document, I'm not sure it presents
a clear mental model for authors. Lachlan wrote that the current
situation is clearer than Sam's proposal.

So far, WHAT-WG members have failed to write a one or two paragraph
bug report in clear English, with the target being the relatively
advanced HTML authors working on WordPress. Can it be done?

--

Robert Sayre

I would have written a shorter letter, but I did not have the time.


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Anne van Kesteren
On Wed, 29 Nov 2006 17:31:19 +0100, Mihai Sucan [EMAIL PROTECTED]  
wrote:
XML parsers break if the code has no trailing slashes where needed, the  
majority of HTML parsers do not break if the author uses trailing  
slashes.


Some web developers also make use, on the server, of XHTML and XML
documents, which end up being sent to the UA - parts of, or entirely.
Why do trailing slashes need to break conformance? If trailing slashes are
not accepted into HTML 5, then many other bad things should be banned.
From the start, error recovery should be eliminated, and errors treated as
XML parsers treat them: stop on error, with no recovery.


That doesn't make sense at all. As said before, parse error does not  
mean that parsing has to stop, it merely indicates that a syntax error has  
to be flagged somewhere. Parsers are allowed to stop processing at that  
point, but that doesn't make sense for any parser that tries to collect  
data. Only for parsers validating data.



--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Rimantas Liubertas

...

The trailing slash issue should be a non-issue. Today many sites use this
trailing slash in HTML pages. Even if those pages do not validate today, I
think they should validate, as long as they validate without the
trailing slashes.

...

I don't think that a page claiming to be authored as HTML 4.01 should
validate if it contains <br />, etc., which, at least in theory, has an
entirely different meaning.

Another point, maybe a bit off-topic:
http://annevankesteren.nl/2005/11-kurafire ;)


Regards,
Rimantas
--
http://rimantas.com/


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Ian Hickson
On Wed, 29 Nov 2006, Robert Sayre wrote:
 
 So far, WHAT-WG members have failed to write a one or two paragraph bug 
 report in clear English, with the target being the relatively advanced 
 HTML authors working on WordPress. Can it be done?

Please use HTML4 instead of XHTML1 in the output from WordPress blogs. 
The browser used by the majority of my readers doesn't support XHTML1, 
and is having to rely on error handling to handle your output.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Sam Ruby

Anne van Kesteren wrote:


What do you mean by "implemented interoperably"?


produce the same DOM

- Sam Ruby


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Lachlan Hunt

Mihai Sucan wrote:
Web developers want to be able to share code between XHTML and HTML 
projects.


Yes, some web developers want to do stupid things.  If you want to share 
data between HTML and XHTML, then do it properly.  Parse it in one form 
and re-serialise it in the other.  Don't just use string processing to 
do silly things like this:


xhtml = "<p>" + html + "</p>"
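
For contrast, a parse-then-reserialise round trip can be sketched as below. This is only an illustration under stated assumptions: Python's html.parser stands in for a real HTML parser (it does not implement the HTML5 algorithm), xml.etree.ElementTree does the serialising, and the class name, function name and wrapper element are all hypothetical:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr",
        "img", "link", "meta", "param"}


class FragmentTreeBuilder(HTMLParser):
    """Hypothetical helper: builds an ElementTree from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("div")  # wrapper, never serialised
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        attrib = {k: (v or "") for k, v in attrs}
        el = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(el)

    def handle_startendtag(self, tag, attrs):
        # Ignore the trailing slash, as a tag-soup parser would.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        names = [e.tag for e in self.stack[1:]]
        if tag in names:
            # Close the nearest matching open element and its descendants.
            idx = len(names) - 1 - names[::-1].index(tag)
            del self.stack[idx + 1:]

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child goes in that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def reserialize(fragment, method):
    """Parse once, then serialise with method 'xml' or 'html'."""
    builder = FragmentTreeBuilder()
    builder.feed(fragment)
    builder.close()
    return "".join(ET.tostring(child, encoding="unicode", method=method)
                   for child in builder.root)


print(reserialize("<p>one<br>two", "xml"))
print(reserialize("<p>one<br />two</p>", "html"))
```

The same tree yields either syntax, so the choice between <br> and <br /> is made by the serialiser rather than by string concatenation.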

--
Lachlan Hunt
http://lachy.id.au/


[whatwg] Inferring rel=feed from the media type

2006-11-29 Thread Mark Baker

Hi.

HTML 5 says;

If the alternate keyword is used with the type attribute set to the
value application/rss+xml or the value application/atom+xml, then the
user agent must treat the link as it would if it had the feed keyword
specified as well.
 -- http://www.whatwg.org/specs/web-apps/current-work/#link-type

I believe this is in error.  Atom, at least (I expect this also holds for
RSS), is useful for representing more things than just feeds.  It's
really a generic packaging mechanism.  For example, one might do the
equivalent of MHTML[1] using Atom, and link such a document to an HTML
page with rel=alternate.  But it isn't a feed, and it isn't
something you'd want syndication tools to auto-discover as a feed,
since that will just confuse users.

In addition, the media type on link is non-authoritative, meaning that
feed-semantics would be inferred before it was even ascertained that
the would-be representation was actually an Atom or RSS document.

Thanks.

[1] http://www.ietf.org/rfc/rfc2557.txt

Mark.


Re: [whatwg] Inferring rel=feed from the media type

2006-11-29 Thread Ian Hickson
On Wed, 29 Nov 2006, Mark Baker wrote:
 
 HTML 5 says:
 
 If the alternate keyword is used with the type attribute set to the
 value application/rss+xml or the value application/atom+xml, then the
 user agent must treat the link as it would if it had the feed keyword
 specified as well.
  -- http://www.whatwg.org/specs/web-apps/current-work/#link-type
 
 I believe this is in error.

It is intentional, as a way of grandfathering widespread legacy practice.

I agree that it is suboptimal. I'm not sure how to cater to both the 
existing content and, moving forward, to allow Atom to be used with 
rel="alternate" to mean "alternate representation that isn't a feed".


 But it isn't a feed, and it isn't something you'd want syndication tools 
 to auto-discover as a feed, since that will just confuse users.

Putting a real feed first would get around this, but you're right that in 
the case you described (and assuming no feed), there'd not really be a way 
to get around this other than simply not including the type= attribute.


 In addition, the media type on link is non-authoritative, meaning that 
 feed-semantics would be inferred before it was even ascertained that the 
 would-be representation was actually an Atom or RSS document.

Yeah. I think the spec is clear that the real MIME type overrides it once 
the file has been fetched; but again, existing practice constrains what we 
can do here.


In conclusion, I'm not sure we can do anything here. We're stuck between a 
rock and a hard place, as it were.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
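
To illustrate the point about the type attribute being only a hint, here
is a small Python sketch. It assumes the user agent uses the hint before
fetching and the real media type afterwards; the function names and the
fetched_type parameter are hypothetical, and the exact override rules
are whatever the spec says, not what this code does.

```python
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


def normalise(media_type):
    # Drop parameters such as "; charset=utf-8" and ignore case.
    return media_type.split(";", 1)[0].strip().lower()


def alternate_is_feed(type_hint, fetched_type=None):
    """Decide whether a rel=alternate link should be treated as a feed.

    type_hint is the link's type attribute (non-authoritative).
    fetched_type is the real media type, known only after fetching.
    Assumption: once known, the real type replaces the hint.
    """
    effective = fetched_type if fetched_type is not None else type_hint
    return effective is not None and normalise(effective) in FEED_TYPES
```

Before the fetch, a link hinted as application/atom+xml is treated as a
feed; if the resource then turns out to be text/html, the inference is
withdrawn.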


Re: [whatwg] Inferring rel=feed from the media type

2006-11-29 Thread Henri Sivonen

On Nov 29, 2006, at 19:59, Ian Hickson wrote:


I'm not sure how to cater to both the
existing content and, moving forward, to allow Atom to be used with
rel="alternate" to mean "alternate representation that isn't a feed".


http://www.intertwingly.net/wiki/pie/PaceEntryMediatype

If it passes, of course.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Robert Sayre

On 11/29/06, Robert Sayre [EMAIL PROTECTED] wrote:


Ok, I have submitted a bug report.

http://trac.wordpress.org/ticket/3406

Let's see what happens.


Well, that didn't seem too effective. :/

--

Robert Sayre


Re: [whatwg] Inferring rel=feed from the media type

2006-11-29 Thread Mark Baker

Hi Ian,

On 11/29/06, Ian Hickson [EMAIL PROTECTED] wrote:

On Wed, 29 Nov 2006, Mark Baker wrote:

 HTML 5 says:

 If the alternate keyword is used with the type attribute set to the
 value application/rss+xml or the value application/atom+xml, then the
 user agent must treat the link as it would if it had the feed keyword
 specified as well.
  -- http://www.whatwg.org/specs/web-apps/current-work/#link-type

 I believe this is in error.

It is intentional, as a way of grandfathering widespread legacy practice.

I agree that it is suboptimal. I'm not sure how to cater to both the
existing content and, moving forward, to allow Atom to be used with
rel="alternate" to mean "alternate representation that isn't a feed".


What about documenting that some agents make that assumption, but not
prescribing that all agents must do so?

And to answer your other question, the proposed new media type for
Atom entry documents would only solve the problem for entries.  It
wouldn't solve them for the MHTML-like Atom document I described, nor
any other non-feed use of Atom... of which there most likely will be
many in the future.  If such a solution were used as precedent for
solving the problem for those uses of Atom, it would mean a new media
type for each use; a media type per link type, in fact.  Ouch!  So no,
I'm not a fan 8-)

Mark.


Re: [whatwg] Inferring rel=feed from the media type

2006-11-29 Thread Mark Baker

On 11/29/06, Ian Hickson [EMAIL PROTECTED] wrote:

On Wed, 29 Nov 2006, Mark Baker wrote:

 What about documenting that some agents make that assumption, but not
 prescribing that all agents must do so?

The idea underlying the work here is to foster interoperability, not
document the lack of interoperability, so that isn't really an option.


When you're documenting age-old practice which is in widespread use, I
fully agree.  Feed autodiscovery is effectively brand new and not
widespread at all when compared to how widespread it should become in
20 years.  I think there's still lots of time to fix it.

Mark.


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Steve Runyon

Sorry for being the dunce here, but is anybody saying otherwise?  Whereas
XML _requires_ that you close every tag, HTML5 _should allow_ you to close
any tag.  I agree with what was said previously about considering something
like '<select /></select>' invalid, but if somebody's suggesting that
something like '<img src="..." />' or '<br />' should also be invalid,
I disagree.  Validators and UAs should accept singleton tags _with or
without_ the self-closer.

Am I totally misunderstanding or missing the point here?


On 11/29/06, Leons Petrazickis [EMAIL PROTECTED] wrote:


On 11/29/06, Robert Sayre [EMAIL PROTECTED] wrote:
 On 11/29/06, Robert Sayre [EMAIL PROTECTED] wrote:
 
  Ok, I have submitted a bug report.
 
  http://trac.wordpress.org/ticket/3406
 
  Let's see what happens.

 Well, that didn't seem too effective. :/

This rigmarole is going to repeat on every site that has converted to
XHTML sent as text/html. People are emotionally invested in the idea
of trailing slashes. Websites have complex codebases, and going
through them removing trailing slashes on singleton elements would be
very hard.

They've already reaped all the benefits of XHTML -- cleaner, more
readable, more maintainable code. There's no incentive for them to
agree with you. This is a minor point that we need to give to them.

The very idea of HTML5 is to not demand that the Web be scrapped and
rewritten. We need the people who have rewritten all their pages so
that they validate on the W3C validator -- they have the fire and the
zeal and the will to spread our format. We need to make the migration
from invalid XHTML to valid HTML5 very, very easy for them. We can't
require them to dig through PHP spaghetti. And that means that, no
matter how it's achieved, <br/> needs to be valid HTML5.
--
Leons Petrazickis



Re: [whatwg] Inferring rel=feed from the media type

2006-11-29 Thread Ian Hickson
On Wed, 29 Nov 2006, Mark Baker wrote:
 
 When you're documenting age-old practice which is in widespread use, I 
 fully agree.  Feed autodiscovery is effectively brand new and not 
 widespread at all when compared to how widespread it should become in 20 
 years.  I think there's still lots of time to fix it.

I'm not sure what you're basing your assertion on; based on my own 
research of several billion documents, feed autodiscovery is used on 
hundreds of millions of pages, far beyond the point of no return in terms 
of backwards-compatibility constraints.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Julian Reschke

Lachlan Hunt schrieb:

...
The fact is that authors already try things like <div/>, <p/> and even
<a/>.  I've seen all of those examples in the wild.  See, for instance,
the source of the XML 1.0 spec (and many others) which claim to be XHTML
as text/html, littered with plenty of <a/> tags all throughout.

...


Huh? The thing at http://www.w3.org/TR/REC-xml/? Don't see that 
problem there.


If this was the case at an earlier point of time, it was probably caused 
by a bug in their XSLT code, not the authors writing the spec (which 
IMHO uses the W3C's xmlspec XML language).


Best regards, Julian


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Steve Runyon

Thanks Ian - so is it fair to say that self-closing singletons should be
_allowed_ but not _required_ -- that either syntax would be accepted as
valid HTML5?  That only makes sense to me -- it's backward-compatible while
allowing XHTML compatibility as well.

Your point about '<p />test' being the same as '<p>test</p>' is very
interesting.  That's not something I've ever done (that I'm aware of,
anyway), and it surprises me that it works that way.  As a divergent example
-- at least in IE6 -- '<div />' is treated as an inline element rather than
a block...that's probably non-standard behavior, and in any case it was a
surprise when I encountered it.

In case you can't tell, I haven't made it through the whole proposed spec
yet, so apologies if my questions and observations are springing from
ignorance.


On 11/29/06, Ian Hickson [EMAIL PROTECTED] wrote:


The argument is that the self-closer / is an XMLism, and that HTML5 has
nothing to do with XML, so there's no reason for it to apply here.

Note that in HTML, this:

  <p/> test

...regardless of what this discussion results in, will always be treated
exactly the same as:

  <p> test </p>

...because, for legacy reasons, there's no way we can treat / as a
self-closer in any tag other than void tags (like img or br).

--
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
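
The rule Ian describes, that a trailing slash is ignored on anything but
a void element, can be sketched with a toy tag scanner in Python. This is
not the HTML5 parsing algorithm: the regular expression, the VOID set and
the function name are assumptions for illustration, and attribute values
containing '>' are not handled.

```python
import re

# A partial, illustrative list of void elements.
VOID = {"br", "hr", "img", "input", "link", "meta"}

# Matches start and end tags; group 3 captures an optional trailing slash.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>")


def open_elements_after(markup):
    """Return the stack of elements still open after scanning markup."""
    stack = []
    for match in TAG_RE.finditer(markup):
        closing, name = match.group(1), match.group(2).lower()
        if closing:
            if name in stack:
                # Pop up to and including the matching open element.
                while stack.pop() != name:
                    pass
        elif name not in VOID:
            # The trailing slash, if any, is deliberately ignored here,
            # so the element stays open until an end tag arrives.
            stack.append(name)
    return stack
```

Under this sketch a p start tag written with a trailing slash leaves p
open, so following text ends up inside the paragraph, while a br written
the same way opens nothing.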



Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Julian Reschke

Anne van Kesteren schrieb:
On Wed, 29 Nov 2006 18:03:33 +0100, Julian Reschke 
[EMAIL PROTECTED] wrote:
The fact is that authors already try things like <div/>, <p/> and
even <a/>.  I've seen all of those examples in the wild.  See, for
instance, the source of the XML 1.0 spec (and many others) which
claim to be XHTML as text/html, littered with plenty of <a/> tags all
throughout.

...


Huh? The thing at http://www.w3.org/TR/REC-xml/? Don't see that 
problem there.


  <h5><a name="IDANQDS" id="IDANQDS" />Names and Tokens</h5>

is one example...


If this was the case at an earlier point of time, it was probably 
caused by a bug in their XSLT code, not the authors writing the spec 
(which IMHO uses the W3C's xmlspec XML language).


In your humble opinion or is it just a fact? :-)


Aha. I thought it was about an <a /> with no attributes.

So yes, that's a bug in the XSLT code (xmlspec.xsl). I'll forward this 
info to Norman Walsh.


Best regards, Julian



Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread James Graham

Ian Hickson wrote:

On Wed, 29 Nov 2006, Leons Petrazickis wrote:
This rigmarole is going to repeat on every site that has converted to 
XHTML sent as text/html. People are emotionally invested in the idea of 
trailing slashes. Websites have complex codebases, and going through 
them removing trailing slashes on singleton elements would be very hard.


If people want to make HTML5 syntactically compatible with XHTML1, such 
that XHTML1 documents don't cause syntax errors in HTML5, we'll have to do 
a whole lot more than just allowing trailing /s. I don't really see why 
that would be a goal, though. Going further, if we want to make documents 
in general compliant with HTML5, then we've got our work cut out for us -- 
at least 78% of documents are syntactically incorrect today (not counting 
things like trailing /s in attributes, or missing DOCTYPEs -- if you 
include those, the number is more like 93%).


I tentatively support the idea that trailing slashes on singleton[1] 
elements should not be a parse error. I don't think it has any actual 
technical merit but I think it will be helpful in getting developer 
mindshare; a lot of people have drunk the Zeldman Kool-Aid and have the
ideas of XHTML, clean markup, CSS, and conformance to standards in 
general all mushed together in their brain[2]. For these people (who I 
think represent the upper quartile of web developers in terms of 
commitment to good markup) the trailing slash in empty elements is the 
syntax of a new generation - it is a symbol that represents everything 
that has changed in web design since 1996 - as intrinsically useless as 
a fashionable designer label but just as seductive.


[1] I find that name quite confusing as it suggests there should only be 
one in the entire document.


[2] cf. the "code is poetry" comment in the WordPress bug report,
despite the fact that most here would argue HTML 4 as text/html is
considerably more poetic than XHTML as text/html.


--
The universe doesn't care what you believe. The wonderful thing about 
science is that it doesn't ask for your faith, it just asks for your 
eyes --- http://xkcd.com/c154.html


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-29 Thread Robert Sayre

On 11/29/06, Robert Sayre [EMAIL PROTECTED] wrote:

On 11/29/06, Robert Sayre [EMAIL PROTECTED] wrote:

 Ok, I have submitted a bug report.

 http://trac.wordpress.org/ticket/3406

 Let's see what happens.

Well, that didn't seem too effective. :/


Ah, if you visit now, you'll find a WHAT-WG member has written
"fundamental flaw with the way WordPress has been built" in bright red
letters. Not exactly Dale Carnegie material.

It still seems impossible to file a bug on teXtHTML.

Sam Ruby wrote:


Drawing lines in the sand and maintaining that <br /> is invalid is only
going to make more busy work for a lot of people.  If you try to explain
why this decision was made, most won't understand, and eventually most
will decide that compliance isn't worth the bother.


Agree.

--

Robert Sayre


Re: [whatwg] Inferring rel=feed from the media type

2006-11-29 Thread Ian Hickson
On Wed, 29 Nov 2006, Mark Baker wrote:
 On 11/29/06, Ian Hickson [EMAIL PROTECTED] wrote:
  On Wed, 29 Nov 2006, Mark Baker wrote:
  
   When you're documenting age-old practice which is in widespread use, 
   I fully agree.  Feed autodiscovery is effectively brand new and not 
   widespread at all when compared to how widespread it should become 
   in 20 years.  I think there's still lots of time to fix it.
  
  I'm not sure what you're basing your assertion on; based on my own 
  research of several billion documents, feed autodiscovery is used on 
  hundreds of millions of pages, far beyond the point of no return in 
  terms of backwards-compatibility constraints.
 
 I wouldn't call that a very good metric for the purposes of this 
 discussion though, because I expect that the bulk of those pages are 
 produced by a handful of blog hosting services.  If we can shrink 100s 
 of millions by 4 or 5 (or more) orders of magnitude with a handful of 
 persuasively written emails, then the situation is not what I would call 
 widespread.

It's widespread _today_, such that UAs today can't change their behaviour. 
Thus we can't change the spec today.

If you reduced the volume of such usage, then it would be worth 
revisiting, but unless that happens, we're merely talking hypotheticals.

Personally I wouldn't be optimistic about the ability to change the legacy 
data; historically it has not been possible. I don't really know of any 
successful attempt, to the point where browsers historically have even 
tried using different processing modes -- the whole quirks mode thing -- 
to get around legacy content incompatible with the specifications.


 Are you able to analyze what proportion of those pages are hosted by the 
 top, say, 10 hosters?

Not from my current data set, no.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Inferring rel=feed from the media type

2006-11-29 Thread Ian Hickson
On Wed, 29 Nov 2006, James M Snell wrote:
 
 Is HTML5 intended to be a description of the Way Things Are or a 
 description of the Way Things Ought To Be?

It's a description of what browsers should implement if they want to be 
compatible with legacy content while supporting new features.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Rel alternate, stylesheet and feed

2006-11-29 Thread Lachlan Hunt

Ian Hickson wrote:

On Wed, 29 Nov 2006, Lachlan Hunt wrote:
The spec defines special handling for rel="alternate stylesheet", but
also defines that alternate with type="application/atom+xml" or
type="application/rss+xml" implies the feed relationship.  Does this
represent an alternate stylesheet, a syndication feed or both?


If alternate and stylesheet are both specified, the alternate 
keyword doesn't imply feed, because the rest of this subsection doesn't 
apply if it says both. So feed isn't implied.



<link rel="alternate stylesheet" type="application/atom+xml" href="/feed"
title="Blog Entries">

Firefox 2 [...] recognised it as a feed.


Thanks, bug filed.

https://bugzilla.mozilla.org/show_bug.cgi?id=362329

--
Lachlan Hunt
http://lachy.id.au/
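
The rule as quoted in this thread, together with Ian's clarification
about stylesheet, can be written down as a short Python sketch. The
function name link_keywords is hypothetical, and the logic follows only
what is said here, not the full text of the spec.

```python
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


def link_keywords(rel, type_attr=None):
    """Return the effective set of link-type keywords for a link element.

    'feed' is added when 'alternate' appears with a feed media type,
    unless 'stylesheet' is also present, in which case the link is an
    alternate stylesheet and no feed is implied.
    """
    keywords = set(rel.lower().split())
    if "alternate" in keywords and "stylesheet" not in keywords:
        if type_attr is not None and type_attr.strip().lower() in FEED_TYPES:
            keywords.add("feed")
    return keywords
```

With rel set to alternate stylesheet and an Atom type, the result
contains no feed keyword, which is the behaviour the Firefox 2 bug
report above asks for.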


Re: [whatwg] Element content models

2006-11-29 Thread Michael(tm) Smith
Anne van Kesteren [EMAIL PROTECTED], 2006-11-26 12:58 +0100:

 Some element content models explicitly mention that they can't contain
 themselves. This probably makes sense for the following elements as well:
 
 * meter
 * progress
 * time
 * t
 * m
 * abbr?
 * cite?
 
 There might be more.

annotations (footnotes, endnotes, marginalia) and acronym

  --Mike