Philip Taylor wrote:
Shelley Powers wrote:
Philip Taylor wrote:
[...]
A survey of random pages from dmoz.org about a year ago found that
~18% used an XHTML doctype, and ~0.03% were served as
application/xhtml+xml. On the Alexa top 200 a bit earlier
(http://lists.w3.org/Archives/Public/public-html/2007Aug/1248.html),
a third used an XHTML doctype and three quarters of those were not
well-formed XML. So: Any new markup will be overwhelmingly served as
text/html, and most of it that claims to be XHTML won't be usable
with an XML parser.
Thus, the XHTML syntax will mostly be processed using the
RDFa-in-text/html processing rules. If those rules don't do what
people expect (after they've read the XHTML-focused specs and guides
and tutorials and examples), then they will be surprised and unhappy
and it will be a bad situation.
[...]
Can I take a leap of faith and guess that the pages in that 18% served
up with an XHTML doctype that aren't well-formed XML probably aren't
using RDFa either?
They aren't, because approximately no pages (regardless of doctype or
well-formedness) are using RDFa. Looking at some more recent data
(~425,000 pages from http://www.dotnetdotcom.org/ collected in the past
few months), about 0.04% of pages in the sample appear to contain RDFa
attributes (specifically, a 'property' attribute whose value contains a colon).
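A rough sketch of the kind of check described above, in Python, assuming each page is already available as an HTML string. The names `looks_like_rdfa` and `rdfa_fraction` are hypothetical; this is not the script behind the 0.04% figure, only an illustration of the "'property' containing a colon" heuristic.

```python
from html.parser import HTMLParser


class RDFaPropertySniffer(HTMLParser):
    """Flags a page if any element has a 'property' attribute containing a colon.

    Hypothetical helper illustrating the heuristic; it is an assumption that
    this approximates the survey's check, not a copy of it.
    """

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for a bare
        # attribute such as <span property>.
        for name, value in attrs:
            if name == "property" and value and ":" in value:
                self.found = True
    # Self-closing tags (<meta ... />) reach handle_starttag via the default
    # handle_startendtag, so they are covered too.


def looks_like_rdfa(html: str) -> bool:
    """True if the page trips the 'property containing a colon' heuristic."""
    sniffer = RDFaPropertySniffer()
    sniffer.feed(html)
    sniffer.close()
    return sniffer.found


def rdfa_fraction(pages) -> float:
    """Fraction of pages (an iterable of HTML strings) flagged by the heuristic."""
    pages = list(pages)
    if not pages:
        return 0.0
    return sum(looks_like_rdfa(p) for p in pages) / len(pages)
```

Being a heuristic, it can both miss RDFa (pages that use only other RDFa attributes) and flag non-RDFa uses of a `property` attribute.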
But I presume the idea is for RDFa to become much more widely used,
and I have no reason to doubt that it would end up with roughly the
same spread of text/html vs application/xhtml+xml and well-formed vs
ill-formed, so the numbers are still relevant.
But we're addressing two things here: what do we do with what we have
now, and how will we move into the future?
If none of these pages are using RDFa (or the number is so small as to be
irrelevant), then we're not "breaking" the web by insisting on following HTML
processing rules when it comes to RDFa in HTML, while still preserving
existing XHTML rules for RDFa in XHTML. And we wouldn't be breaking the
web anyway, because RDFa was released for XHTML; use it in HTML pages
somewhat at your own risk.
That's not being mean to the web designer/developer/Uncle Joe and his
page on bowling balls. Nor is it holding the web back for the sake of edge
cases involving undocumented or unsupported uses.
And thankfully, Google used all lowercase prefixes.
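Assuming the point about lowercase prefixes is that HTML parsers fold attribute names to lowercase while XML parsers preserve case, the difference can be sketched in Python. Here Python's html.parser stands in for a browser's text/html parser and expat (with namespace processing off) for an XML parser; the helper names and the `DC` prefix are hypothetical.

```python
import xml.parsers.expat
from html.parser import HTMLParser

# Hypothetical markup using a mixed-case prefix.
SNIPPET = '<span xmlns:DC="http://purl.org/dc/elements/1.1/" property="DC:title">Bowling</span>'


def html_attr_names(markup: str) -> list:
    """Attribute names as reported by html.parser, which lowercases them."""
    names = []

    class Collector(HTMLParser):
        def handle_starttag(self, tag, attrs):
            names.extend(name for name, _ in attrs)

    parser = Collector()
    parser.feed(markup)
    parser.close()
    return names


def xml_attr_names(markup: str) -> list:
    """Attribute names as reported by expat without namespace processing (case kept)."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # attrs is a dict keyed by attribute name; extend() appends its keys.
    parser.StartElementHandler = lambda tag, attrs: names.extend(attrs)
    parser.Parse(markup, True)
    return names
```

With SNIPPET, html_attr_names returns ['xmlns:dc', 'property'] while xml_attr_names keeps 'xmlns:DC'. The attribute value "DC:title" is untouched either way, so under text/html a mixed-case prefix can end up declared as dc but used as DC, which matters if prefixes are matched case-sensitively; an all-lowercase prefix avoids the mismatch.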
Shelley