2012/11/29 Leif Halvard Silli
> Philippe Verdy, Thu, 29 Nov 2012 14:24:29 +0100:
> ...
> > But why ? Isn't UTF-8 (or alternatively UTF-16) already the default
> > encoding of XHTML?
> >
> > If not, then we should file a bug in the W3C Validator for not honoring the
> > Guideline 9 (even though it is not part of the standard itself, but just a
> > recommendation, it should issue at least a warning).
>
> This is exactly the problem. Your "if not" does apply! Because, if one
> presents a XHTML document to the browser as HTML, then
> windows-1252 - and not UTF-8 - becomes the default encoding. And, in
> fact, as consequence of our dialog, I have notified the developers of
> Unicorn about the shortcoming, asking them to issue a warning.
>
Thanks a lot. This was really hard to see and understand, because I was only reading the XHTML specs, and the Validator did not complain.

As a side note, the Unicorn Validator, which "senses" the content type (in its simple interface), will still detect XHTML content, which remains valid by itself. The issue arises only when the document is presented as HTML. When the warning is displayed, this validator should let users see the effect of parsing the XHTML document with an HTML parser (HTML4 or HTML5), by offering a way to select a document type other than the autodetected one (XHTML here). The XHTML document may not validate at all when parsed as HTML: the validator will first issue warnings about the presence of an XML prolog (generally not a problem, as browsers typically ignore it), then an error about XML processing instructions (I don't think the optional leading XML declaration counts as a "processing instruction"), or an error about a non-conforming document type declaration, depending on the selected HTML flavor (HTML4 or HTML5).

Anyway, we can expect this page-design error to be frequent. HTML5 parsers should therefore not discard the XML declaration, but should at least recognize its encoding pseudo-attribute (even if processing then continues under HTML rules rather than XML rules), instead of relying on the presence of a meta element. Relying on the meta element is really ugly, and it forces a reparse using the detected encoding instead of the default windows-1252, which is unnecessarily slow.

This would make "Guideline 9" applicable only to the flavors of HTML that precede HTML5, once HTML5 is released. In that case, the warning issued by the Validator would apply only to HTML4 or earlier, not to HTML5. This would improve HTML5's ability to parse valid XHTML1 and XHTML5 documents that were simply created or modified with XML or XHTML editors.
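A rough sketch of the sniffing order suggested above, under a simplified model: a byte order mark wins, then the encoding pseudo-attribute of a leading XML declaration, then the windows-1252 fallback. The function name `sniff_encoding` and the `_XML_DECL` pattern are hypothetical illustrations; this is not the actual HTML5 encoding-sniffing algorithm, and it deliberately omits the HTTP charset and the `<meta charset>` prescan.

```python
import re

# Hypothetical pattern: an XML declaration at the very start of the bytes,
# capturing its encoding pseudo-attribute (single or double quotes allowed).
# Per XML 1.0, "version" must come first and "encoding" right after it.
_XML_DECL = re.compile(
    rb"""<\?xml\s+version\s*=\s*["'][^"']*["']"""
    rb"""\s+encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_encoding(data: bytes, default: str = "windows-1252") -> str:
    """Return a lowercase encoding label for a document served as text/html."""
    # 1. A byte order mark takes precedence over everything else.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16le"
    # 2. The proposed extra step: honor <?xml ... encoding="..."?> if present.
    #    .match() anchors at offset 0, where an XML declaration must appear.
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Otherwise fall back to the legacy default. A real parser would also
    #    consult the HTTP header and prescan for <meta charset> (omitted here).
    return default


print(sniff_encoding(b'<?xml version="1.0" encoding="UTF-8"?>\n<html/>'))  # utf-8
print(sniff_encoding(b'<?xml version="1.0"?>\n<html/>'))  # windows-1252
```

The second call shows the trap discussed in this thread: an XML declaration without an encoding pseudo-attribute means UTF-8 to an XML parser, yet this HTML-style fallback still lands on windows-1252.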

