- Method 1 (the BOM) is only good for UTF-16. It is not reliable for UTF-8, which is still the default for XHTML (and where the BOM is not always present).
- Method 2 works sometimes, but it is not practical for the many servers that you cannot configure to change the content-type of specific pages that all share the same *.html extension, or for pages relayed by some proxies. It also depends on the transport layer (HTTP here) being capable of offering it (HTML files in file systems do not provide the info). But if it is implemented it will take precedence, possibly indicating that the document was reencoded (by a proxy, for example).
- Methods 3 and 4 are completely equivalent and share the same problem: they require restarting the parsing. They are equally ugly (just like all empty meta elements in the HTML header or in the body). Introducing another attribute on the meta element (which already has name, http-equiv, and now charset) is also a bad idea: data encoded in attributes that are part of the document root breaks the concept of what is metadata. It also forbids reencoding the document during processing if the document is digitally signed for its content, independently of its encoding: to check the document signature, you would not only have to parse it completely up to the DOM level, but also ignore these specific meta elements (but not all meta elements, like links).
- Method 5 is where?
- Method 6 (sniffing) is a transitory solution (as long as HTML5 is not released) or a last-chance palliative based only on a heuristic, which sometimes fails. Not reliable.
- Method 7 (using the XML prolog) is excellent for XML. It will work reliably with XHTML5, without needing reparsing.
- Method 8 (content-type set as "application/xhtml+xml" in the transport layer) is exactly like method 2 (and suffers from the same problem), but this content-type is not really intended for HTML5, not even XHTML5, as it implies an application and the extensible schema that XHTML5 will not parse. Method 8, for me, implies the forced use of an XML parser, not an HTML parser. All XML extensions (including namespaces) will be valid.

My method is a generalisation to HTML of the excellent method 7 for XHTML (based on its standard and the XML standard). It requires absolutely no reparsing, and supports the explicit versioning of HTML (for future evolutions of its supported schema), without overwriting the independent versioning of XML if it is used. It also does not require the new ugly DOCTYPE, which indicates absolutely nothing significant, will not allow versioning, and breaks SGML parsers as well as XML parsers. It takes advantage of the fact that such declarations don't break browsers in method 7 (even if some of them do not sniff at least the encoding from the XML prolog).

2012/11/29 Leif Halvard Silli <[email protected]>

> Philippe Verdy, Thu, 29 Nov 2012 16:10:14 +0100:
> > Thanks a lot, this was really hard to see and understand, because I
> > was only reading the XHTML specs, and the Validator did not complain.
>
> Glad to find we are no the same page!
>
> Philippe Verdy, Thu, 29 Nov 2012 16:27:13 +0100:
> > <?html version="5.0" encoding="utf-8">
>
> HTML5 already have 4 *conforming* methods for setting the UTF-8
> encoding:
>
> 1. byte-order mark
> 2. HTTP server,
>    Content-Type:text/html;charset=UTF-8
> 3. meta http-equiv,
>    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>
> 4. meta charset,
>    <meta charset="UTF-8"/>
>    (Note that there is no content-type here, and thus the meta charset
>    method is more "clean" to use in a file served as XHTML.)
>
> In addition, other things have effect:
>
> 6. Sniffing is an official, but largely unimplemented method for
>    getting the encoding (Chrome and Opera use it, and Firefox
>    has it as an option and also uses it by default for some locales.)
> 7. The XML prologue (sic) takes effect in *some* browsers.
> 8. Simply serving the page as application/xhtml+xml is
>    yet another method of setting the encoding to UTF-8.
>
> Thus I can guarantee you that your idea about at method number 9, is
> not going to be met with enthusiasm.
> --
> leif halvard silli
>
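
For illustration only, here is a minimal Python sketch of where each of the declarations discussed above lives in a byte stream. It is not the HTML5 encoding-sniffing algorithm, and it deliberately takes no position on which declaration should win. The function name declared_encodings, the regular expressions, and the 1024-byte scan window are all assumptions made for this example. The <?html ...> prolog is the proposal quoted above, not an existing standard.

```python
import codecs
import re

# Method 1: byte-order marks (UTF-32 BOMs are ignored in this sketch).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Method 7 (<?xml ...?>) and the proposed <?html ...> prolog: the
# declaration must sit at the very start of the document.
_PROLOG = re.compile(
    rb'^<\?(xml|html)\b[^>]*?\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)

# Methods 3 and 4: a charset inside a meta element, wherever it appears.
_META = re.compile(
    rb'<meta\b[^>]*?\bcharset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE
)

# Methods 2 and 8: charset parameter of the transport-level Content-Type.
_HTTP = re.compile(r'charset\s*=\s*"?([A-Za-z0-9._-]+)', re.IGNORECASE)


def declared_encodings(data, content_type=None):
    """Return a dict mapping each declaration source to the encoding it names.

    Hypothetical helper: it only reports what is declared and does not
    choose a winner. Prolog and meta detection assume an ASCII-compatible
    encoding, so they will not fire on UTF-16 content.
    """
    found = {}
    if content_type:
        m = _HTTP.search(content_type)
        if m:
            found["http"] = m.group(1).lower()
    body = data
    for bom, name in _BOMS:
        if data.startswith(bom):
            found["bom"] = name
            body = data[len(bom):]  # skip the BOM before looking further
            break
    m = _PROLOG.match(body)
    if m:
        found["prolog"] = m.group(2).decode("ascii").lower()
    # Assumed scan window; the meta element can be anywhere inside it.
    m = _META.search(body[:1024])
    if m:
        found["meta"] = m.group(1).decode("ascii").lower()
    return found
```

In this sketch the BOM and prolog checks look only at the very first bytes, while the meta check has to search inside the markup. That difference is what the restart concern raised for methods 3 and 4 is about.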

