2012/7/16 Leif Halvard Silli:
> <html> element, then Chrome will sniff it as UTF-8 encoded. Whereas IE,
> Webkit, Opera, Firefox will default to ISO-8858-1/Windows-1252.
Actually, ISO 885**9**-1. But we've also been told that, because the C1 controls are simply invalid in HTML, a site that declares ISO-8859-1 will be interpreted as Windows-1252 anyway. A few byte values then remain unassigned and therefore invalid. If they are found, the HTML parser may try other encodings, though not UTF-8, since those bytes are invalid there too and could just as well raise exceptions. In practice, most of these exceptions are simply remapped to the U+FFFD replacement character.

Support for legacy encodings is now more restrictive in HTML5, which supports only UTF-8 and Windows-1252 plus a few other encodings. (ASCII is now considered an alias of Windows-1252, also for compatibility reasons, even though strictly US-ASCII resources could be interpreted unchanged as UTF-8.) HTML5 also requires the encoding to be declared explicitly: sniffing no longer works for anything other than UTF-8, whose leading BOM is interpreted as a data signature rather than as a character.
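For illustration only, here is a minimal Python sketch of the decoding behaviour described above, built on Python's standard codecs. The function names decode_document() and sniff_bom() and the label set are hypothetical. A leading BOM wins over the declared label, labels such as "iso-8859-1" and "us-ascii" are mapped to Windows-1252, and undecodable bytes become U+FFFD. It is not the full HTML5 algorithm (no <meta> prescan, no other legacy encodings). Note that Python's cp1252 codec follows Microsoft's table and leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D unassigned, whereas the WHATWG Encoding Standard maps those bytes to the corresponding C1 code points.

```python
import codecs

# Hypothetical subset of charset labels treated as Windows-1252
# (in the WHATWG Encoding Standard, "iso-8859-1" and "us-ascii" are such labels).
WINDOWS_1252_LABELS = {"windows-1252", "iso-8859-1", "latin1", "us-ascii", "ascii"}

# Byte order marks checked before any declared label is considered.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return (encoding, payload without BOM), or (None, data) if there is no BOM."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return None, data


def decode_document(data: bytes, declared_label: str = "windows-1252") -> str:
    """Decode raw bytes: a BOM overrides the declared label, Latin-1/ASCII
    labels are mapped to Windows-1252, and undecodable bytes become U+FFFD."""
    encoding, payload = sniff_bom(data)
    if encoding is None:
        label = declared_label.strip().lower()
        # Any other label is passed to Python's codec lookup unchanged
        # (an unknown label raises LookupError in this sketch).
        encoding = "cp1252" if label in WINDOWS_1252_LABELS else label
    return payload.decode(encoding, errors="replace")
```

If ISO-8859-1 were taken literally (Python's "latin-1" codec), the byte 0x93 would decode to the C1 control U+0093; under the Windows-1252 interpretation it becomes the left double quotation mark U+201C.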

