Philippe Verdy, Tue, 17 Jul 2012 03:40:37 +0200:
> 2012/7/16 Leif Halvard Silli:
HTML5:

> (ASCII is considered now an alias of Windows-1252, also for
> compatibility reasons, even if strict US-ASCII resources could be
> interpreted without changes as UTF-8)

I agree that HTML5 ought to ask UAs to try more aggressively to detect UTF-8. An argument was also put forward on the WHATWG mailing list earlier this year, or at the end of the previous year, that a page containing only strict ASCII characters could still contain character entities/references for characters outside ASCII. For instance, early on in 'the Web', some appeared to think that all non-ASCII characters had to be represented as entities.

> and require explicit encoding
> (sniffing no longer works for something else as UTF-8 for its leading
> BOM interpreted as a data signature and not as a character)

If that were true, then Firefox's (for most locales) optional character encoding detector would not be compatible with HTML5, and Chrome would also violate HTML5. I do not think that HTML5 rules out detection of encodings that HTML5 permits/requires UAs to support. However, the encoding sniffing algorithm specifies at which stages in the sniffing process the 'try-to-guess' step should happen.

--
Leif Halvard Silli
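
To make the ordering of the sniffing stages discussed above concrete, here is a minimal sketch in Python. It is not the HTML5 algorithm itself: the function name sniff_encoding, the simplified three-stage order (BOM signature, then declared label, then a guess), and the windows-1252 fallback are assumptions chosen for illustration only.

```python
import codecs
from typing import Optional


def sniff_encoding(data: bytes, declared: Optional[str] = None) -> str:
    """Hypothetical, simplified sniffer; the stage order is an assumption."""
    # Stage 1: a leading BOM is treated as a data signature, not a character.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    # Stage 2: an explicit declaration (e.g. from HTTP or a meta element).
    if declared:
        try:
            name = codecs.lookup(declared).name
        except LookupError:
            name = None  # unknown label: fall through to the guess stage
        if name == "ascii":
            # Mirrors the point above: an ASCII label is handled as windows-1252.
            return "cp1252"
        if name:
            return name

    # Stage 3: the 'try-to-guess' step. Here it only tests for valid UTF-8;
    # pure ASCII bytes are valid UTF-8, so they are reported as UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"  # assumed legacy fallback for this sketch
```

Real user agents guess among many more legacy encodings, and the locale-dependent detectors mentioned above would slot into the third stage of such a scheme.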

