[whatwg] Drop UTF-32

Michael Day Tue, 15 May 2007 02:36:13 -0700

Hi,

Suggestion: drop UTF-32 from the character encoding detection section of HTML5, and even better, discourage or forbid user agents from implementing support for UTF-32.


Why:

- It's not widely used. In fact, has UTF-32 ever been used at all, outside of test suites?

- It's not widely implemented. For example, the expat XML parser does not support it, and nobody cares.

- When it is supported, people get it wrong, and the bugs are never fixed because no one uses UTF-32 anyway and no one cares.

For an example of this, see html5lib 0.9, which implements the BOM detection algorithm, but gets it wrong by checking for UTF-16 before checking for UTF-32. Because the little-endian UTF-16 BOM (FF FE) is a prefix of the little-endian UTF-32 BOM (FF FE 00 00), the UTF-16 test will always succeed first and little-endian UTF-32 will always be misidentified as UTF-16. But no one cares, as no one uses UTF-32 anyway.

- UTF-32 is horrendously inefficient for just about all real-world text and its use should not be encouraged on the web. Really, UTF-32 only exists as a tutorial example of how Unicode can be encoded, not as a practical character encoding that people should actually use.
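
To make the BOM ordering problem described above concrete, here is a minimal sketch in Python. It is not html5lib's actual code; the function names sniff_bom and sniff_bom_buggy and the BOMS table are made up for illustration. The only point is the order of the checks: the 4-byte UTF-32 BOMs have to be tested before the 2-byte UTF-16 BOMs.

```python
import codecs

# Hypothetical lookup table, longest BOMs first, so that the UTF-32 LE
# BOM (FF FE 00 00) is tested before the UTF-16 LE BOM (FF FE) that is
# its prefix.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_bom_buggy(data: bytes):
    """Same idea, but with UTF-16 tested first (the ordering mistake)."""
    for bom, name in [
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
    ]:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    sample = codecs.BOM_UTF32_LE + "hi".encode("utf-32-le")
    print(sniff_bom(sample))        # utf-32-le
    print(sniff_bom_buggy(sample))  # utf-16-le
```

Even with the right ordering there is a residual ambiguity: a little-endian UTF-16 document whose first character after the BOM is U+0000 also starts with FF FE 00 00, so it would be taken for UTF-32 by this sketch.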

Please, drop UTF-32 and save implementors from worrying about it when no one uses it and no one should use it.


Thanks,

Michael

--
Print XML with Prince!
http://www.princexml.com
