While the authors of HTML5 had good reasons to ignore SCSU and BOCU-1, excluding UTF-32, which is the most direct, one-to-one mapping of Unicode code points to fixed-width values, seems shortsighted. We are talking about the whole of Unicode, not just the BMP.
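
As a rough illustration of that point (a sketch only, not taken from any specification; the sample characters and the helper name `layout` are arbitrary choices made for this example), the following Python snippet prints the byte layout of one BMP and one non-BMP code point in UTF-8, UTF-16, and UTF-32:

```python
# Compare the byte layout of a BMP and a non-BMP code point
# in UTF-8, UTF-16 and UTF-32 (big-endian, no BOM).
# The sample characters are arbitrary picks for illustration.
samples = ["\u00e9", "\U0001F600"]  # U+00E9 (BMP), U+1F600 (outside the BMP)

def layout(ch):
    """Return {encoding name: hex string of the encoded bytes} for one character."""
    return {enc: ch.encode(enc).hex() for enc in ("utf-8", "utf-16-be", "utf-32-be")}

for ch in samples:
    print(f"U+{ord(ch):04X}", layout(ch))
    # In UTF-32 the four bytes, read as one big-endian integer,
    # are simply the code point value itself.
    assert int.from_bytes(ch.encode("utf-32-be"), "big") == ord(ch)
```

Only the UTF-32 column can be read back as the code point number without any decoding step; outside the BMP, UTF-8 needs four bytes with marker bits and UTF-16 needs a surrogate pair.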

/Sz

On Sat, Apr 28, 2012 at 21:48, Doug Ewell <[email protected]> wrote:

> <anbu at peoplestring dot com> wrote:
>
>> What are some of the reasons a new encoding will face challenges?
>
> The main challenge to a new encoding is that UTF-8 is already present in
> numerous applications and operating systems, and that any encoding intended
> to serve as an alternative, let alone a replacement for UTF-8, must be
> "better enough" to justify re-engineering of these systems.
>
> Some people are simply opposed to additional encoding schemes. The HTML5
> specification explicitly forbids the use of UTF-32, SCSU, and BOCU-1 (while
> allowing many non-Unicode legacy encodings and quietly mapping others to
> Windows encodings); one committee member was quoted as saying that other
> encodings of Unicode "waste developer time."
>
> Any encoding that does not align code point boundaries along byte
> boundaries will be criticized for requiring excessive processing. The
> argument that I made will be made by others, that if it is necessary to
> process bit-by-bit, one might as well use a general-purpose compression
> algorithm. It is popular to present gzip as the ideal compression approach,
> since it is widely available, especially on Linux-type systems, and
> publicly documented (and not IP-encumbered).
>
> I may have missed some other objections.
>
> --
> Doug Ewell | Thornton, Colorado, USA
> http://www.ewellic.org | @DougEwell

