While the authors of HTML5 gave good reasons for ignoring SCSU and BOCU-1,
excluding UTF-32, the most direct one-to-one mapping of Unicode code
points to byte values, seems shortsighted. We are talking about the whole
of Unicode, not just the BMP.
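To illustrate the "one-to-one" point, here is a minimal Python sketch (the helper names are made up for this example). It counts the bytes each encoding form spends on a single code point. UTF-32 is always four bytes, whether the character is in the BMP or not. UTF-8 and UTF-16 vary, and UTF-16 needs a surrogate pair outside the BMP. The big-endian codecs are used so that Python emits no byte order mark.

```python
def bytes_per_code_point(ch: str) -> dict[str, int]:
    """Return the encoded length of one code point in UTF-8, UTF-16, UTF-32."""
    return {
        "UTF-8": len(ch.encode("utf-8")),
        "UTF-16": len(ch.encode("utf-16-be")),  # 4 bytes (surrogate pair) above U+FFFF
        "UTF-32": len(ch.encode("utf-32-be")),  # always 4 bytes
    }


def utf32_code_points(data: bytes) -> list[int]:
    """Decode big-endian UTF-32 by reading one 4-byte word per code point.

    No state machine or lead-byte inspection is needed: word i is code point i.
    """
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]


if __name__ == "__main__":
    # A BMP Latin letter, a Latin-1 letter, a CJK ideograph, and a non-BMP emoji.
    for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
        print(f"U+{ord(ch):04X}", bytes_per_code_point(ch))
    text = "A\u4e2d\U0001F600"
    assert utf32_code_points(text.encode("utf-32-be")) == [ord(c) for c in text]
```

The direct indexing in `utf32_code_points` is the property in question. It comes at the cost of four bytes per character even for ASCII.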

/Sz



On Sat, Apr 28, 2012 at 21:48, Doug Ewell wrote:

> <anbu at peoplestring dot com> wrote:
>
>  What are some of the reasons a new encoding will face challenges?
>>
>
> The main challenge to a new encoding is that UTF-8 is already present in
> numerous applications and operating systems, and that any encoding intended
> to serve as an alternative, let alone a replacement for UTF-8, must be "better
> enough" to justify re-engineering of these systems.
>
> Some people are simply opposed to additional encoding schemes. The HTML5
> specification explicitly forbids the use of UTF-32, SCSU, and BOCU-1 (while
> allowing many non-Unicode legacy encodings and quietly mapping others to
> Windows encodings); one committee member was quoted as saying that other
> encodings of Unicode "waste developer time."
>
> Any encoding that does not align code point boundaries along byte
> boundaries will be criticized for requiring excessive processing. The
> argument I made will be made by others: if it is necessary to
> process bit-by-bit, one might as well use a general-purpose compression
> algorithm. It is popular to present gzip as the ideal compression approach,
> since it is widely available, especially on Linux-type systems, and
> publicly documented (and not IP-encumbered).
>
> I may have missed some other objections.
>
>
> --
> Doug Ewell | Thornton, Colorado, USA
> http://www.ewellic.org | @DougEwell
>
>
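Doug mentions the general-purpose compression alternative: keep UTF-8 on the wire and let gzip deal with the redundancy. A minimal sketch of that approach uses Python's standard `gzip` module. The function names and the sample text are hypothetical. Actual compressed sizes depend on the input and the compression level, so no figures are claimed here.

```python
import gzip


def gzip_utf8(text: str) -> bytes:
    """Encode text as plain UTF-8, then apply gzip (DEFLATE) compression."""
    return gzip.compress(text.encode("utf-8"))


def gunzip_utf8(blob: bytes) -> str:
    """Reverse gzip_utf8: decompress, then decode the UTF-8 bytes."""
    return gzip.decompress(blob).decode("utf-8")


if __name__ == "__main__":
    # Hypothetical sample: CJK text, where UTF-8 spends three bytes per character.
    sample = "\u4e2d\u6587" * 100
    raw = sample.encode("utf-8")
    packed = gzip_utf8(sample)
    print("UTF-8 bytes:", len(raw), "gzip bytes:", len(packed))
    assert gunzip_utf8(packed) == sample  # lossless round trip
```

The point of this design is that the character encoding stays simple and byte-aligned. Any bit-level work is delegated to a widely deployed, publicly documented compressor instead of a new Unicode encoding scheme.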
