The question should read: What are some of the reasons a new encoding will face challenges?
-------- Original Message --------
Subject: Re: Fwd: Re: Unicode, SMS and year 2012
Date: Sat, 28 Apr 2012 15:32:47 -0400
From: <[email protected]>
To: <[email protected]>

> There are many reasons why a new encoding that is merely more efficient
> than UTF-8, especially one that sacrifices byte-based processing or
> other design features, will face a severe uphill battle in trying to
> displace UTF-8.

What are some of the reasons a new encoding will face?

On Sat, 28 Apr 2012 13:15:48 -0600, "Doug Ewell" <[email protected]> wrote:
> <anbu at peoplestring dot com> wrote:
>
>> This clearly shows that my design yields number of values more than
>> double that of UTF8
>
> I didn't know we were competing against UTF-8 on efficiency. That's
> easy. UTF-8 is not at all guaranteed to be the most efficient encoding
> possible, or even reasonably possible. It was originally scoped to be
> "not extravagant" in terms of space, while providing other design
> features like byte boundaries, full ASCII transparency, easy detection,
> and prefixes that quickly indicate the length of the sequence.
>
> It's easy to beat the efficiency of UTF-8 in a byte-based encoding, if
> many of its other design features are ignored:
>
>   0xxxxxxx - encodes U+0000 through U+007F
>   1xxxxxxx 0xxxxxxx - encodes U+0080 through U+3FFF
>   1xxxxxxx 1xxxxxxx - encodes U+4000 through U+10FFFF
>     (and onward to 0x1FFFFF)
>
> This is a well-known and freely available technique, sometimes called
> "self-delimiting numeric values" (RFC 6256) and sometimes by other
> names.
>
> There are many reasons why a new encoding that is merely more efficient
> than UTF-8, especially one that sacrifices byte-based processing or
> other design features, will face a severe uphill battle in trying to
> displace UTF-8.
>
> --
> Doug Ewell | Thornton, Colorado, USA
> http://www.ewellic.org | @DougEwell
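
For illustration, the byte layout quoted above can be sketched in Python. This is only a minimal sketch, not code from the thread: the function names (sdnv_encode, sdnv_decode, encode_text) are made up for this example. It assumes 7 payload bits per byte, most significant group first, with the high bit set on every byte except the last. The third row as quoted shows only two bytes, but reaching 0x1FFFFF takes 21 bits, so the sketch assumes a three-byte form (1xxxxxxx 1xxxxxxx 0xxxxxxx) for that range.

```python
def sdnv_encode(value: int) -> bytes:
    """Encode a non-negative integer as 7-bit groups, most significant first.

    Every byte except the last has its high bit set (assumed layout).
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    groups = [value & 0x7F]                    # final byte: high bit clear
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))   # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(groups))


def sdnv_decode(data: bytes) -> list:
    """Decode a concatenation of encoded values back into integers."""
    values = []
    current = 0
    pending = False                            # True while inside a sequence
    for byte in data:
        current = (current << 7) | (byte & 0x7F)
        pending = True
        if not byte & 0x80:                    # high bit clear ends the value
            values.append(current)
            current = 0
            pending = False
    if pending:
        raise ValueError("truncated sequence")
    return values


def encode_text(text: str) -> bytes:
    """Encode each code point of a string with sdnv_encode."""
    return b"".join(sdnv_encode(ord(ch)) for ch in text)


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X}: UTF-8 {len(ch.encode('utf-8'))} bytes, "
              f"this sketch {len(sdnv_encode(ord(ch)))} bytes")
```

Under these assumptions U+20AC takes two bytes instead of UTF-8's three, and U+1F600 takes three instead of four. The cost is visible as well: U+0080 encodes as 0x81 0x00, so a byte in the ASCII range (here a NUL) appears inside a multi-byte sequence, which UTF-8 never allows.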

