Re: Fwd: Re: Unicode, SMS and year 2012

anbu Sat, 28 Apr 2012 12:38:29 -0700

> There are many reasons why a new encoding that is merely more efficient 
> than UTF-8, especially one that sacrifices byte-based processing or 
> other design features, will face a severe uphill battle in trying to 
> displace UTF-8.


What are some of the reasons a new encoding would face such an uphill battle?

On Sat, 28 Apr 2012 13:15:48 -0600, "Doug Ewell" <[email protected]>
wrote:
> <anbu at peoplestring dot com> wrote:
> 
>> This clearly shows that my design yields more than double the number
>> of values that UTF-8 does
> 
> I didn't know we were competing against UTF-8 on efficiency. That's 
> easy. UTF-8 is not at all guaranteed to be the most efficient encoding 
> possible, or even reasonably possible. It was originally scoped to be 
> "not extravagant" in terms of space, while providing other design 
> features like byte boundaries, full ASCII transparency, easy detection, 
> and prefixes that quickly indicate the length of the sequence.
> 
> It's easy to beat the efficiency of UTF-8 in a byte-based encoding, if 
> many of its other design features are ignored:
> 
> 0xxxxxxx - encodes U+0000 through U+007F
> 1xxxxxxx 0xxxxxxx - encodes U+0080 through U+3FFF
> 1xxxxxxx 1xxxxxxx 0xxxxxxx - encodes U+4000 through U+10FFFF
> (and onward to 0x1FFFFF)
> 
> This is a well-known and freely available technique, sometimes called 
> "self-delimiting numeric values" (RFC 6256) and sometimes by other 
> names.
> 
> There are many reasons why a new encoding that is merely more efficient 
> than UTF-8, especially one that sacrifices byte-based processing or 
> other design features, will face a severe uphill battle in trying to 
> displace UTF-8.
> 
> --
> Doug Ewell | Thornton, Colorado, USA
> http://www.ewellic.org | @DougEwell
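
For anyone following along, the three-line scheme Doug quotes can be sketched in a few lines of Python. This is only an illustration of the 7-bits-per-byte idea as described above (high bit set on every byte except the last), not code from RFC 6256 or from anyone's actual proposal; the function names sdnv_encode, sdnv_decode and encode_text are made up for this example.

```python
def sdnv_encode(value: int) -> bytes:
    """Encode a non-negative integer as 7-bit groups, most significant first.

    Every byte except the last has its high bit set; the last byte has it clear.
    (Hypothetical helper name, for illustration only.)
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    groups = [value & 0x7F]                    # final byte: high bit clear
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))   # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(groups))


def sdnv_decode(data: bytes) -> list:
    """Decode a concatenation of encoded values back into a list of integers."""
    values = []
    current = 0
    pending = False                            # True while inside a sequence
    for byte in data:
        current = (current << 7) | (byte & 0x7F)
        pending = True
        if not byte & 0x80:                    # high bit clear ends the value
            values.append(current)
            current = 0
            pending = False
    if pending:
        raise ValueError("truncated sequence")
    return values


def encode_text(text: str) -> bytes:
    """Encode each code point of a string with sdnv_encode."""
    return b"".join(sdnv_encode(ord(ch)) for ch in text)


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X}", len(encode_text(ch)), len(ch.encode("utf-8")))
```

With this sketch every code point up to U+10FFFF fits in at most three bytes, against four for UTF-8. The cost is visible in the output bytes, though: U+0080 comes out as 0x81 0x00, so a byte in the ASCII range no longer always means an ASCII character, which is one of the UTF-8 design features the quoted message says gets ignored.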
