[EMAIL PROTECTED] wrote:
"Unicode is a character set encoding standard which currently provides for
its entire character repertoire to be represented using 8-bit, 16-bit or
32-bit encodings."
Please say "encoding forms".
There are three distinct terms that sound similar and apparently cause
confusion; all of them use the word "encoding".
Unicode is a "character set encoding": a mapping between code points
and abstract characters. Code points range from 0 to 0x10FFFF (roughly
21 bits) and have nothing to do with 8-, 16-, or 32-bit values.
The Unicode Standard defines several "encoding forms", which are rules
for representing code points as sequences of integers: UTF-8 uses
sequences of 8-bit integers, UTF-16 of 16-bit integers, and UTF-32 of
32-bit integers. These are used mostly in string representations, file
contents, and serializations for transmission.
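A short Python sketch (my illustration, not from the original message) of the three encoding forms applied to one code point, U+1D11E MUSICAL SYMBOL G CLEF:

```python
# One code point, three encoding forms: the same abstract character
# becomes different sequences of 8-, 16-, or 32-bit code units.
ch = "\U0001D11E"  # code point U+1D11E, MUSICAL SYMBOL G CLEF

print(ch.encode("utf-8").hex(" "))        # f0 9d 84 9e  (four 8-bit units)
print(len(ch.encode("utf-16-be")) // 2)   # 2  (two 16-bit units: a surrogate pair)
print(len(ch.encode("utf-32-be")) // 4)   # 1  (one 32-bit unit)
```

Note that none of these unit counts is a property of the code point itself; each is a property of the chosen encoding form.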
The Unicode Standard defines several "encoding schemes": an encoding
scheme is an encoding form together with a choice of byte order, for
those encoding forms where byte order matters (currently UTF-16 and
UTF-32).
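To make the form/scheme distinction concrete, here is a small Python illustration (again mine, not the original author's): the UTF-16 encoding form yields one 16-bit code unit for U+20AC, but the encoding schemes serialize that unit to bytes in different orders.

```python
ch = "\u20AC"  # U+20AC EURO SIGN: one 16-bit code unit in the UTF-16 form

print(ch.encode("utf-16-be").hex())  # 20ac  (big-endian scheme)
print(ch.encode("utf-16-le").hex())  # ac20  (little-endian scheme)
# The plain "utf-16" scheme prefixes a byte order mark (BOM) so the
# reader can tell which order was used.
print(ch.encode("utf-16")[:2] in (b"\xff\xfe", b"\xfe\xff"))  # True
```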
In my opinion, it is worth being fussy about the distinctions between
the three terms. If, thinking mostly about UTF-16 strings, you say
"Unicode has a 16-bit encoding" to someone just learning about the
character set, they are likely to think they can use 2^16-sized
arrays, bitsets, and integer ranges to manipulate character data,
character sets, and characters -- without realizing the compromise
that decision entails. That was my initial reaction, anyway.
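A quick Python demonstration (my own, illustrating the point above) of why a 2^16-sized table is not enough: characters outside the Basic Multilingual Plane have code points above 0xFFFF and occupy two code units in UTF-16.

```python
ch = "\U00010400"  # DESERET CAPITAL LETTER LONG I, outside the BMP

cp = ord(ch)
print(cp > 0xFFFF)                       # True: won't index a 2^16-sized array
print(len(ch.encode("utf-16-be")) // 2)  # 2: a surrogate pair in UTF-16
```

Code that assumes one character equals one 16-bit unit silently splits such characters into surrogate halves.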
"Confusion has its cost" -- Crosby, Stills, Nash, and Young
Raised by a roving pack of wild, pedantic mathematicians,
Thomas Lord

