Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)

DougEwell2 Mon, 19 Feb 2001 18:01:20 -0800

A few days ago I said there was a "widespread belief" that Unicode is a 
16-bit-only character set that ends at U+FFFF.  A corollary is that the 
supplementary characters ranging from U+10000 to U+10FFFF are either 
little-known or perceived to belong to ISO/IEC 10646 only, not to Unicode.
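
To make the range concrete, here is a minimal Python sketch of the arithmetic behind it. It assumes the standard UTF-16 surrogate-pair calculation; the function name utf16_code_units and the constants are illustrative only and do not come from any particular library. It shows that the code space holds far more than 65,536 code points, and that a supplementary character takes two 16-bit code units in UTF-16 rather than one.

```python
from typing import List

# The Unicode code space is 17 planes of 65,536 code points each,
# covering U+0000 through U+10FFFF.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000
TOTAL_CODE_POINTS = PLANES * CODE_POINTS_PER_PLANE


def utf16_code_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units for a single code point (illustrative helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points do not encode characters")
    if code_point < 0x10000:
        # Basic Multilingual Plane: one 16-bit code unit.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


if __name__ == "__main__":
    print(TOTAL_CODE_POINTS)
    print([hex(u) for u in utf16_code_units(0x10000)])
```

So "16-bit" describes the size of a UTF-16 code unit, not the size of the character set.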

At least one list member questioned whether this belief was really widespread.

Here is an example from the help file for Character Map in Microsoft Windows 
2000.  Visit "Character Map overview" and click on the word "Unicode" to see 
the following definition:

"A 16-bit character encoding standard developed by the Unicode Consortium 
between 1988 and 1991.  By using two bytes to represent each character, 
Unicode enables almost all of the written languages of the world to be 
represented using a single character set.  By contrast, 8-bit ASCII is not 
capable of representing all of the combinations of letters and diacritical 
marks that are used just with the Roman alphabet.

"Approximately 39,000 of the 65,536 possible Unicode character codes have 
been assigned to date, 21,000 of them being used for Chinese ideographs.  The 
remaining combinations are open for expansion.

"See also ASCII."

Exercise for the reader:  See how many misstatements about Unicode (and 
ASCII) you can find in this text.

-Doug Ewell
 Fullerton, California
