On Wed, Jul 17, 2002 at 04:17:15PM +0100, Nicholas Clark wrote:
> My understanding was that Unicode has now escaped the base plane (or whatever
> it's called) and now has started using code points >65536. How does Java
> cope with this?
This is getting a little off-topic, I think.  But here's a brief overview
of the Unicode codespace size issue.  If you have any more questions,
you can ask me off-list.

There were originally two separate universal character set efforts,
by the ISO and the Unicode Consortium.  They decided early on to
combine their efforts and be mutually compatible. 

However, ISO-10646 was designed as a 31-bit code (stored in 32 bits),
consisting of 32,768 "planes" of 65,536 code points each, while
Unicode was only 16 bits.
So Unicode is identical to plane 0 of ISO-10646, called the
Basic Multilingual Plane (BMP).  So far, the ISO has no characters
defined outside of this plane.  
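
To make the plane layout concrete, here is a minimal sketch in Python.
It assumes only that each plane holds 65,536 code points, so the plane
number is whatever is left after dropping the low 16 bits.  The function
names are mine, purely for illustration.

```python
def plane_of(code_point: int) -> int:
    """Return the plane number of a code point (plane 0 is the BMP)."""
    if code_point < 0:
        raise ValueError("code points are non-negative")
    # Each plane spans 0x10000 (65,536) code points, so the plane
    # number is the value above the low 16 bits.
    return code_point >> 16


def in_bmp(code_point: int) -> bool:
    """True if the code point fits in the original 16-bit Unicode range."""
    return plane_of(code_point) == 0
```

For example, plane_of(0x0041) is 0 (the BMP) and plane_of(0x10000) is 1,
the first plane that a plain 16-bit code cannot reach.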

It does plan to define some eventually, however (in ISO-10646-2), and
this is handled in Unicode through a section of the code space called
"surrogates", which are used in the UTF-16 encoding to reach planes
1-16 of ISO-10646.
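
The surrogate arithmetic is simple enough to sketch.  The following
Python is an illustration, not any particular library's implementation;
the function names are hypothetical.  It relies on the UTF-16 ranges
0xD800-0xDBFF (high surrogates) and 0xDC00-0xDFFF (low surrogates), each
of which carries 10 bits of a 20-bit offset from 0x10000.

```python
HIGH_START, HIGH_END = 0xD800, 0xDBFF  # high (leading) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF    # low (trailing) surrogates


def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point in planes 1-16 into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only planes 1-16 need surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = HIGH_START + (offset >> 10)     # top 10 bits
    low = LOW_START + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a surrogate pair back into a single code point."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)
```

Two 10-bit halves give 2^20 code points, which is exactly 16 planes of
65,536 each.  That is why surrogates reach plane 16 and no further.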

ISO has no plans to define characters beyond plane 16 anytime
in the foreseeable future (or, indeed, beyond plane 14, since planes
15 and 16 are reserved for private use).

-- 
Mark REED                    | CNN Internet Technology
1 CNN Center Rm SW0831G      | [EMAIL PROTECTED]
Atlanta, GA 30348      USA   | +1 404 827 4754 
--
The end of the world will occur at three p.m., this Friday, with
symposium to follow.
