RE: Java and Unicode

Marco . Cimarosti Wed, 15 Nov 2000 08:34:54 -0800
Elliotte Rusty Harold wrote:

> One thing I'm very curious about going forward: Right now character 
> values greater than 65535 are purely theoretical. However this will 
> change. It seems to me that handling these characters properly is 
> going to require redefining the char data type from two bytes to 
> four. This is a major incompatible change with existing Java.
> (...)

John O'Conner just wrote something about surrogates
(http://www.unicode.org/unicode/faq/utf_bom.html#16) and UTF-16
(http://www.unicode.org/unicode/faq/utf_bom.html#5) in Java, but your
message was probably already on its way:

> You can currently store UTF-16 in the String and StringBuffer classes.
> However, all operations are on char values or 16-bit code units. The
> upcoming release of the J2SE platform will include support for Unicode
> 3.0 (maybe 3.0.1) properties, case mapping, collation, and character
> break iteration. There is no explicit support for surrogate pairs in
> Unicode at this time, although you can certainly find out if a code
> unit is a surrogate unit.
> 
> In the future, as characters beyond 0xFFFF become more important, you
> can expect that more robust, official support will follow.
> 
> -- John O'Conner
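
To make the "find out if a code unit is a surrogate unit" part concrete, here is a minimal sketch that uses nothing but range checks on char values, so it does not depend on any particular J2SE release. The class name SurrogateScan and its method names are hypothetical, made up for this illustration and not part of the Java API; the numeric ranges are the UTF-16 surrogate ranges from the Unicode Standard.

```java
// Hypothetical helper for illustration only; not part of the J2SE API.
class SurrogateScan {
    // High (leading) surrogates occupy 0xD800..0xDBFF.
    static boolean isHighSurrogate(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    // Low (trailing) surrogates occupy 0xDC00..0xDFFF.
    static boolean isLowSurrogate(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    // Combine a well-formed high/low pair into the code point it encodes.
    static int toCodePoint(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    // Count characters, treating each well-formed surrogate pair as one.
    // Unpaired surrogates are counted as one character each.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isHighSurrogate(c) && i + 1 < s.length()
                    && isLowSurrogate(s.charAt(i + 1))) {
                i++; // skip the low half of the pair
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a', then U+10000 written as a surrogate pair, then 'b'
        String s = "a\uD800\uDC00b";
        System.out.println(s.length());          // 4 code units
        System.out.println(countCodePoints(s));  // 3 characters
        System.out.println(Integer.toHexString(toCodePoint('\uD800', '\uDC00'))); // 10000
    }
}
```

So the char type can stay at 16 bits: a character beyond 0xFFFF simply occupies two char values in a String, and code that cares has to step over the pair as shown above.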

_ Marco