In Java, support for supplementary characters is actually better than one might think.

It is true that the char type and the Character class only support 16-bit code units. 
However, storing UTF-16 strings in String objects and char[] arrays, and passing code 
points as ints in non-JDK APIs, works just fine.
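To illustrate, here is a minimal sketch (the class name and the example character U+10400 are my own choices, not from the original mail) of how a supplementary character lives in a String as a surrogate pair, and how the pair can be combined into a single int code point for passing to int-based APIs:

```java
public class CodePointDemo {
    public static void main(String[] args) {
        // U+10400 stored in a String as a UTF-16 surrogate pair
        String s = "\uD801\uDC00";
        char lead = s.charAt(0);   // high (lead) surrogate
        char trail = s.charAt(1);  // low (trail) surrogate
        // Standard UTF-16 decoding: combine the pair into one int code point
        int cp = ((lead - 0xD800) << 10) + (trail - 0xDC00) + 0x10000;
        System.out.println("U+" + Integer.toHexString(cp).toUpperCase()); // prints "U+10400"
    }
}
```

The String stores two 16-bit code units, but the int value 0x10400 is what an int-based API would receive.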

The JDK layout engine (which is shared with ICU4C) can display UTF-16 text that 
includes supplementary characters.

Some JDK converters convert supplementary characters where necessary. There will be a 
GB 18030 converter, for example. Note that the IBM JDK has fixpacks, going back at 
least to 1.3 if not 1.2.2, that add GB 18030 support.
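A quick sketch of what such a converter does, assuming a JDK whose charset repertoire includes "GB18030" (availability varies by JDK vendor and version, as noted above): GB 18030 maps every Unicode code point, including supplementary ones, so a surrogate pair round-trips through a four-byte sequence.

```java
import java.io.UnsupportedEncodingException;

public class Gb18030Demo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // U+10400 as a surrogate pair in UTF-16
        String s = "\uD801\uDC00";
        // Encode to GB 18030; throws UnsupportedEncodingException if the
        // JDK at hand lacks the converter
        byte[] bytes = s.getBytes("GB18030");
        System.out.println(bytes.length); // supplementary characters take 4 bytes
        // Decode back and verify the supplementary character survived
        String roundTrip = new String(bytes, "GB18030");
        System.out.println(s.equals(roundTrip)); // prints "true"
    }
}
```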

Also, if you get ICU4J, you can use its UCharacter class, which uses int types for 
code points. ICU4J 2.0 will soon ship out of the box with Unicode 3.1 properties 
data. Watch http://oss.software.ibm.com/icu4j/ (You can already build such a 
properties file with ICU4C's genprops tool.)


Changing the string storage in Java fundamentally from UTF-16 to UTF-32 is impossible 
given the legacy of Java and JNI code out there. All indexing, length counting, uses 
of char as an integer type, JNI getString(), etc. would be broken, and interfacing 
with major operating systems, browsers, and other software would suddenly become more 
complicated and require UTF re-transformation. Bad idea.
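The indexing point above can be made concrete. A small sketch (example string of my own choosing) of the code-unit semantics that all existing Java code depends on, and that a switch to UTF-32 storage would silently change:

```java
public class Utf16IndexingDemo {
    public static void main(String[] args) {
        // 'A', then U+10400 (a surrogate pair), then 'B'
        String s = "A\uD801\uDC00B";
        // Every existing caller assumes these count 16-bit code units:
        System.out.println(s.length());     // prints 4, not 3
        System.out.println(s.indexOf('B')); // prints 3, not 2
        // Under UTF-32 storage these would become 3 and 2, breaking any
        // code that stores, compares, or passes such indices around.
    }
}
```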


Best regards,
markus
