RE: accessing extended ranges

Eric Mader Wed, 27 Mar 2002 12:05:47 -0800

At 08:38 AM 3/26/2002, Addison Phillips [wM] wrote:
>The downside is that the GUI stuff, Swing and AWT, don't recognize 
>surrogates properly. Paste U+D800 U+DC00 into a Swing control and you'll 
>see TWO hollow boxes, not one... the JDK is rendering the characters 
>separately. (NB: I haven't tried this test with 1.4, so there may be more 
>support there for surrogates).
>
>So, using ICU you can probably do some of the processing you're interested 
>in. But GUI apps are going to be very problematic until Swing or AWT are fixed.


JDK 1.4 can render characters coded as surrogate pairs. This works in AWT 
and Swing.

>Hope that helps.
>
>Addison
>
>Addison P. Phillips
>Globalization Architect / Manager, Globalization Engineering
>webMethods, Inc.  432 Lakeside Drive, Sunnyvale, CA
>+1 408.962.5487 (phone)  +1 408.210.3659 (mobile)
>-------------------------------------------------
>Internationalization is an architecture. It is not a feature.
>
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> > Behalf Of Ben Monroe
> > Sent: March 26, 2002 0:17
> > To: Unicode list
> > Subject: accessing extended ranges
> >
> >
> > I would like to access some of the characters from "CJK Unified Ideographs
> > Extension B." These are all in the range of 20000-2A6DF. (direct link:
> > http://www.unicode.org/charts/PDF/U20000.pdf )
> >
> > "Basic Latin" appears in 0000-007F range. The original "CJK Unified
> > Ideographs" all appear within the 4E00–9FAF range. These are all easy to
> > access with U+xxxx (4 x's). In Java, the format \uxxxx works just fine (and
> > also the same for http://www.macchiato.com/unicode/ ). However, how do you
> > access the characters in the larger ranges (i.e., U+xxxxx or \uxxxxx)?
> >
> > Directly using the 5-digit format \uxxxxx produces a Unicode character
> > followed by the 5th x. Here is a quick example:
> >
> > public class UniStringTest {
> >   static public void main(String[] args) {
> >     String s1 = "\u963F"; // displays fine; standard /uxxxx (4x's)
> >     System.out.println(s1);
> >     String s2 = "\u9FA0"; // also displays fine; standard /uxxxx (4x's)
> >     System.out.println(s2);
> >     String s3 = "\u2A6A5"; // biggest character that I know (5x's) but doesn't process
> >     System.out.println(s3);
> >   }
> > }

Note that the Java "\u" notation always uses four digits. The last string 
in your code is interpreted as the character U+2A6A followed by "5" 
(U+0035). The correct way to write this in Java is to use surrogate pairs:

         String s3 = "\uD869\uDEA5"; // surrogate pair for U+2A6A5
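
If it helps, the pair above can be derived by hand: subtract 0x10000 from the code point, put the top 10 bits of the result on 0xD800 and the bottom 10 bits on 0xDC00. Here is a rough sketch of that arithmetic. The class and method names (SurrogateSketch, toSurrogatePair, escape) are made up for illustration and are not part of the JDK or ICU.

```java
// Illustrative sketch only: SurrogateSketch and its methods are hypothetical
// names, not JDK or ICU APIs.
class SurrogateSketch {

    // Encode a supplementary code point (U+10000..U+10FFFF) as two UTF-16 code units.
    static String toSurrogatePair(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException(
                "not a supplementary code point: " + Integer.toHexString(codePoint));
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new String(new char[] { high, low });
    }

    // Show each UTF-16 code unit of s as a backslash-u escape with four hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // For 0x2A6A5 the two code units are D869 and DEA5.
        System.out.println(escape(toSurrogatePair(0x2A6A5)));
    }
}
```

Running main should print the same two escapes used in the string literal above, which is a quick way to check a pair before pasting it into source code.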

> > Thanks,
> >
> > Ben Monroe

Eric Mader
IBM GCoC San José
5600 Cottle Rd M/S 50-2/B11
San Jose, CA 95193

> >
> >
> >
