At 08:38 AM 3/26/2002, Addison Phillips [wM] wrote:
>The downside is that the GUI stuff, Swing and AWT, doesn't recognize
>surrogates properly. Paste U+D800 U+DC00 into a Swing control and you'll
>see TWO hollow boxes, not one... the JDK is rendering the characters
>separately. (NB: I haven't tried this test with 1.4, so there may be more
>support there for surrogates.)
>
>So, using ICU you can probably do some of the processing you're interested
>in. But GUI apps are going to be very problematic until Swing or AWT are fixed.

JDK 1.4 can render characters coded as surrogate pairs. This works in both
AWT and Swing.

>Hope that helps.
>
>Addison
>
>Addison P. Phillips
>Globalization Architect / Manager, Globalization Engineering
>webMethods, Inc.
>432 Lakeside Drive, Sunnyvale, CA
>+1 408.962.5487 (phone)  +1 408.210.3659 (mobile)
>-------------------------------------------------
>Internationalization is an architecture. It is not a feature.
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> > Behalf Of Ben Monroe
> > Sent: 26 March 2002 0:17
> > To: Unicode list
> > Subject: accessing extended ranges
> >
> > I would like to access some of the characters from "CJK Unified Ideographs
> > Extension B." These are all in the range 20000-2A6DF. (direct link:
> > http://www.unicode.org/charts/PDF/U20000.pdf )
> >
> > "Basic Latin" appears in the 0000-007F range. The original "CJK Unified
> > Ideographs" all appear within the 4E00–9FAF range. These are all easy to
> > access with U+xxxx (4 x's). In Java, the format \uxxxx works just fine
> > (and the same goes for http://www.macchiato.com/unicode/ ). However, how
> > do you access the characters in the larger ranges (i.e., U+xxxxx or
> > \uxxxxx)?
> >
> > Directly using the 5-digit format \uxxxxx produces a Unicode character
> > followed by the 5th x. Here is a quick example:
> >
> > public class UniStringTest {
> >     static public void main(String[] args) {
> >         String s1 = "\u963F"; // displays fine; standard 4-digit escape
> >         System.out.println(s1);
> >         String s2 = "\u9FA0"; // also displays fine; standard 4-digit escape
> >         System.out.println(s2);
> >         String s3 = "\u2A6A5"; // biggest character that I know of (5 digits), but it doesn't process
> >         System.out.println(s3);
> >     }
> > }

Note that the Java "\u" notation always uses exactly four hex digits. The last
string in your code is interpreted as the character U+2A6A followed by "5"
(U+0035).
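
A code point above U+FFFF is stored in a Java string as two 16-bit code units, a high and a low surrogate, and the two values can be worked out by hand. The following is only a minimal sketch of that arithmetic: the class and method names (SurrogateSketch, highSurrogate, lowSurrogate, toUtf16) are made up for illustration and are not part of the JDK, and the code uses plain integer math so it does not assume any particular JDK version.

```java
// Hypothetical helper class; only the arithmetic is standard UTF-16.
class SurrogateSketch {

    // Leading (high) surrogate for a code point in U+10000..U+10FFFF.
    public static char highSurrogate(int codePoint) {
        return (char) (0xD800 + ((codePoint - 0x10000) >> 10));
    }

    // Trailing (low) surrogate for the same code point.
    public static char lowSurrogate(int codePoint) {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));
    }

    // Builds the two-char Java string for a supplementary code point.
    public static String toUtf16(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        return new String(new char[] {
            highSurrogate(codePoint), lowSurrogate(codePoint)
        });
    }

    public static void main(String[] args) {
        // A five-digit escape is read as a four-digit escape plus a literal '5'.
        String wrong = "\u2A6A5";
        System.out.println(wrong.length());                        // 2
        System.out.println(Integer.toHexString(wrong.charAt(0)));  // 2a6a
        System.out.println(wrong.charAt(1));                       // 5

        // The same code point expressed as a surrogate pair.
        String right = toUtf16(0x2A6A5);
        System.out.println(Integer.toHexString(right.charAt(0)));  // d869
        System.out.println(Integer.toHexString(right.charAt(1)));  // dea5
    }
}
```

Both strings have length 2, but only the second holds a single character (U+2A6A5); the first holds two unrelated BMP characters.
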
The correct way to write this in Java is to use a surrogate pair:

        String s3 = "\uD869\uDEA5"; // surrogate pair for U+2A6A5

> > Thanks,
> >
> > Ben Monroe

Eric Mader
IBM GCoC San José
5600 Cottle Rd M/S 50-2/B11
San Jose, CA 95193

