Markus,

My mistake. I should have checked the docs, but I thought that AIX used a 16-bit wchar_t. In any case, it still takes forever to change things in an OS. There is so much interrelated code that fixing one thing can break another.

Carl

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:unicode-bounce@;unicode.org]On
> Behalf Of Markus Scherer
> Sent: Thursday, November 14, 2002 9:18 AM
> To: unicode
> Subject: Re: IBM AIX 5 and GB18030
>
>
> Carl W. Brown wrote:
> > Some Unix systems adapted faster because the later Unicode adopters
> > used 32 bit Unicode characters making the job 100 times easier.
> > Other companies like Microsoft took a very big gamble and
> > implemented the code for surrogate support into Windows 2000 based
> > on early drafts of the Unicode standard. If they had not done it
> > this way or had guessed wrong they might not even have support in
> > Windows XP.
>
> Hi Carl, I am not going to argue with you on what you say about
> ICU :-) but I am not sure about your Unix comments.
>
> First, AIX 5 uses 32-bit wchar_t, which is UTF-32 except for the
> zh_TW locale, as far as I know. (AIX 5 zh_TW uses a different
> wchar_t encoding.)
>
> Again as far as I know, Unix/Linux systems chose to use 32-bit
> wchar_t not because of great strategic plans or compelling
> performance analysis, but because the existing C stdlib functions
> for wchar_t string handling assume that the single-code-point type
> is the same as the string base unit. This one design point requires
> 32-bit wchar_t not just for Unicode but also for the character sets
> of EUC-TW and GB18030.
>
> You seem to suggest that there is a problem with 16-bit Unicode.
> It does take some effort to adapt UCS-2-designed functions for
> UTF-16, but it's not "rocket science" and works very well thanks to
> the Unicode allocation practice (common characters in the BMP).
> Making UTF-8/32 functions work with supplementary code points when
> they had assumed BMP-only operation probably took some work too.
>
> In fact, on Unix/Linux systems you find not only UTF-32 via
> wchar_t, but also UTF-8 (low-level tools and gnome) and UTF-16
> (ICU, KDE/Qt, and many applications like Mozilla and OpenOffice).
>
> Best regards,
> markus
>
> --
> Opinions expressed here may not reflect my company's positions
> unless otherwise noted.
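
For readers following the UCS-2 versus UTF-16 point in the quoted message, here is a minimal sketch of what the adaptation involves. It is an illustration only, not code from AIX, ICU, or Windows. The function names `decode_utf16_units` and `count_ucs2_style` are hypothetical, and passing unpaired surrogates through unchanged is an assumption made to keep the example short.

```python
def decode_utf16_units(units):
    """Turn a sequence of 16-bit code units into a list of code points.

    Hypothetical helper for illustration: a lead surrogate (0xD800-0xDBFF)
    followed by a trail surrogate (0xDC00-0xDFFF) is combined into one
    supplementary code point; everything else is passed through unchanged.
    """
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_lead = 0xD800 <= unit <= 0xDBFF
        has_trail = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_lead and has_trail:
            trail = units[i + 1]
            # Standard UTF-16 arithmetic: 10 bits from each surrogate,
            # offset by 0x10000.
            code_points.append(
                0x10000 + ((unit - 0xD800) << 10) + (trail - 0xDC00)
            )
            i += 2
        else:
            # BMP character, or an unpaired surrogate (assumption: keep as-is).
            code_points.append(unit)
            i += 1
    return code_points


def count_ucs2_style(units):
    """UCS-2-era assumption: one 16-bit unit is always one character."""
    return len(units)


# 'A', U+10348 (encoded as the pair D800 DF48), and U+4E2D.
sample = [0x0041, 0xD800, 0xDF48, 0x4E2D]
decoded = decode_utf16_units(sample)
```

With 16-bit units the sample holds four units but only three code points, which is the gap a UCS-2-designed function has to be taught about. With a 32-bit wchar_t every unit is already a whole code point, so the "one unit, one character" assumption in the C stdlib functions keeps holding.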

