Woo-hoo! Finally, a real answer, rather than speculation. Thanks very much, Ienup.
- rick

-----Original Message-----
From: Ienup Sung [mailto:[EMAIL PROTECTED]
Sent: March 4, 2004 9:53
To: Rick Cameron
Cc: [EMAIL PROTECTED]
Subject: Re: What's in a wchar_t string on unix?

Solaris Unicode/UTF-8 locales are using UTF-32 and we guarantee that it has been and will stay that way.

Just in case, there is also a set of standard C APIs such as mbtowc(), mbstowcs(), mbrtowc(), wctomb(), wcstombs(), wcrtomb(), and so on that will convert between wide characters (UTF-32) and multibyte characters (UTF-8) properly as long as you set the current locale to a Unicode/UTF-8 locale.

If you wish to use a conversion function that is not locale-sensitive, you could use iconv() instead by opening the conversion descriptor with iconv_open() with "UTF-32" and "UTF-8" as fromcode and tocode (or vice versa). (A sample program is available in the iconv(3C) man page on Solaris, by the way.)

I'm also quite sure all major Unix/Linux systems support the functions that I mentioned. (I also believe the majority will support UTF-32BE, UTF-32LE and similar variations in the iconv() code conversions, by the way.)

Additionally, since POSIX defines wchar_t as an opaque data type, we hope that people are using the standard C interfaces to do conversions between wchar_t and multibyte characters if possible.

With regards,

Ienup

] From: Rick Cameron <[EMAIL PROTECTED]>
] Subject: RE: What's in a wchar_t string on unix?
] Date: Mon, 1 Mar 2004 13:59:06 -0800
]
] OK, I guess I need to be more precise in my question.
]
] For each of the popular unices (Solaris, HP-UX, AIX, and - if possible -
] linux), can anyone answer the following question:
]
] Assuming that the locale is set to Unicode, what is in a wchar_t string? Is
] it UTF-32 or pseudo-UTF-16 (i.e. UTF-16 code units, zero-extended to 32
] bits)?
]
] I'm not expecting that there's a single answer for all the unices of interest.
] And I'm well aware that our application can store in a wchar_t [] whatever
] it wants. I'm trying to find out what the O/S expects to be in a wchar_t
] string.
]
] The reason we want to know this is that we want to be able to write a
] function that converts from UTF-8 (stored in a char []) to wchar_t []
] properly. Obviously the function may need to behave differently on different
] flavours of unix.
]
] I am aware of the utility functions offered by TUC to perform conversions
] between UTF-8, UTF-16 and UTF-32. These functions do not handle the case of
] pseudo-UTF-16; which doesn't surprise me, since AFAIK it's not a conformant
] encoding form. Nonetheless, I have a strong suspicion that some unices may
] use it.
]
] Cheers
]
] - rick cameron
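
For reference, the locale-independent route Ienup describes (iconv_open() with "UTF-8" as fromcode and a UTF-32 variant as tocode) might be sketched roughly as below. This is only an illustration, not the sample from the iconv(3C) man page: the helper name utf8_to_utf32() and its return convention are made up for this example, and it assumes the platform's iconv() accepts the name "UTF-32BE" (chosen here so that no byte-order mark is emitted and the bytes can be reassembled the same way on any host).

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of any library): decode a NUL-terminated
 * UTF-8 string into UTF-32 code points using iconv().
 * Returns the number of code points stored in out, or (size_t)-1 if the
 * converter is unavailable, the input is malformed, or out is too small. */
size_t utf8_to_utf32(const char *utf8, uint32_t *out, size_t out_cap)
{
    /* Argument order is (tocode, fromcode). "UTF-32BE" is an assumption
     * about which encoding names this platform's iconv() supports. */
    iconv_t cd = iconv_open("UTF-32BE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv() takes a non-const input pointer on many systems. */
    char *in = (char *)utf8;
    size_t in_left = strlen(utf8);
    size_t count = 0;
    int ok = 1;

    while (ok && in_left > 0) {
        unsigned char chunk[256];
        char *dst = (char *)chunk;
        size_t dst_left = sizeof chunk;

        size_t rc = iconv(cd, &in, &in_left, &dst, &dst_left);
        /* E2BIG only means the chunk filled up; anything else is fatal. */
        if (rc == (size_t)-1 && errno != E2BIG) {
            ok = 0;
            break;
        }

        /* Reassemble each big-endian 4-byte group into one code point. */
        size_t produced = sizeof chunk - dst_left;
        for (size_t i = 0; i + 4 <= produced; i += 4) {
            if (count == out_cap) {
                ok = 0;
                break;
            }
            out[count++] = ((uint32_t)chunk[i] << 24)
                         | ((uint32_t)chunk[i + 1] << 16)
                         | ((uint32_t)chunk[i + 2] << 8)
                         |  (uint32_t)chunk[i + 3];
        }
    }

    iconv_close(cd);
    return ok ? count : (size_t)-1;
}
```

On a system where wchar_t is known to hold UTF-32 (as Ienup states for Solaris Unicode/UTF-8 locales), each resulting value could then be assigned to a wchar_t element. Note that a character outside the BMP comes back as a single value rather than a surrogate pair, which is exactly the difference between UTF-32 and the "pseudo-UTF-16" asked about. Since POSIX treats wchar_t as opaque, mbstowcs() under a UTF-8 locale remains the more portable choice where the locale can be set.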

