RE: What's in a wchar_t string on unix?

Rick Cameron Thu, 04 Mar 2004 10:33:02 -0800

Woo-hoo! Finally, a real answer, rather than speculation.

Thanks very much, Ienup.

- rick 

-----Original Message-----
From: Ienup Sung [mailto:[EMAIL PROTECTED] 
Sent: March 4, 2004 9:53
To: Rick Cameron
Cc: [EMAIL PROTECTED]
Subject: Re: What's in a wchar_t string on unix?

In the Solaris Unicode/UTF-8 locales, wchar_t holds UTF-32, and we guarantee
that it has been and will stay that way.

Just in case, there is also a set of standard C APIs such as mbtowc(),
mbstowcs(), mbrtowc(), wctomb(), wcstombs(), wcrtomb(), and so on that will
convert between wide characters (UTF-32) and multibyte characters (UTF-8)
properly as long as you set the current locale to a Unicode/UTF-8 locale. If
you wish to use a conversion function that is not locale-sensitive, you could
use iconv() instead by opening the conversion descriptor with iconv_open()
with "UTF-32" and "UTF-8" as fromcode and tocode (or vice versa). (A sample
program is available in the iconv(3C) man page on Solaris, by the way.)
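
To make the iconv() route concrete, here is a minimal sketch, not the man
page sample. The helper name utf8_to_utf32() is hypothetical, and the sketch
assumes the local iconv() accepts the codeset names "UTF-8" and "UTF-32LE"
(the names are implementation-defined, so check iconv_open() on each
platform). It asks for "UTF-32LE" so that no byte order mark is emitted, then
assembles each code point from its bytes so the result does not depend on the
host byte order.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string into UTF-32
 * code points. Returns the number of code points written, or (size_t)-1 if
 * the converter is unavailable, the input is malformed, or dst is too
 * small. No terminator is written to dst. */
size_t utf8_to_utf32(const char *src, uint32_t *dst, size_t dst_len)
{
    assert(src != NULL && dst != NULL);

    /* Note the argument order: tocode first, then fromcode. */
    iconv_t cd = iconv_open("UTF-32LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;            /* iconv() does not modify the input */
    size_t in_left = strlen(src);
    char *out = (char *)dst;
    size_t out_left = dst_len * sizeof *dst;

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;             /* EILSEQ, EINVAL or E2BIG */

    /* The output bytes are little-endian; rebuild each value explicitly. */
    size_t count = dst_len - out_left / sizeof *dst;
    for (size_t i = 0; i < count; i++) {
        const unsigned char *b = (const unsigned char *)&dst[i];
        uint32_t cp = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
                      (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
        dst[i] = cp;
    }
    return count;
}
```

A caller would pass a buffer with at least one element per expected code
point, for example uint32_t buf[64]; n = utf8_to_utf32(text, buf, 64);.
Copying the result into a wchar_t array is only meaningful on platforms where
wchar_t is known to hold UTF-32.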

I'm also quite sure all major Unix/Linux systems support the functions that
I mentioned. (I also believe the majority will support UTF-32BE, UTF-32LE, and
similar variants in their iconv() code conversions, by the way.)

Additionally, since POSIX defines wchar_t as an opaque data type, we hope
that people will use the standard C interfaces to convert between wchar_t
and multibyte characters wherever possible.
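
As a rough sketch of the locale-based route, the following wraps mbstowcs()
so that the result is always terminated. The helper names locale_to_wide()
and wchar_is_iso10646() are hypothetical. The second one only reports whether
the compiler defines the C99 macro __STDC_ISO_10646__, which an
implementation may define to state that wchar_t values are ISO 10646 code
points; whether a given platform defines it is something to check, not
something this sketch can guarantee.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Returns 1 if the implementation states that wchar_t holds ISO 10646
 * (Unicode) code points, 0 if it makes no such promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: convert a multibyte string in the encoding of the
 * current LC_CTYPE locale into wide characters. Writes at most dst_len - 1
 * wide characters plus a terminator. Returns the number of wide characters
 * written (excluding the terminator), or (size_t)-1 if the input is not
 * valid in the current locale. */
size_t locale_to_wide(const char *src, wchar_t *dst, size_t dst_len)
{
    assert(src != NULL && dst != NULL && dst_len > 0);

    size_t n = mbstowcs(dst, src, dst_len - 1);
    if (n == (size_t)-1)
        return n;

    dst[n] = L'\0';   /* mbstowcs() does not terminate when dst fills up */
    return n;
}
```

For UTF-8 input the program would first select a UTF-8 locale, for example
with setlocale(LC_CTYPE, "") when the user's environment names one, or with an
explicit name such as "en_US.UTF-8" (locale names vary between systems, so
that name is an assumption). In the default "C" locale only plain ASCII input
is certain to convert.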

With regards,

Ienup

] From: Rick Cameron <[EMAIL PROTECTED]>
] Subject: RE: What's in a wchar_t string on unix?
] Date: Mon, 1 Mar 2004 13:59:06 -0800
]
] OK, I guess I need to be more precise in my question.
]
] For each of the popular unices (Solaris, HP-UX, AIX, and - if possible -
] linux), can anyone answer the following question:
]
] Assuming that the locale is set to Unicode, what is in a wchar_t string?
] Is it UTF-32 or pseudo-UTF-16 (i.e. UTF-16 code units, zero-extended to
] 32 bits)?
]
] I'm not expecting that there's a single answer for all the unices of
] interest. And I'm well aware that our application can store in a
] wchar_t [] whatever it wants. I'm trying to find out what the O/S expects
] to be in a wchar_t string.
]
] The reason we want to know this is that we want to be able to write a
] function that converts from UTF-8 (stored in a char []) to wchar_t []
] properly. Obviously the function may need to behave differently on
] different flavours of unix.
]
] I am aware of the utility functions offered by TUC to perform conversions
] between UTF-8, UTF-16 and UTF-32. These functions do not handle the case
] of pseudo-UTF-16; which doesn't surprise me, since AFAIK it's not a
] conformant encoding form. Nonetheless, I have a strong suspicion that some
] unices may use it.
]
] Cheers
]
] - rick cameron
