[EMAIL PROTECTED] wrote:
> I guess this should be a FAQ (but isn't). I need code to convert Unicode
> data between various encoding schemes (UTF16LE to UTF32BE etc.). Are there
> standard routines I can use? If so, where can I find them?
The CD for the Unicode book should have some of this; in any case, these
transformations are fairly simple.
Unicode libraries provide such converters; see http://www.unicode.org/unicode/onlinedat/products.html
For example, see ICU at http://oss.software.ibm.com/icu/ and look at its documentation
and at the source code for the converters and for the UTF macros in
icu/source/common/unicode/utf.h.
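As a rough illustration of how simple such a transformation is, here is a minimal C sketch of one of them, UTF-16LE bytes to UTF-32BE bytes. The function name and its error convention (returning (size_t)-1) are my own assumptions for this example, not ICU's API; for production code a tested library converter is the better choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: convert UTF-16LE bytes to UTF-32BE bytes (hypothetical helper).
 * Returns the number of code points written, or (size_t)-1 on malformed
 * input (odd byte count, unpaired surrogate) or if out_cap bytes of
 * output space are not enough. */
size_t utf16le_to_utf32be(const unsigned char *in, size_t in_len,
                          unsigned char *out, size_t out_cap)
{
    size_t i = 0, n = 0;

    if (in_len % 2 != 0)
        return (size_t)-1;                  /* UTF-16 units are 2 bytes each */

    while (i < in_len) {
        /* Read one 16-bit code unit, least significant byte first. */
        uint32_t c = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
        i += 2;

        if (c >= 0xD800 && c <= 0xDBFF) {   /* lead surrogate */
            uint32_t c2;
            if (i + 2 > in_len)
                return (size_t)-1;          /* truncated surrogate pair */
            c2 = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
            if (c2 < 0xDC00 || c2 > 0xDFFF)
                return (size_t)-1;          /* lead not followed by trail */
            i += 2;
            /* Combine the pair into a supplementary code point. */
            c = 0x10000 + ((c - 0xD800) << 10) + (c2 - 0xDC00);
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return (size_t)-1;              /* unpaired trail surrogate */
        }

        if ((n + 1) * 4 > out_cap)
            return (size_t)-1;              /* output buffer too small */

        /* Write the code point as 4 bytes, most significant byte first. */
        out[n * 4]     = (unsigned char)(c >> 24);
        out[n * 4 + 1] = (unsigned char)(c >> 16);
        out[n * 4 + 2] = (unsigned char)(c >> 8);
        out[n * 4 + 3] = (unsigned char)c;
        n++;
    }
    return n;
}
```

For example, the UTF-16LE bytes 41 00 3D D8 00 DE ("A" followed by U+1F600) become 00 00 00 41 00 01 F6 00, and the function returns 2.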
> As an aside, I have run into trouble porting a database application that
> stores UTF16LE data to HP-UX and Sun machines. I can see that wchar_t there
> is defined as unsigned long, so most probably all wcs*() functions expect
> UTF32-encoded data. Am I correct in my assumption? What do I do to be
> certain?
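wchar_t is a very fuzzy type. It may be 8, 16, or 32 bits wide depending on the platform,
and there is no general guarantee that it stores Unicode. Most older systems use it
to hold scalar code points of a character set custom-built for their char* encoding,
not Unicode code points. So a 32-bit wchar_t does not by itself mean that the wcs*()
functions expect UTF-32.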
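To see what a particular compiler and C library promise, a small sketch like the following can help; it assumes a C99 implementation, and the function names are hypothetical. The predefined macro __STDC_ISO_10646__ is the C99 way for an implementation to declare that wchar_t values are ISO 10646 (Unicode) code points. If it is not defined, no such promise is made, although the platform might still happen to use Unicode in some locales.

```c
#include <limits.h>
#include <stdio.h>
#include <wchar.h>

/* Print what this compiler/C library says about wchar_t (hypothetical helper). */
void report_wchar_t(void)
{
    printf("sizeof(wchar_t) = %u bytes (%u bits)\n",
           (unsigned)sizeof(wchar_t),
           (unsigned)(sizeof(wchar_t) * CHAR_BIT));
#ifdef __STDC_ISO_10646__
    /* Defined as a yyyymmL constant naming the ISO 10646 edition supported. */
    printf("__STDC_ISO_10646__ = %ldL: wchar_t holds Unicode code points\n",
           (long)__STDC_ISO_10646__);
#else
    printf("__STDC_ISO_10646__ not defined: no Unicode guarantee\n");
#endif
}

/* Returns 1 if wchar_t is wide enough for any code point up to U+10FFFF.
 * This checks the range only; it says nothing about the encoding used. */
int wchar_t_wide_enough_for_utf32(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}
```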
> What online resources can I look through for more information on such a
> problem?
About wchar_t and Unicode, see "What size wchar_t do I need for Unicode?" at
http://www-4.ibm.com/software/developer/library/uniwchar.html
To be certain, use typedefs that always have the size and meaning you want. ICU and other
libraries define types for string units and scalar code points that work on all
platforms, and they provide functions to work with such Unicode strings and characters.
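A minimal sketch of that approach in C, assuming <stdint.h> is available. The my_* names are hypothetical, for illustration only; ICU's own types for this purpose are UChar (a 16-bit code unit) and UChar32 (a code point), and the UTF macros in utf.h mentioned above do this kind of work in tested form.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typedefs: fixed widths on every platform, unlike wchar_t. */
typedef uint16_t my_utf16_unit;   /* one UTF-16 code unit */
typedef int32_t  my_code_point;   /* one scalar code point, 0..0x10FFFF */

/* Count the code points in a UTF-16 string of len units.
 * A well-formed surrogate pair counts as one code point; an unpaired
 * surrogate is counted as one unit rather than rejected. */
size_t my_utf16_count_code_points(const my_utf16_unit *s, size_t len)
{
    size_t i = 0, count = 0;
    while (i < len) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            i + 1 < len && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i += 2;               /* lead + trail surrogate = one code point */
        else
            i += 1;               /* BMP character or unpaired surrogate */
        count++;
    }
    return count;
}

/* Decode the code point starting at s[*i] and advance *i past it.
 * The caller must ensure *i < len before calling. */
my_code_point my_utf16_next(const my_utf16_unit *s, size_t len, size_t *i)
{
    my_code_point c = s[(*i)++];
    if (c >= 0xD800 && c <= 0xDBFF && *i < len &&
        s[*i] >= 0xDC00 && s[*i] <= 0xDFFF) {
        c = 0x10000 + ((c - 0xD800) << 10) + (s[(*i)++] - 0xDC00);
    }
    return c;
}
```

For example, for the units {0x0041, 0xD83D, 0xDE00}, my_utf16_count_code_points returns 2 (U+0041 and U+1F600), regardless of how wide wchar_t is on the platform.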
Good luck,
markus