> -- no support for other encodings (say for Asian languages
> which may need more bytes than Unicode supports)
Speaking of Asian languages, that reminds me:
wchar_t in Unix is 32 bits long.
I have never really understood how
all these different formats work.
Anybody care to enlighten me?
OK, I can search the net, but I guess
others might be interested too.
Here is what I know or think I know.
ASCII: pretty obvious, 7 bits long.
Latin-1 (ISO 8859-1): also pretty straightforward, 8 bits long,
with extra characters >= 128. The lower 7 bits match ASCII.
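
To make that concrete, here is a tiny C sketch (just illustrative) showing
that byte values below 128 are the same in ASCII and Latin-1, and only the
values >= 128 are Latin-1 specific:

#include <stdio.h>

int main(void)
{
    unsigned char a = 0x41;  /* 'A': in the ASCII range (< 128)         */
    unsigned char e = 0xE9;  /* e with acute accent in Latin-1 (>= 128) */

    printf("0x%02X is %s\n", a, a < 128 ? "plain ASCII" : "Latin-1 only");
    printf("0x%02X is %s\n", e, e < 128 ? "plain ASCII" : "Latin-1 only");
    return 0;
}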
UTF-16: also pretty straightforward, 16 bits long (plus surrogate
pairs for characters that don't fit in 16 bits, I believe).
The lower 8 bits match Latin-1 for the first 256 characters.
UTF-8: the lower 7 bits are ASCII. If a byte is >= 128, it and the
following bytes belong to one multi-byte character. Compatible with
ASCII, but it isn't compatible with Latin-1, is it?
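
For example, here is what U+00E9 (e acute) looks like in each encoding.
The bytes are hard-coded just to illustrate the point, so take it as a
sketch rather than a real decoder:

#include <stdio.h>

int main(void)
{
    unsigned char  latin1 = 0xE9;            /* one byte                 */
    unsigned char  utf8[] = { 0xC3, 0xA9 };  /* two bytes, 110x lead byte */
    unsigned short utf16  = 0x00E9;          /* one 16-bit unit          */

    printf("Latin-1: %02X\n", latin1);
    printf("UTF-8  : %02X %02X\n", utf8[0], utf8[1]);
    printf("UTF-16 : %04X\n", utf16);
    return 0;
}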
Regardless, the "ASCII" (A) functions in Windows can take other
character encodings than Latin-1 (Latin-[1-?]),
so UTF-8 and "ASCII" are not really interchangeable without
conversion, are they? (Something like the sketch below is what I mean.)
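
A rough sketch of that conversion using the Win32 conversion calls
MultiByteToWideChar/WideCharToMultiByte; error handling is omitted and
the buffer size is just picked for the example:

#include <windows.h>

void utf8_to_ansi(const char *utf8, char *ansi, int ansi_len)
{
    WCHAR wide[256];

    /* UTF-8 -> UTF-16 */
    MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, 256);

    /* UTF-16 -> whatever the current ANSI codepage is (e.g. Latin-1) */
    WideCharToMultiByte(CP_ACP, 0, wide, -1, ansi, ansi_len, NULL, NULL);
}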
Then we have the Asian multi-byte character formats
that I know nothing about.
Then there is the strange fact that wchar_t is 32 bits,
which I never really understood, since most Unicode
support in Unix is UTF-8 IIRC.
Perhaps it is there for UTF-32, which also
exists IIRC, even though I know nothing about it.
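
On Linux/glibc you can see the 32-bit wchar_t like this; a small sketch,
assuming a UTF-8 locale is available (the locale name is just a guess)
and with only minimal error handling:

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void)
{
    const char *utf8 = "\xC3\xA9";   /* U+00E9 encoded as UTF-8 */
    wchar_t wide[8];

    setlocale(LC_CTYPE, "en_US.UTF-8");
    printf("sizeof(wchar_t) = %u\n", (unsigned)sizeof(wchar_t));  /* 4 on Linux */

    /* mbstowcs decodes the multi-byte string into one code point per wchar_t */
    if (mbstowcs(wide, utf8, 8) != (size_t)-1)
        printf("code point = U+%04lX\n", (unsigned long)wide[0]);
    return 0;
}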
> In any case, we need some guidelines on how to deal with Unicode
> in a consistent manner.
Agreed.
> The current situation is a mess of 1 & 2
> which will need cleanup.
Yes, as I posted earlier, there are 172 functions
that do W->A and 55 that do A->W.