>   -- no support for other encodings (say for Asian languages
> which may need more bytes than Unicode supports)

Speaking of Asian languages, that reminds me:
wchar_t on Unix is 32 bits wide.

I have never really understood how
all these different formats work.

Would anybody care to enlighten me?
OK, I can search the net, but I guess
others might be interested.

Here is what I know or think I know.

ASCII: pretty obvious, 7 bits per character.
Latin-1 (ISO 8859-1): also pretty straightforward, 8 bits per character.
        Extra characters are >= 128. The lower 7 bits are ASCII.
UTF-16: also pretty straightforward, 16-bit units
        (characters above U+FFFF take two units, a surrogate pair).
        The lowest 256 code points are Latin-1.
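
To make the Latin-1 / UTF-16 relation concrete, here is a minimal
sketch. The name latin1_to_utf16 is made up for illustration and is
not an existing API. It relies only on the fact that Unicode code
points 0..255 are the Latin-1 characters, so widening is a plain
zero-extension of each byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen Latin-1 bytes to UTF-16 code units.
 * This works because Unicode code points 0..255 match Latin-1.
 * dst must have room for len code units. */
static size_t latin1_to_utf16(const unsigned char *src, size_t len,
                              uint16_t *dst)
{
    size_t i;

    for (i = 0; i < len; i++)
        dst[i] = src[i];    /* zero-extend each byte to 16 bits */

    return len;             /* number of code units written */
}
```

Going the other way is lossy: any UTF-16 unit above 0xFF has no
Latin-1 equivalent.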

UTF-8: Bytes below 128 are plain ASCII. A byte >= 128 is part of
       a multi-byte sequence that encodes one character. Compatible
       with ASCII, but it isn't compatible with Latin-1, is it?
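
A rough sketch of the UTF-8 encoding rules, to show why it is not
byte-compatible with Latin-1. The function name utf8_encode is an
assumption for illustration only. Code points below 0x80 come out as
a single ASCII byte, but a Latin-1 character such as 0xE9 becomes the
two bytes 0xC3 0xA9.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Returns the number of bytes written (1..4), or 0 if cp is not a
 * valid code point. out must have room for 4 bytes. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                    /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond the Unicode range */
}
```

So a pure ASCII string is already valid UTF-8, while a Latin-1 string
with any byte >= 128 has to be converted.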

Regardless, the "ASCII" (A) functions in Windows can
take character encodings other than Latin-1 (Latin-[1-?]),
so UTF-8 and "ASCII" strings are not really interchangeable
without conversion, are they?
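
As a small illustration of why the meaning of an 8-bit string depends
on the encoding in use, here is a sketch that decodes one byte as
Latin-9 (ISO 8859-15), which differs from Latin-1 in eight positions.
The function name latin9_to_unicode is hypothetical; it says nothing
about how the Windows A functions pick their code page.

```c
#include <stdint.h>

/* Hypothetical helper: map one ISO 8859-15 (Latin-9) byte to its
 * Unicode code point. All other bytes are the same as in Latin-1,
 * where the code point simply equals the byte value. */
static uint16_t latin9_to_unicode(unsigned char c)
{
    switch (c) {
    case 0xA4: return 0x20AC;   /* euro sign */
    case 0xA6: return 0x0160;   /* S with caron */
    case 0xA8: return 0x0161;   /* s with caron */
    case 0xB4: return 0x017D;   /* Z with caron */
    case 0xB8: return 0x017E;   /* z with caron */
    case 0xBC: return 0x0152;   /* OE ligature */
    case 0xBD: return 0x0153;   /* oe ligature */
    case 0xBE: return 0x0178;   /* Y with diaeresis */
    default:   return c;        /* identical to Latin-1 */
    }
}
```

The same byte 0xA4 is the currency sign U+00A4 in Latin-1 but the
euro sign U+20AC in Latin-9, so a converter to Unicode has to know
which encoding the bytes are in.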

Then we have the Asian multi-byte character formats
that I know nothing about.

Then we have the strange fact that wchar_t is 32-bit,
which I never really understood since most Unicode
support in Unix is UTF8 IIRC.
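
For what it is worth, the width of wchar_t can be checked at compile
time on any given platform. A minimal sketch using only standard C
headers follows; the two function names are made up for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Width of wchar_t in bits on the platform this is compiled for. */
static unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* Nonzero if a single wchar_t can hold any Unicode code point
 * (the highest one is 0x10FFFF, which needs 21 bits). */
static int wchar_holds_any_code_point(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}
```

The result depends on the compiler and C library, so code that passes
wide strings between Windows-style and Unix-style APIs cannot assume
the two sides agree on the size.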

Perhaps it will be useful with UTF-32, which also
exists IIRC, even though I know nothing about it.
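
As far as the encodings themselves go, the difference can be sketched
like this: UTF-32 stores every code point in one 32-bit unit, while
UTF-16 needs two 16-bit units (a surrogate pair) for anything above
U+FFFF. The helper names utf16_encode and utf32_encode are
assumptions for illustration, not existing functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp
 * is not a valid code point. out must have room for 2 units. */
static size_t utf16_encode(uint32_t cp, uint16_t *out)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* reserved for surrogates */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* fits in one unit */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                       /* beyond the Unicode range */

    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Hypothetical helper: encode one code point as UTF-32.
 * Always one unit; returns 1, or 0 if cp is not a valid code point. */
static size_t utf32_encode(uint32_t cp, uint32_t *out)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    out[0] = cp;
    return 1;
}
```

A 32-bit wchar_t can therefore hold one UTF-32 unit per character
with no surrogate handling at all.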

> In any case, we need some guidelines on how to deal with Unicode
> in a consistent manner. 

Agreed.

> The current situation is a mess of 1 & 2
> which will need cleanup. 

Yes, as I posted earlier there are 172 functions
that do W->A and 55 that do A->W.
