Windows represents UTF-16 with WCHAR, which is always 2 bytes per code
unit.  Characters outside the Basic Multilingual Plane (musical notation
symbols, for example) take two code units in UTF-16, a 4-byte surrogate
pair; UCS-2 is a fixed 2-bytes-per-character encoding that simply cannot
represent them.  I don't think any OS uses UCS-2 directly.  I know Oracle
supports UTF-8, UTF-16, and UCS-2.  In fact, Oracle's online documentation
has a really good discussion of Unicode; look for their
internationalization book.  I wrote some code that shared data between
Oracle and Microsoft SQL Server and found that book very helpful.  Oracle
generally favors UTF-8 while SQL Server favors UTF-16.
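
If it helps, here is a throwaway snippet (not from SQLite or any library)
showing the difference: U+1D11E, the musical G clef symbol, needs a
surrogate pair in UTF-16 and has no UCS-2 representation at all.

    #include <stdio.h>

    int main(void)
    {
        /* Code point outside the BMP: U+1D11E MUSICAL SYMBOL G CLEF. */
        unsigned long cp = 0x1D11EUL;
        unsigned long v  = cp - 0x10000UL;
        /* Split into a UTF-16 surrogate pair. */
        unsigned int hi = 0xD800u + (unsigned int)(v >> 10);
        unsigned int lo = 0xDC00u + (unsigned int)(v & 0x3FFu);
        printf("U+%05lX -> 0x%04X 0x%04X\n", cp, hi, lo);  /* 0xD834 0xDD1E */
        return 0;
    }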

If you are going to cast to unsigned char*, you must manage the fact that
each character in your strings is two (or more) bytes; you are effectively
just using a byte pointer into the string data.  wchar_t* is the type
typically used, but its encoding (and size) is platform dependent.  The
big problem you have is that your database files are portable across
platforms, so I think you will need to pick an internal format for storing
the strings in the db and then translate to and from the platform encoding
as appropriate.  You may be able to do something clever like use
sizeof(wchar_t) to find out how many bytes a character occupies and drive
the translation from that; a rough sketch follows.
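
Something along these lines might work as the translation layer (untested
sketch; to_utf16le() is a made-up helper name, not part of SQLite or any
other library):

    #include <stdlib.h>
    #include <wchar.h>

    /*
    ** Copy a wchar_t string into a little-endian UTF-16 buffer so the
    ** stored bytes are the same regardless of sizeof(wchar_t) on the
    ** build platform.  Returns a malloc'd, NUL-terminated buffer, or 0
    ** on allocation failure.  Caller must free() the result.
    */
    unsigned char *to_utf16le(const wchar_t *zIn, size_t *pnByte)
    {
        size_t i, n = wcslen(zIn);
        /* Worst case: every character becomes a surrogate pair (4 bytes). */
        unsigned char *zOut = malloc(4*n + 2);
        unsigned char *p = zOut;
        if (zOut == 0) return 0;
        for (i = 0; i < n; i++) {
            unsigned long c = (unsigned long)zIn[i];
            if (sizeof(wchar_t) >= 4 && c >= 0x10000UL) {
                /* 4-byte wchar_t (e.g. Linux): emit a surrogate pair. */
                unsigned long v = c - 0x10000UL;
                unsigned int hi = 0xD800u + (unsigned int)(v >> 10);
                unsigned int lo = 0xDC00u + (unsigned int)(v & 0x3FFu);
                *p++ = hi & 0xFF;  *p++ = (hi >> 8) & 0xFF;
                *p++ = lo & 0xFF;  *p++ = (lo >> 8) & 0xFF;
            } else {
                /* BMP character, or a 2-byte wchar_t that already holds
                ** half of a surrogate pair (e.g. Windows). */
                *p++ = c & 0xFF;  *p++ = (c >> 8) & 0xFF;
            }
        }
        *p++ = 0;  *p++ = 0;               /* terminating NUL */
        if (pnByte) *pnByte = (size_t)(p - zOut);
        return zOut;
    }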

There is a Unicode book available that covers the spec.  Unfortunately, my
experience has been that every implementation has its own nuances.
Generally, though, it is pretty consistent.


On Wed, 7 Apr 2004, D. Richard Hipp wrote:

> Simon Berthiaume wrote:
> >  >> Notice that text strings are always transferred as type "char*" even
> > if the text representation is UTF-16.
> >
> > This might force users to explicitly type cast some calls to functions
> > to avoid warnings.  I would prefer UNICODE-neutral functions that can
> > take either one of them depending on the setting of a compilation
> > #define (UNICODE).  Create a function that takes char * and another that
> > takes wchar_t *, then encourage the use of a #defined symbol that
> > switches depending on context (see example below).  It would allow
> > people to call the functions either way they want.
> >
> >     Example:
> >
> >         int sqlite3_open8(const char*, sqlite3**, const char**);
> >         int sqlite3_open16(const wchar_t*, sqlite3**, const wchar_t**);
> >         #ifdef UNICODE
> >             #define sqlite3_open sqlite3_open16
> >         #else
> >             #define sqlite3_open sqlite3_open8
> >         #endif
> >
> I'm told that wchar_t is 2 bytes on some systems and 4 bytes on others.
> Is it really acceptable to use wchar_t* as a UTF-16 string pointer?
> Note that internally, sqlite3 will cast all UTF-16 strings to be of
> type "unsigned char*".  So the type in the declaration doesn't really
> matter. But it would be nice to avoid compiler warnings.  So what datatype
> are most systems expecting to use for UTF-16 strings?  Who can provide
> me with a list?  Or even a few examples?
