----- Original Message -----
From: "D. Richard Hipp" <[EMAIL PROTECTED]>
To: <sqlite-users@sqlite.org>
Sent: Tuesday, September 06, 2005 8:53 AM
Subject: Re: [sqlite] Please test on Win95/98/ME
On Tue, 2005-09-06 at 08:35 -0700, Roger Binns wrote:
> To sum up: You need to convert UTF-8 to UTF-16-LE first. Then,
> if the OS is NT, you can pass these to the ...W functions.
> Otherwise, you need to further convert to ANSI user codepage
> and pass it to the ...A functions.
Alternatively tell people to link against unicows if they need
win9x support and you can stick to only using the W functions.
http://www.microsoft.com/globaldev/handson/dev/mslu_announce.mspx
I like this solution a lot. This is probably what I will
end up doing unless somebody can suggest a good reason not
to.
I've done some experimenting with the Unicode/ANSI APIs and testing on my
machines here at work. Here are my conclusions:
1. The Unicows DLL fakes CreateFileW() by calling WideCharToMultiByte() and
passing the resulting ANSI string to CreateFileA().
2. WideCharToMultiByte() only properly maps characters that exist in the
current computer's codepage. For example, on my U.S. machine, if I have a
filename with Chinese characters, the call to WideCharToMultiByte() will
return ?'s in place of the Chinese characters.
3. Since NTFS is Unicode by default, a file with a Chinese name shows up
without the ?'s on my filesystem, but I would NOT be able to open it
using the ANSI version of CreateFile(). I tried several ways of doing this
using the ANSI functions and none of them worked. The filename always ends
up being passed with ?'s in it, and CreateFileA() rejects it as an
invalid filename ('?' is not a legal filename character on Windows).
4. The only way to open a filename containing characters outside my
codepage on my U.S. computer is with the Unicode version of CreateFile().
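To illustrate point 2, here is a small portable sketch of the lossy
conversion. It is NOT the Win32 API: wideToToyCodepage() is a made-up
stand-in, and its "codepage" is assumed to hold only U+0000..U+00FF. The
real WideCharToMultiByte() reports the same condition through its final
lpUsedDefaultChar argument when a non-UTF-8 codepage is used.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for WideCharToMultiByte(). The "codepage" here holds only
** U+0000..U+00FF (an assumption made for illustration). Any other
** character is replaced by '?', and *pLossy is set to 1 so the caller
** can tell that the name was damaged. Returns the number of bytes
** written, not counting the terminating NUL. */
static size_t wideToToyCodepage(const unsigned short *zWide, char *zOut,
                                size_t nOut, int *pLossy){
  size_t i = 0;
  assert( zWide!=0 && zOut!=0 && pLossy!=0 && nOut>0 );
  *pLossy = 0;
  while( zWide[i]!=0 && i+1<nOut ){
    if( zWide[i]<=0xFF ){
      zOut[i] = (char)zWide[i];   /* character exists in the codepage */
    }else{
      zOut[i] = '?';              /* character is not in the codepage */
      *pLossy = 1;
    }
    i++;
  }
  zOut[i] = 0;
  return i;
}
```

A caller that sees the lossy flag set knows the converted name no longer
identifies the original file and can fail early instead of handing a name
full of ?'s to the ANSI open call.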
All that being said ...
As long as you are creating files using your current computer's codepage,
and using characters in that codepage, WideCharToMultiByte() works and you
can open the file with the ANSI functions. For maximum effectiveness, I
recommend you keep doing what you're doing now by checking isNT() and
branching off to the UNICODE version on NT and the ANSI version on Win9x.
However, the current patch as it stands will still not work correctly on 9x
platforms, because you're still passing the UTF-8 string to the ANSI versions
of the functions, which expect MBCS. While this works (sort of) if you
are calling sqlite3_open(), it will not work if one calls sqlite3_open16().
To fix it, make utf8ToUnicode() always return a Unicode string (remove the
isNT() check) and add a utf8ToMBCS() function:
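/*
** Convert a UTF-8 filename to the MBCS encoding expected by the ANSI
** file APIs. Returns a buffer obtained from sqlite3Malloc() that the
** caller must free, or NULL on failure.
*/
char *utf8ToMBCS(const char *zFilename){
  char *zFilenameMBCS = NULL;
  wchar_t *zFilename16 = utf8ToUnicode(zFilename);
  if( zFilename16 ){
    /* Use whichever codepage the ANSI file APIs are currently using. */
    unsigned int cp = AreFileApisANSI() ? CP_ACP : CP_OEMCP;
    /* First call: ask for the required size in bytes, including the NUL. */
    int n = WideCharToMultiByte(cp, 0, zFilename16, -1, NULL, 0, NULL, NULL);
    if( n ){
      zFilenameMBCS = sqlite3Malloc(n);
      if( zFilenameMBCS
       && !WideCharToMultiByte(cp, 0, zFilename16, -1, zFilenameMBCS, n,
                               NULL, NULL) ){
        sqliteFree(zFilenameMBCS);   /* conversion failed */
        zFilenameMBCS = NULL;
      }
    }
    /* Assumes utf8ToUnicode() allocates with sqliteMalloc(), so that
    ** sqliteFree() is the matching deallocator. */
    sqliteFree(zFilename16);
  }
  return zFilenameMBCS;
}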
Then, in all the file I/O functions, check isNT() and branch to call
either utf8ToUnicode() or utf8ToMBCS(). At least this way the non-ASCII
characters (if they exist in your current codepage) will still translate
properly on Windows 9x.
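A rough sketch of that branching, written so it compiles anywhere.
Everything here is a hypothetical stand-in: fakeIsNT replaces the real
isNT() test, the two ...Stub() converters just tag the name instead of
re-encoding it, and lastCall records where CreateFileW() or CreateFileA()
would have been called.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins. In os_win.c these would be the real isNT(),
** utf8ToUnicode() and utf8ToMBCS(). */
static int fakeIsNT = 1;
static int isNT(void){ return fakeIsNT; }

static char *dupTagged(const char *zTag, const char *z){
  char *zOut = malloc(strlen(zTag) + strlen(z) + 1);
  if( zOut ){ strcpy(zOut, zTag); strcat(zOut, z); }
  return zOut;
}
static char *utf8ToUnicodeStub(const char *z){ return dupTagged("W:", z); }
static char *utf8ToMBCSStub(const char *z){ return dupTagged("A:", z); }

/* Records which API family would have received the converted name. */
static char lastCall[64];

/* Convert the UTF-8 name once, for whichever API family this OS needs,
** then release the converted copy. Returns 0 on success, -1 on failure. */
static int openFileSketch(const char *zUtf8){
  char *zConverted;
  assert( zUtf8!=0 );
  if( isNT() ){
    zConverted = utf8ToUnicodeStub(zUtf8);  /* would go to CreateFileW() */
  }else{
    zConverted = utf8ToMBCSStub(zUtf8);     /* would go to CreateFileA() */
  }
  if( zConverted==0 ) return -1;            /* conversion or malloc failed */
  snprintf(lastCall, sizeof(lastCall), "%s", zConverted);
  free(zConverted);                         /* this function owns the copy */
  return 0;
}
```

The point of the shape is that the UTF-8 name never reaches an ANSI
function directly; each path converts first and frees its own copy.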
Robert