----- Original Message -----
From: "D. Richard Hipp" <[EMAIL PROTECTED]>
To: <sqlite-users@sqlite.org>
Sent: Tuesday, September 06, 2005 8:53 AM
Subject: Re: [sqlite] Please test on Win95/98/ME


On Tue, 2005-09-06 at 08:35 -0700, Roger Binns wrote:
> To sum up: You need to convert UTF-8 to UTF-16-LE first. Then,
> if the OS is NT, you can pass these to the ...W functions.
> Otherwise, you need to further convert to ANSI user codepage
> and pass it to the ...A functions.

Alternatively tell people to link against unicows if they need
win9x support and you can stick to only using the W functions.

http://www.microsoft.com/globaldev/handson/dev/mslu_announce.mspx


I like this solution a lot.  This is probably what I will
end up doing unless somebody can suggest a good reason not
to.


I've done some experimenting with the Unicode/ANSI APIs and testing on my machines here at work. Here are my conclusions:

1. The Unicows DLL fakes CreateFileW() by calling WideCharToMultiByte() and passing the ANSI string to CreateFileA().

2. WideCharToMultiByte() only maps characters correctly if they exist in the current computer's codepage. For example, on my U.S. machine, if I have a filename with Chinese characters, the call to WideCharToMultiByte() returns ?'s in place of the Chinese characters.

3. Since NTFS is Unicode by default, a Chinese filename shows up without the ?'s on my filesystem, but I would NOT be able to open it using the ANSI version of CreateFile(). I tried several ways of doing this using the ANSI functions and none of them worked. The filename always ends up being passed with ?'s in it, and CreateFileA() rejects it as an invalid filename.

4. The only way to open a non-U.S. filename on my U.S. computer is with the unicode version of CreateFile.
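
To make point 2 concrete, here is a small portable sketch that mimics the substitution behavior. It is NOT the Win32 call: wide_to_latin1() is a hypothetical stand-in that uses Latin-1 as the target "codepage", so any UTF-16 code unit above 0xFF becomes '?'. The pLossy flag plays roughly the role of the lpUsedDefaultChar output parameter of the real WideCharToMultiByte().

```c
#include <stddef.h>

/* Hypothetical stand-in for a wide-to-ANSI conversion (not a Win32 API).
** The target "codepage" is assumed to be Latin-1: code units 0x00..0xFF
** map to a single byte, everything else is replaced by '?'.
** Returns the number of bytes written (excluding the NUL), or -1 if
** zOut is too small. *pLossy is set to 1 if any '?' was substituted. */
static int wide_to_latin1(const unsigned short *zWide, char *zOut,
                          size_t nOut, int *pLossy) {
  size_t i = 0;
  *pLossy = 0;
  for (; zWide[i]; i++) {
    if (i + 1 >= nOut) return -1;      /* need room for this byte + NUL */
    if (zWide[i] <= 0xFF) {
      zOut[i] = (char)(unsigned char)zWide[i];
    } else {
      zOut[i] = '?';                   /* character not in the codepage */
      *pLossy = 1;
    }
  }
  if (nOut == 0) return -1;
  zOut[i] = '\0';
  return (int)i;
}
```

A caller that sees the lossy flag set knows the resulting name no longer identifies the original file, which is why the ANSI open fails in point 3.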

All that being said ...

As long as you are creating files using your current computer's codepage, and using characters in that codepage, WideCharToMultiByte() works and you can open the file with the ANSI functions. For maximum effectiveness, I recommend you keep doing what you're doing now by checking isNT() and branching off to the UNICODE version on NT and the ANSI version on Win9x.

However, the current patch as it stands will still not work correctly on 9x platforms, because you're still passing the UTF-8 string to the ANSI versions of the functions, which expect MBCS. While this works (sort of) if you are calling sqlite3_open(), it will not work if one calls sqlite3_open16().
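
One case where a UTF-8 string passes through the ...A functions unharmed is a pure 7-bit ASCII filename, since UTF-8 and the Windows ANSI codepages agree on bytes 0x00..0x7F. A minimal sketch of that check follows; is_ascii_only() is a hypothetical helper, not something in the patch:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of the NUL-terminated
** string is 7-bit ASCII, 0 otherwise. For such strings the UTF-8 bytes
** and the ANSI-codepage bytes are identical. */
static int is_ascii_only(const char *z) {
  const unsigned char *p = (const unsigned char *)z;
  for (; *p; p++) {
    if (*p >= 0x80) return 0;   /* lead or trail byte of a multi-byte sequence */
  }
  return 1;
}
```

Any filename for which this returns 0 contains bytes that an ANSI function would interpret in the local codepage rather than as UTF-8.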

To fix it, make utf8ToUnicode() always return a Unicode string (remove the isNT() check) and add a utf8ToMBCS() function:


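/* Returns a malloc'd MBCS copy of zFilename, or NULL on failure.
** The caller frees the result. */
const char *utf8ToMBCS(const char *zFilename) {
  char *zFilenameMBCS = NULL;
  wchar_t *zFilename16 = utf8ToUnicode(zFilename);
  if (zFilename16) {
    /* Use whichever codepage the file APIs are currently set to expect */
    unsigned int cp = AreFileApisANSI() ? CP_ACP : CP_OEMCP;
    /* First call: required size in bytes, including the NUL terminator */
    int n = WideCharToMultiByte(cp, 0, zFilename16, -1, NULL, 0, NULL, NULL);
    if (n) {
      zFilenameMBCS = sqlite3Malloc(n);
      if (zFilenameMBCS) {
        n = WideCharToMultiByte(cp, 0, zFilename16, -1, zFilenameMBCS, n, NULL, NULL);
        if (!n) {
          sqliteFree(zFilenameMBCS);
          zFilenameMBCS = NULL;
        }
      }
    }
    /* Assumes sqliteFree() matches the allocator used by utf8ToUnicode() */
    sqliteFree(zFilename16);
  }
  return zFilenameMBCS;
}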

... Then in all the File I/O functions check isNT() then branch and call either utf8ToUnicode() or utf8ToMBCS(). At least this way the non-ASCII characters (if they exist in your current codepage) will still translate properly on Windows 9x.
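
Both branches start from the same UTF-8 input, and the MBCS branch goes through UTF-16 first. For anyone who wants to see what that first conversion involves without the Win32 call, here is a portable sketch. utf8_to_utf16() is a hypothetical helper, not SQLite's utf8ToUnicode(), and it does only minimal validation (it does not reject overlong encodings or encoded surrogates). It emits native-endian code units, which on Windows means UTF-16-LE in memory.

```c
#include <stddef.h>

/* Hypothetical helper: decode NUL-terminated UTF-8 into UTF-16 code units.
** Returns the number of units written (excluding the terminator), or -1
** on malformed input or if zOut (capacity nOut units) is too small. */
static int utf8_to_utf16(const char *zIn, unsigned short *zOut, size_t nOut) {
  const unsigned char *p = (const unsigned char *)zIn;
  size_t n = 0;
  while (*p) {
    unsigned long c;
    int nTrail;
    if (*p < 0x80)                { c = *p;        nTrail = 0; }
    else if ((*p & 0xE0) == 0xC0) { c = *p & 0x1F; nTrail = 1; }
    else if ((*p & 0xF0) == 0xE0) { c = *p & 0x0F; nTrail = 2; }
    else if ((*p & 0xF8) == 0xF0) { c = *p & 0x07; nTrail = 3; }
    else return -1;                        /* stray trail byte or invalid lead */
    p++;
    while (nTrail-- > 0) {
      if ((*p & 0xC0) != 0x80) return -1;  /* also catches a premature NUL */
      c = (c << 6) | (unsigned long)(*p++ & 0x3F);
    }
    if (c > 0x10FFFF) return -1;
    if (c >= 0x10000) {
      /* Outside the BMP: encode as a surrogate pair */
      if (n + 2 >= nOut) return -1;
      c -= 0x10000;
      zOut[n++] = (unsigned short)(0xD800 | (c >> 10));
      zOut[n++] = (unsigned short)(0xDC00 | (c & 0x3FF));
    } else {
      if (n + 1 >= nOut) return -1;
      zOut[n++] = (unsigned short)c;
    }
  }
  if (n >= nOut) return -1;
  zOut[n] = 0;
  return (int)n;
}
```

On NT the resulting buffer would go straight to the ...W functions; on Win9x it would be the input to the WideCharToMultiByte() step in utf8ToMBCS().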

Robert

