> On 11 Feb 2016 at 10:48, Olivier Mascia <om at integral.be> wrote:
> 
> It looks like character set mapping on Windows is still not quite right in
> the command-line utility.  I'm currently reviewing the code, and it revolves
> around the sqlite3_win32_mbcs_to_utf8() API (and its reciprocal), which in
> turn calls winMbcsToUnicode() (and its reciprocal).  Those do this:
> 
>  int codepage = osAreFileApisANSI() ? CP_ACP : CP_OEMCP;
> 
> While this is right for a filename, it is not right for the content read
> from or written to the console, which should use CP_OEMCP.
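A minimal, portable sketch of why the code page matters for console input (not SQLite code). It assumes, for illustration only, that the console uses OEM code page 850 and that the ANSI code page is 1252, a common pairing on Western European Windows systems. Both tables are deliberately partial and cover only the bytes used here.

```c
#include <assert.h>
#include <stdint.h>

/* Byte -> Unicode code point in OEM code page 850 (partial table, assumption). */
static uint32_t cp850_to_unicode(unsigned char b)
{
    if (b < 0x80) return b;      /* ASCII is shared by both code pages */
    switch (b) {
    case 0x82: return 0x00E9;    /* LATIN SMALL LETTER E WITH ACUTE */
    case 0xE9: return 0x00DA;    /* LATIN CAPITAL LETTER U WITH ACUTE */
    default:   return 0;         /* not covered by this partial table */
    }
}

/* Byte -> Unicode code point in ANSI code page 1252 (partial table, assumption). */
static uint32_t cp1252_to_unicode(unsigned char b)
{
    if (b < 0x80) return b;
    switch (b) {
    case 0x82: return 0x201A;    /* SINGLE LOW-9 QUOTATION MARK */
    case 0xE9: return 0x00E9;    /* LATIN SMALL LETTER E WITH ACUTE */
    default:   return 0;         /* not covered by this partial table */
    }
}
```

Typing é at such a console yields the byte 0x82. Decoded with the OEM page it is U+00E9 (é); decoded with the ANSI page, which is what CP_ACP selects while the file APIs are in their default ANSI mode, it becomes U+201A.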
> 
> By default on Windows, the ...A versions of the file APIs use 'ANSI' for the
> encoding of the filenames they expect or return (this has nothing to do with
> the encoding of whatever is read or written).  They can be switched to 'OEM'
> encoding with SetFileApisToOEM(), which might simplify some console
> programming but is generally not the thing to do.  Console I/O itself
> requires CP_OEMCP by default: the system OEM code page, which is what the
> console uses unless it has been changed (for example with chcp).
> 
> In other words, with all defaults, a filename passed to CreateFileA should
> be ANSI (unless osAreFileApisANSI() says otherwise), but data read from or
> written to stdin/stdout/stderr, when interactive, should use CP_OEMCP.
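The rule above can be sketched as a small selection function. The enum names are hypothetical stand-ins for CP_ACP and CP_OEMCP so that the sketch compiles on any platform, and file_apis_ansi stands in for the result of osAreFileApisANSI() (SQLite's indirection for the Win32 AreFileApisANSI()).

```c
#include <assert.h>

/* Hypothetical stand-ins for the Win32 CP_ACP and CP_OEMCP identifiers. */
enum codepage { CODEPAGE_ANSI, CODEPAGE_OEM };

/* What the bytes are used for decides which code page applies. */
enum text_role { ROLE_FILENAME, ROLE_CONSOLE_IO };

/* Current behaviour as described in the text: the role is ignored. */
static enum codepage current_choice(int file_apis_ansi)
{
    return file_apis_ansi ? CODEPAGE_ANSI : CODEPAGE_OEM;
}

/* Proposed behaviour: console data always uses the OEM code page,
   filenames follow the mode of the ...A file APIs. */
static enum codepage pick_codepage(enum text_role role, int file_apis_ansi)
{
    if (role == ROLE_CONSOLE_IO)
        return CODEPAGE_OEM;
    return file_apis_ansi ? CODEPAGE_ANSI : CODEPAGE_OEM;
}
```

With all defaults (file APIs in ANSI mode), current_choice() picks ANSI for everything, which is exactly the mismatch for console data.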
> 
> I'll try to write a proper patch over the next few days, as time permits, and
> will propose it. It might find its way into a future 3.11.x release.  I first
> need to get better acquainted with the existing code and its intent.
> 
> I think the shell.c program could have its own oem_to_utf8() and
> utf8_to_oem() for translating interactive console I/O (these are not needed
> in the engine, sqlite3.c), leaving sqlite3_win32_mbcs_to_utf8() as is in the
> engine, for whatever it is currently used for (if anything).
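A minimal, portable sketch of what a shell-local oem_to_utf8() could look like. On Windows the real conversion would go through MultiByteToWideChar(CP_OEMCP, ...) followed by WideCharToMultiByte(CP_UTF8, ...); to stay runnable anywhere, this version instead assumes the console uses code page 850 and decodes with a tiny partial table. The name oem_to_utf8_sketch and the table are hypothetical, not SQLite code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Partial CP850 -> Unicode table (assumed console code page): a few French letters. */
static uint32_t cp850_decode(unsigned char b)
{
    if (b < 0x80) return b;        /* ASCII is unchanged */
    switch (b) {
    case 0x82: return 0x00E9;      /* e acute */
    case 0x85: return 0x00E0;      /* a grave */
    case 0x87: return 0x00E7;      /* c cedilla */
    case 0x8A: return 0x00E8;      /* e grave */
    default:   return 0xFFFD;      /* not in this table: replacement character */
    }
}

/* Encode a code point below U+10000 as UTF-8; returns the byte count. */
static size_t utf8_put(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical helper: convert a NUL-terminated OEM string to UTF-8.
   out must hold at least 3 * strlen(in) + 1 bytes. Returns the length
   of the result, not counting the terminating NUL. */
static size_t oem_to_utf8_sketch(const char *in, unsigned char *out)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    for (const unsigned char *p = (const unsigned char *)in; *p; p++)
        n += utf8_put(cp850_decode(*p), out + n);
    out[n] = '\0';
    return n;
}
```

For example, the CP850 bytes 82 63 6F 6C 65 ("école" as typed) become the six UTF-8 bytes C3 A9 63 6F 6C 65.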

In other words...

If I change shell.c (utf8_printf) to use a sqlite3_win32_utf8_to_oem()
(the same as sqlite3_win32_utf8_to_mbcs(), but always using CP_OEMCP) AND
change shell.c (local_getline) to use a sqlite3_win32_oem_to_utf8() (the same
as sqlite3_win32_mbcs_to_utf8(), but always using CP_OEMCP), then my
command-line tool sqlite3.exe works nicely with accented characters.
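A portable sketch of the output side: text stays UTF-8 inside the shell and is converted only when it is headed for an interactive console. utf8_printf_sketch(), utf8_to_oem_inplace() and the out_is_console flag are hypothetical; the real change would call the proposed sqlite3_win32_utf8_to_oem() rather than the partial CP850 table assumed here, and would need its own way of detecting a console.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Partial Unicode -> CP850 table (assumed console code page); others become '?'. */
static unsigned char unicode_to_cp850(unsigned cp)
{
    if (cp < 0x80) return (unsigned char)cp;
    switch (cp) {
    case 0x00E9: return 0x82; /* e acute */
    case 0x00E0: return 0x85; /* a grave */
    case 0x00E7: return 0x87; /* c cedilla */
    case 0x00E8: return 0x8A; /* e grave */
    default:     return '?';
    }
}

/* Convert UTF-8 to CP850 in place. Only 1- and 2-byte sequences are decoded;
   every other byte (longer sequences, stray continuation bytes) becomes '?'.
   The output never grows, so converting in place is safe. */
static void utf8_to_oem_inplace(char *s)
{
    unsigned char *r = (unsigned char *)s;
    unsigned char *w = r;
    while (*r) {
        unsigned cp;
        if (*r < 0x80) {
            cp = *r++;
        } else if ((*r & 0xE0) == 0xC0 && (r[1] & 0xC0) == 0x80) {
            cp = ((unsigned)(r[0] & 0x1F) << 6) | (unsigned)(r[1] & 0x3F);
            r += 2;
        } else {
            cp = 0xFFFD; /* not in the table, so it maps to '?' */
            r++;
        }
        *w++ = unicode_to_cp850(cp);
    }
    *w = '\0';
}

/* Hypothetical stand-in for utf8_printf(): format as UTF-8, then convert
   only when the destination is an interactive console. */
static int utf8_printf_sketch(FILE *out, int out_is_console, const char *fmt, ...)
{
    char buf[512];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buf, sizeof buf, fmt, ap); /* longer output is truncated here */
    va_end(ap);
    if (out_is_console)
        utf8_to_oem_inplace(buf);
    return fputs(buf, out);
}
```

Passing 0 for out_is_console leaves the UTF-8 bytes untouched, which corresponds to the redirected `.dump >test.sql` case in the transcript.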

sqlite3.exe test.db
sqlite> create table école(é text);
sqlite> insert into école(é) values('école');
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE école(é text);
INSERT INTO "école" VALUES('école');
COMMIT;
sqlite> .quit

sqlite3.exe test.db .dump >test.sql
type test.sql
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE ??cole(?? text);
INSERT INTO "??cole" VALUES('??cole');
COMMIT;

One can clearly recognize the two bytes of the UTF-8 representation of 'é'
(I know their values).
The data typed interactively at the console has been properly converted to
UTF-8 and stored as such. It is output intact (as UTF-8) when not directed to
the console (file test.sql), and properly converted back to CP_OEMCP when
running .dump interactively.
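A small sketch of why `type test.sql` shows two odd characters in front of "cole": `type` copies the file's bytes to the console unchanged, and the console renders each byte with its own code page. Assuming that page is 850 (the actual code page of the author's console is not stated), the two UTF-8 bytes of é, C3 and A9, render as ├ and ®. The glyph table covers only these bytes.

```c
#include <assert.h>
#include <stddef.h>

/* How a CP850 console renders single bytes (partial table, assumption). */
static unsigned cp850_glyph(unsigned char b)
{
    if (b < 0x80) return b;
    switch (b) {
    case 0x82: return 0x00E9; /* e acute */
    case 0xA9: return 0x00AE; /* registered sign */
    case 0xC3: return 0x251C; /* box drawing: vertical and right */
    default:   return 0xFFFD; /* not covered by this table */
    }
}

/* Render raw bytes one glyph per byte, as a single-byte console does.
   glyphs must hold len entries. Returns the number of glyphs produced. */
static size_t render_as_cp850(const unsigned char *bytes, size_t len, unsigned *glyphs)
{
    for (size_t i = 0; i < len; i++)
        glyphs[i] = cp850_glyph(bytes[i]);
    return len;
}
```

Five letters stored as six bytes therefore show up as six glyphs, the first two of them being the "double byte" visible in the listing.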

One problem remains: shell.c does *nothing* to convert the command line.
It merely uses it, assuming argv[] is UTF-8 encoded, which is wrong. The
arguments are ANSI (MBCS) encoded, even though they were typed on a console
line which itself uses CP_OEMCP for display and input.
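A portable sketch of what converting the command line could look like, assuming the ANSI code page is 1252. In that code page, bytes 0xA0 to 0xFF coincide with U+00A0 to U+00FF, so they can be re-encoded to UTF-8 directly; bytes 0x80 to 0x9F would need a lookup table and are rejected here. On Windows itself, a more robust option would probably be to start from the wide-character command line (for example via wmain(), or GetCommandLineW() with CommandLineToArgvW()) and convert with WideCharToMultiByte(CP_UTF8, ...). Both function names in the sketch are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Convert one CP1252 argument to a newly malloc'd UTF-8 string.
   Handles ASCII and 0xA0..0xFF only; returns NULL for 0x80..0x9F
   or on allocation failure. */
static char *ansi1252_to_utf8(const char *arg)
{
    size_t len = strlen(arg);
    unsigned char *out = malloc(2 * len + 1); /* each byte grows to at most 2 */
    size_t n = 0;
    if (out == NULL) return NULL;
    for (const unsigned char *p = (const unsigned char *)arg; *p; p++) {
        if (*p < 0x80) {
            out[n++] = *p;
        } else if (*p >= 0xA0) {
            out[n++] = (unsigned char)(0xC0 | (*p >> 6));
            out[n++] = (unsigned char)(0x80 | (*p & 0x3F));
        } else {
            free(out);
            return NULL;
        }
    }
    out[n] = '\0';
    return (char *)out;
}

/* Replace argv[1..argc-1] by their UTF-8 forms before they are used.
   Returns 0 on success, -1 on failure. The converted strings live for
   the rest of the program and are never freed in this sketch. */
static int convert_argv(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        char *u = ansi1252_to_utf8(argv[i]);
        if (u == NULL) return -1;
        argv[i] = u;
    }
    return 0;
}
```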

So this works OK:
sqlite3.exe
sqlite> .open école.db
sqlite> .quit
The created file is correctly named 'école.db'.

But this won't work:
sqlite3.exe école.db
sqlite> create table t(c);
sqlite> .quit

dir
11-02-16  13:06             8.192 ?cole.db
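Why the name comes out wrong can be sketched with a UTF-8 well-formedness check (not SQLite code). In code page 1252 (an assumption about the author's system), é is the single byte 0xE9, which in UTF-8 would announce a three-byte sequence; the next byte is 'c', so the argument is not valid UTF-8 even though it is handed on as UTF-8. How Windows then renders the malformed name depends on the conversion it goes through, so the sketch does not try to predict the exact name shown by dir.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if s is well-formed UTF-8 (1- to 4-byte sequences), else 0.
   Overlong forms and surrogates are not rejected; this is a sketch. */
static int is_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        size_t extra;
        if (*p < 0x80) extra = 0;
        else if ((*p & 0xE0) == 0xC0) extra = 1;
        else if ((*p & 0xF0) == 0xE0) extra = 2;
        else if ((*p & 0xF8) == 0xF0) extra = 3;
        else return 0;                 /* stray continuation byte or invalid lead */
        p++;
        for (size_t i = 0; i < extra; i++, p++)
            if ((*p & 0xC0) != 0x80)   /* also stops safely at the terminating NUL */
                return 0;
    }
    return 1;
}
```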

--
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om
