> On 11 Feb 2016, at 10:48, Olivier Mascia <om at integral.be> wrote:
>
> It looks like the appropriate character set mapping behavior on Windows is
> still not quite right in the command line utility. I'm currently reviewing
> the code and it looks like it revolves around the
> sqlite3_win32_mbcs_to_utf8() API (and its reciprocal), which in turn calls
> winMbcsToUnicode() and its reciprocal. Those do this:
>
>   int codepage = osAreFileApisANSI() ? CP_ACP : CP_OEMCP;
>
> and while this is quite right for a filename, it is not for the content read
> in or written out from/to the console, which should use CP_OEMCP.
>
> By default on Windows the ...A versions of the file APIs use 'ANSI' (for the
> encoding of the filenames they expect or return; this has nothing to do with
> the encoding of whatever is read or written). They can be switched to 'OEM'
> encoding, which might be simpler for some console programming but is
> generally not the thing to do. The console I/O itself always requires by
> default CP_OEMCP, which maps to whatever specific 'code page' the console is
> actually using.
>
> In other words, all things default, a filename passed to CreateFileA should
> be ANSI (unless osAreFileApisANSI() says otherwise), but the data read or
> written from stdin/stdout/stderr when interactive should use CP_OEMCP.
>
> I'll try to patch a proper solution over the next few days as time permits
> and I will propose it. It might find its way into some next 3.11.x release.
> I need to get more acquainted with the existing code and its intent first.
>
> I think the shell.c program might have its own oem_to_utf8() and
> utf8_to_oem(), which are not needed in the engine sqlite3.c, for translating
> interactive console I/O, and leave sqlite3_win32_mbcs_to_utf8() as is in the
> engine, for whatever it is currently used for (if at all).
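For illustration, the oem_to_utf8() / utf8_to_oem() pair suggested in the quoted message could look roughly like the sketch below. This is not the code that is in shell.c or sqlite3.c. The helper convert_codepage() is a hypothetical name, and the non-Windows branch is only an assumed stub so that the file compiles elsewhere. The Windows branch goes through UTF-16 with MultiByteToWideChar() and WideCharToMultiByte().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>

/* Convert a NUL-terminated string from code page `from` to code page `to`,
** going through UTF-16. Returns a malloc()ed string that the caller must
** free(), or NULL on failure. (Hypothetical helper, not part of SQLite.) */
static char *convert_codepage(const char *z, UINT from, UINT to){
  wchar_t *zWide;
  char *zOut;
  int nWide, nOut;

  assert( z!=NULL );
  /* With a source length of -1 the returned sizes include the terminator. */
  nWide = MultiByteToWideChar(from, 0, z, -1, NULL, 0);
  if( nWide==0 ) return NULL;
  zWide = malloc(nWide*sizeof(wchar_t));
  if( zWide==NULL ) return NULL;
  if( MultiByteToWideChar(from, 0, z, -1, zWide, nWide)==0 ){
    free(zWide);
    return NULL;
  }
  nOut = WideCharToMultiByte(to, 0, zWide, -1, NULL, 0, NULL, NULL);
  zOut = nOut>0 ? malloc(nOut) : NULL;
  if( zOut!=NULL
   && WideCharToMultiByte(to, 0, zWide, -1, zOut, nOut, NULL, NULL)==0 ){
    free(zOut);
    zOut = NULL;
  }
  free(zWide);
  return zOut;
}
#else
/* Assumed stub for other platforms: the terminal is taken to be UTF-8
** already, so the "conversion" is a plain copy. */
#define CP_OEMCP 1
#define CP_UTF8  65001
static char *convert_codepage(const char *z, unsigned from, unsigned to){
  char *zOut;
  size_t n;
  (void)from; (void)to;
  assert( z!=NULL );
  n = strlen(z)+1;
  zOut = malloc(n);
  if( zOut!=NULL ) memcpy(zOut, z, n);
  return zOut;
}
#endif

/* Console input (OEM code page) to UTF-8, e.g. for a line just read. */
char *oem_to_utf8(const char *zOem){
  return convert_codepage(zOem, CP_OEMCP, CP_UTF8);
}

/* UTF-8 to the OEM code page, e.g. for text about to be printed. */
char *utf8_to_oem(const char *zUtf8){
  return convert_codepage(zUtf8, CP_UTF8, CP_OEMCP);
}
```

The sketch follows the message and uses CP_OEMCP. If the console code page has been changed with chcp, GetConsoleCP() and GetConsoleOutputCP() report the active one, which may differ from the system default.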
In other words... If I change shell.c (utf8_printf) to use some
sqlite3_win32_utf8_to_oem() (which is the same as
sqlite3_win32_utf8_to_mbcs, but always uses CP_OEMCP) AND I change shell.c
(local_getline) to use some sqlite3_win32_oem_to_utf8() (which is the same as
sqlite3_win32_mbcs_to_utf8, but always uses CP_OEMCP), then my command-line
tool sqlite3.exe works nicely with accented characters.

sqlite3.exe test.db
sqlite> create table école(é text);
sqlite> insert into école(é) values('école');
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE école(é text);
INSERT INTO "école" VALUES('école');
COMMIT;
sqlite> .quit

sqlite3.exe test.db .dump >test.sql
type test.sql
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE ??cole(?? text);
INSERT INTO "??cole" VALUES('??cole');
COMMIT;

One can clearly recognize (I know them for the 'é' value) the right double
byte for the UTF-8 representation of 'é'. The data typed interactively at the
console has been properly converted to UTF-8 and stored as such. It is output
intact (UTF-8) when not directed to the console (file test.sql) and properly
converted back to OEMCP when running .dump interactively.

One problem remains: shell.c does *nothing* to process the command line
correctly. It merely uses it and assumes argv[] is UTF-8 encoded, which is
wrong. The arguments are ANSI MBCS encoded, despite having been typed on a
console line which itself uses OEMCP for display and input.

So this works OK:

sqlite3.exe
sqlite> .open école.db
sqlite> .quit

The created filename is properly 'école.db'.

But this won't work:

sqlite3.exe école.db
sqlite> create table t(c);
sqlite> .quit

dir
11-02-16  13:06             8.192 ?cole.db

--
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om
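For the remaining command-line problem, one possible approach is to convert argv[] from the ANSI code page to UTF-8 before the shell uses it. The following is only a sketch under that assumption and is not what shell.c does. argv_to_utf8() and ansi_to_utf8() are hypothetical names, and the non-Windows branch is an assumed stub that just copies the strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>

/* Convert one ANSI (CP_ACP) string to a malloc()ed UTF-8 string, going
** through UTF-16. Returns NULL on failure. (Hypothetical helper.) */
static char *ansi_to_utf8(const char *z){
  wchar_t *zWide;
  char *zOut = NULL;
  int nWide, nOut;

  nWide = MultiByteToWideChar(CP_ACP, 0, z, -1, NULL, 0);
  if( nWide==0 ) return NULL;
  zWide = malloc(nWide*sizeof(wchar_t));
  if( zWide==NULL ) return NULL;
  if( MultiByteToWideChar(CP_ACP, 0, z, -1, zWide, nWide)>0 ){
    nOut = WideCharToMultiByte(CP_UTF8, 0, zWide, -1, NULL, 0, NULL, NULL);
    if( nOut>0 ) zOut = malloc(nOut);
    if( zOut!=NULL
     && WideCharToMultiByte(CP_UTF8, 0, zWide, -1, zOut, nOut, NULL, NULL)==0 ){
      free(zOut);
      zOut = NULL;
    }
  }
  free(zWide);
  return zOut;
}
#else
/* Assumed stub for other platforms: arguments are taken to be UTF-8. */
static char *ansi_to_utf8(const char *z){
  size_t n = strlen(z)+1;
  char *zOut = malloc(n);
  if( zOut!=NULL ) memcpy(zOut, z, n);
  return zOut;
}
#endif

/* Return a malloc()ed, NULL-terminated copy of argv[] converted to UTF-8,
** or NULL if any conversion or allocation fails. */
char **argv_to_utf8(int argc, char **argv){
  char **az;
  int i;

  assert( argc>=0 && argv!=NULL );
  az = calloc((size_t)argc+1, sizeof(char*));  /* az[argc] stays NULL */
  if( az==NULL ) return NULL;
  for(i=0; i<argc; i++){
    az[i] = ansi_to_utf8(argv[i]);
    if( az[i]==NULL ){
      while( i>0 ) free(az[--i]);
      free(az);
      return NULL;
    }
  }
  return az;
}
```

A caller would pass the converted array on instead of the raw argv[]. Characters that do not exist in the ANSI code page are already lost by the time main() sees them, so fetching the wide command line (GetCommandLineW() with CommandLineToArgvW()) may be the more robust route. That variant is not shown here.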