Re: [sqlite] Weird chars inserted
Ariel, Keith, Rowan, apologies, I re-ran the queries a few times, and I decided to include "rowid" to keep track of the changes. The experiments were conducted by cut-and-paste of the í character from the email, hence UTF8, and using x'...' for inserts and concats. Note, I use sqlite3 shell by preference, but I use both the Firefox addon and "DB Browser for Sqlite" for GUI convenience - however, for inserts and updates I use the sqlite3 shell or the C programming interface. In C, I use int rather than char - FILE *pinfile = NULL; ... pinfile = fopen(argv[1],"rb"); int ch = fgetc (pinfile); /* changed from char to int to allow >127 & UTF */ Also, I use .mode csv and then a spreadsheet quite a lot. Note, at the end, I added typeof( ) - and most were BLOBs and a couple as TEXT. kevin@kevin-Aspire-V5-571G:~$ sqlite3 dir_md5sum_db.sqlite SQLite version 3.15.2 2016-11-28 19:13:37 Enter ".help" for usage hints. sqlite> SELECT * FROM dir_md5sum ...> where rowid >= 194576; kev| kev2| kev3| kev4| sqlite> insert into dir_md5sum values ( 'kev5', x'C3AD'); sqlite> SELECT * FROM dir_md5sum where rowid >= 194576; kev| kev2| kev3| kev4| kev5|í sqlite> .schema CREATE TABLE dir_md5sum (dir_name text, dir_md5sum text); sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), unicode(dir_md5sum) FROM dir_md5sum where rowid >= 194576; 194576|kev|6B6576|�|EE|65533 194577|kev2|6B657632|�|EE|65533 194578|kev3|6B657633|�|EE|65533 194579|kev4|6B657634|�|ED|65533 194580|kev5|6B657635|í|C3AD|237 sqlite> insert into dir_md5sum values ( 'kev6', 'a' || x'C3AD' || 'b' ); sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), unicode(dir_md5sum) FROM dir_md5sum where rowid >= 194576; 194576|kev|6B6576|�|EE|65533 194577|kev2|6B657632|�|EE|65533 194578|kev3|6B657633|�|EE|65533 194579|kev4|6B657634|�|ED|65533 194580|kev5|6B657635|í|C3AD|237 194581|kev6|6B657636|aíb|61C3AD62|97 sqlite> insert into dir_md5sum values ( 'kev7', 'c' || x'00ED' || 'd' ); sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), unicode(dir_md5sum) FROM dir_md5sum where rowid >= 194576; 194576|kev|6B6576|�|EE|65533 194577|kev2|6B657632|�|EE|65533 194578|kev3|6B657633|�|EE|65533 194579|kev4|6B657634|�|ED|65533 194580|kev5|6B657635|í|C3AD|237 194581|kev6|6B657636|aíb|61C3AD62|97 194582|kev7|6B657637|c|6300ED64|99 sqlite> insert into dir_md5sum values ( 'kev8', x'00ED' ); sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), unicode(dir_md5sum) FROM dir_md5sum where rowid >= 194576; 194576|kev|6B6576|�|EE|65533 194577|kev2|6B657632|�|EE|65533 194578|kev3|6B657633|�|EE|65533 194579|kev4|6B657634|�|ED|65533 194580|kev5|6B657635|í|C3AD|237 194581|kev6|6B657636|aíb|61C3AD62|97 194582|kev7|6B657637|c|6300ED64|99 194583|kev8|6B657638||00ED| sqlite> .mode csv sqlite> .once /home/kevin/Martin.csv sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), unicode(dir_md5sum) FROM dir_md5sum where rowid >= 194576; sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), unicode(dir_md5sum), typeof(dir_md5sum) FROM dir_md5sum where rowid >= 194576; 194576,kev,6B6576,"�",EE,65533,blob 194577,kev2,6B657632,"�",EE,65533,blob 194578,kev3,6B657633,"�",EE,65533,blob 194579,kev4,6B657634,"�",ED,65533,blob 194580,kev5,6B657635,"í",C3AD,237,blob 194581,kev6,6B657636,"aíb",61C3AD62,97,text 194582,kev7,6B657637,c,6300ED64,99,text 194583,kev8,6B657638,"",00ED,,blob sqlite> regs, Kev Date: Mon, 19 Dec 2016 11:12:59 +0800 From: Rowan Worth To: SQLite mailing list Subject: Re: [sqlite] Weird chars inserted Message-ID: Content-Type: text/plain; charset=UTF-8 On 19 December 2016 at 08:24, Kevin wrote: > Hi Martin, > > I had a go using a terminal session, with default encoding UTF-8. > > Try using the hex( ) and unicode( ) functions to check what is actually > stored in the sqlite table. > > I put a couple of rows at the end of an existing simple table > > kevin@kevin-Aspire-V5-571G:~$ sqlite3 /home/kevin/dir_md5sum_db.sqlite > SQLite version 3.15.2 2016-11-28 19:13:37 > Enter ".help" for usage hints. > sqlite> SELECT dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), > unicode(dir_md5sum) FROM dir_md5sum >...> where rowid >= 194576; > 194576|kev|6B6576|í|C3AD|237 > 194577|kev2|6B657632|�|ED|65533 > sqlite> .quit > kevin@kevin-Aspire-V5-571G:~$ > Hi Kevin, The problem here lies in whatever inserted these rows. sqlite just stores what it is given - it is up to the application to take care of encoding issues. In th
Re: [sqlite] Weird chars inserted
On 20 December 2016 at 08:06, Kevin Youren wrote: > The experiments were conducted by cut-and-paste of the í character from > the email, hence UTF8, "Hence UTF8" is presumptuous. There are many encodings the website could be using to communicate í. The browser may (or may not) convert to UTF8 internally, and there's potentially a conversion at the browser/X11 boundary when you copy to clipboard. There's potentially another conversion when you paste from X11's clipboard to your terminal. But generally linux is setup these days to use UTF8 everywhere which bypasses some of this madness - I would expect your copy/paste based experiments worked just fine? > In C, I use int rather than char - > > FILE *pinfile = NULL; > ... > pinfile = fopen(argv[1],"rb"); > > > > int ch = fgetc (pinfile); > /* changed from char to int to allow >127 & UTF */ > It doesn't matter how you define ch, fgetc only ever reads a _single_ byte. This approach is doomed to fail if it ever encounters UTF8, since UTF8 is a multi-byte encoding. If you switch to fgetwc and invoke the correct setlocale magic (see https://www.cl.cam.ac.uk/~mgk25/unicode.html#c), this approach can correctly decode the í from your input file. However this is not what you want either - at that point 'ch' will equal 237, which is the _code point_. If you ask sqlite to insert the code point in a column, it's going to go ahead and do just that. Like I said before, it doesn't encode/decode the data you give it. If you avoid decoding the input data and send it straight through to sqlite, it will probably work. ie. read into a char* buf and use sqlite3_bind_text. -Rowan ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Weird chars inserted
On 19 December 2016 at 08:24, Kevin wrote: > Hi Martin, > > I had a go using a terminal session, with default encoding UTF-8. > > Try using the hex( ) and unicode( ) functions to check what is actually > stored in the sqlite table. > > I put a couple of rows at the end of an existing simple table > > kevin@kevin-Aspire-V5-571G:~$ sqlite3 /home/kevin/dir_md5sum_db.sqlite > SQLite version 3.15.2 2016-11-28 19:13:37 > Enter ".help" for usage hints. > sqlite> SELECT dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), > unicode(dir_md5sum) FROM dir_md5sum >...> where rowid >= 194576; > 194576|kev|6B6576|í|C3AD|237 > 194577|kev2|6B657632|�|ED|65533 > sqlite> .quit > kevin@kevin-Aspire-V5-571G:~$ > Hi Kevin, The problem here lies in whatever inserted these rows. sqlite just stores what it is given - it is up to the application to take care of encoding issues. In this case the "kev" row has been inserted using utf-8 encoding, so when you retrieve this value sqlite emits the bytes 0xC3 0xAD (exactly as they were stored), which your terminal interprets as utf-8 and renders the character í. The "kev2" row however is not utf-8 encoded. The dir_md5sum column contains a single byte 0xED, which is not valid utf-8 (the encoding specifies that when the highest bit is on, there is more information about the current character in the next byte). When you retrieve this value sqlite emits the byte 0xED (exactly as it was stored). Your terminal tries to interpret this as utf-8, but since it is not valid it instead inserts a unicode replacement character (U+FFFD). Sqlite3's unicode() function makes the same U+FFFD replacement when encountering an invalid encoding which is where the 65533 comes from. -Rowan ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Weird chars inserted
Inquiring minds want to know how a selection of 5 values resulted in a result list of 6 values ... > -Original Message- > From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] > On Behalf Of Kevin > Sent: Sunday, 18 December, 2016 17:25 > To: Ariel M. Martin > Cc: sqlite mailing list > Subject: [sqlite] Weird chars inserted > > Hi Martin, > > I had a go using a terminal session, with default encoding UTF-8. > > Try using the hex( ) and unicode( ) functions to check what is actually > stored in the sqlite table. > > I put a couple of rows at the end of an existing simple table > > kevin@kevin-Aspire-V5-571G:~$ sqlite3 /home/kevin/dir_md5sum_db.sqlite > SQLite version 3.15.2 2016-11-28 19:13:37 > Enter ".help" for usage hints. > sqlite> SELECT dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), > unicode(dir_md5sum) FROM dir_md5sum > ...> where rowid >= 194576; > 194576|kev|6B6576|í|C3AD|237 > 194577|kev2|6B657632|�|ED|65533 > sqlite> .quit > kevin@kevin-Aspire-V5-571G:~$ > > > > I tried changing the terminal session to UTF-16, but I ended up with a > mess of Chinese characters. > > > > > > > the � is a place marker inserted by the program that displays on your > screen for a character the program doesn't recognize. > > Check http://www.fileformat.info/info/unicode/char/00ed/index.htm > > as a reference. > > I use Linux (Xubuntu) > > > regs, > > Kev > > > From: "Ariel M. Martin" > To: " sqlite-users@mailinglists.sqlite.org" > > Subject: [sqlite] Weird chars inserted > Message-ID: <835952.64146...@smtp113.mail.ne1.yahoo.com> > Content-Type: text/plain; charset="utf-8" > > Hi people. I need some help I’m lost here. > I’m writing an application using SQLite and whenever I insert Spanish > characters > I get this kind of strings: > Mart�n (where it should read ‘Martín’) > > Ok, so if I open my database with SQLiteManager (the Firefox addin) my > strings look > weird like that. But SQLiteManager is able to insert and display any > string correctly. > And any string inserted by SQLiteManager displays all right in my app > as well. > > So I must be doing something wrong when I insert the strings. This is > my code: > > Note: szSQL is the wchar-string my app uses > > char szAux[2048]; > ZeroMemory(szAux, 2048); > WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, szSQL, > wcslen(szSQL), szAux, 2048, NULL, 0); > > int nRet= sqlite3_exec(m_hDB, szAux, NULL, 0, &pErrMsg); > > > SQLiteManager tells me that my db is utf-16le. Im using vs2015 in a > Windows10 64bit machine. > > I’d appreciate any help. > Thanks in advance! > > > Ariel M. Martin > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Weird chars inserted
Igor, On 12/18/2016 7:17 AM, Jean-Christophe Deschamps wrote: Since your DB is UTF-16LE encoded, you shouldn't convert your strings to UTF8. int nRet= sqlite3_exec(m_hDB, szSQL, NULL, 0, &pErrMsg); alone should work fine. No it should not. The encoding of the database is irrelevant here: sqlite3_exec still expects a UTF-8 encoded string. Oops, of course that's it. I'm so used to have this kind of mistake reported about _SQLite3_Exec() using the wrapper that I use and support that I didn't realize the OP was using the bare SQLite API. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Weird chars inserted
On 12/18/2016 7:17 AM, Jean-Christophe Deschamps wrote: Since your DB is UTF-16LE encoded, you shouldn't convert your strings to UTF8. int nRet= sqlite3_exec(m_hDB, szSQL, NULL, 0, &pErrMsg); alone should work fine. No it should not. The encoding of the database is irrelevant here: sqlite3_exec still expects a UTF-8 encoded string. There are SQLite API functions that accept UTF-16 strings (e.g. sqlite3_bind_text16), but sqlite3_exec is not one of them. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Weird chars inserted
On 12/17/2016 8:38 PM, Ariel M. Martin wrote: Note: szSQL is the wchar-string my app uses char szAux[2048]; ZeroMemory(szAux, 2048); WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, szSQL, wcslen(szSQL), szAux, 2048, NULL, 0); Replace CP_ACP with CP_UTF8. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Weird chars inserted
Ariel, At 02:38 18/12/2016, you wrote: Hi people. I need some help Iâm lost here. Iâm writing an application using SQLite and whenever I insert Spanish characters I get this kind of strings: Mart�n (where it should read âMartÃnâ) Ok, so if I open my database with SQLiteManager (the Firefox addin) my strings look weird like that. But SQLiteManager is able to insert and display any string correctly. And any string inserted by SQLiteManager displays all right in my app as well. So I must be doing something wrong when I insert the strings. This is my code: Note: szSQL is the wchar-string my app uses char szAux[2048]; ZeroMemory(szAux, 2048); WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, szSQL, wcslen(szSQL), szAux, 2048, NULL, 0); int nRet= sqlite3_exec(m_hDB, szAux, NULL, 0, &pErrMsg); SQLiteManager tells me that my db is utf-16le. Im using vs2015 in a Windows10 64bit machine. Since your DB is UTF-16LE encoded, you shouldn't convert your strings to UTF8. int nRet= sqlite3_exec(m_hDB, szSQL, NULL, 0, &pErrMsg); alone should work fine. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users