On 19 December 2016 at 08:24, Kevin <[email protected]> wrote:
> Hi Martin,
>
> I had a go using a terminal session, with default encoding UTF-8.
>
> Try using the hex( ) and unicode( ) functions to check what is actually
> stored in the sqlite table.
>
> I put a couple of rows at the end of an existing simple table....
>
> kevin@kevin-Aspire-V5-571G:~$ sqlite3 /home/kevin/dir_md5sum_db.sqlite
> SQLite version 3.15.2 2016-11-28 19:13:37
> Enter ".help" for usage hints.
> sqlite> SELECT dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum),
> unicode(dir_md5sum) FROM dir_md5sum
>    ...> where rowid >= 194576;
> 194576|kev|6B6576|í|C3AD|237
> 194577|kev2|6B657632|�|ED|65533
> sqlite> .quit
> kevin@kevin-Aspire-V5-571G:~$
>

Hi Kevin,

The problem here lies in whatever inserted these rows. SQLite just stores
what it is given; it is up to the application to take care of encoding
issues.

In this case the "kev" row has been inserted using UTF-8 encoding, so when
you retrieve this value SQLite emits the bytes 0xC3 0xAD (exactly as they
were stored), which your terminal interprets as UTF-8 and renders as the
character í.

The "kev2" row, however, is not UTF-8 encoded. The dir_md5sum column
contains a single byte 0xED, which is not valid UTF-8: a byte of the form
1110xxxx announces a three-byte sequence and must be followed by two
continuation bytes (10xxxxxx), and here there are none. 0xED happens to be
the Latin-1 (ISO-8859-1) encoding of í, so the inserting application was
probably writing in a single-byte encoding.

When you retrieve this value SQLite emits the byte 0xED (exactly as it was
stored). Your terminal tries to interpret this as UTF-8, but since it is
not valid it instead shows the Unicode replacement character (U+FFFD).
SQLite's unicode() function makes the same U+FFFD substitution when it
encounters an invalid encoding, which is where the 65533 (0xFFFD) comes
from.

-Rowan
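
P.S. To reproduce this outside the shell, here is a minimal Python sketch of the behaviour described above. It assumes Python's bundled sqlite3 module and a throwaway in-memory database. The table only mirrors the column names from the quoted session, and binding the bytes as a blob and casting to TEXT is just my way of forcing raw bytes into a text column, not necessarily how the original rows got there.

```python
import sqlite3

valid = b"\xc3\xad"  # UTF-8 encoding of U+00ED (í)
invalid = b"\xed"    # Latin-1 encoding of í; a lone lead byte in UTF-8

# A strict UTF-8 decoder accepts the first value and rejects the second.
print(valid.decode("utf-8"))
try:
    invalid.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# A lenient decoder substitutes U+FFFD, much as a terminal does.
replaced = invalid.decode("utf-8", errors="replace")
print(hex(ord(replaced)))

# SQLite stores the bytes as given; hex() shows them unchanged.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dir_md5sum (dir_name TEXT, dir_md5sum TEXT)")
# Binding bytes yields a BLOB; CAST(... AS TEXT) relabels it without
# converting, so the column ends up holding exactly these bytes.
con.execute("INSERT INTO dir_md5sum VALUES ('kev', CAST(? AS TEXT))", (valid,))
con.execute("INSERT INTO dir_md5sum VALUES ('kev2', CAST(? AS TEXT))", (invalid,))

rows = con.execute(
    "SELECT dir_name, hex(dir_md5sum), unicode(dir_md5sum) "
    "FROM dir_md5sum ORDER BY rowid"
).fetchall()
for row in rows:
    print(row)
con.close()
```

The query deliberately selects only hex() and unicode() of the column. Fetching the raw "kev2" value as text through Python's default text handling would fail to decode, which is the same problem seen from the application side.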

