On 19 December 2016 at 08:24, Kevin <[email protected]> wrote:

> Hi Martin,
>
> I had a go using a terminal session, with default encoding UTF-8.
>
> Try using the hex( ) and unicode( ) functions to check what is actually
> stored in the sqlite table.
>
> I put a couple of rows at the end of an existing simple table....
>
> kevin@kevin-Aspire-V5-571G:~$ sqlite3 /home/kevin/dir_md5sum_db.sqlite
> SQLite version 3.15.2 2016-11-28 19:13:37
> Enter ".help" for usage hints.
> sqlite> SELECT dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum),
> unicode(dir_md5sum)  FROM dir_md5sum
>    ...> where rowid >= 194576;
> 194576|kev|6B6576|í|C3AD|237
> 194577|kev2|6B657632|�|ED|65533
> sqlite> .quit
> kevin@kevin-Aspire-V5-571G:~$
>

Hi Kevin,

The problem here lies in whatever inserted these rows. SQLite just stores
what it is given; it is up to the application to take care of encoding
issues.

In this case the "kev" row has been inserted using UTF-8 encoding, so when
you retrieve this value SQLite emits the bytes 0xC3 0xAD (exactly as they
were stored), which your terminal interprets as UTF-8 and renders as the
character í.

The "kev2" row, however, is not UTF-8 encoded. The dir_md5sum column contains
a single byte 0xED, which is not valid UTF-8 on its own (in UTF-8 a byte
with the highest bit set is always part of a multi-byte sequence, and 0xED
is the lead byte of a three-byte sequence, so two continuation bytes would
have to follow it).
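
To make the byte-level difference concrete, here is a minimal Python sketch.
It only reproduces the byte values seen in your hex() output. That the "kev2"
row was written by something using Latin-1 (ISO-8859-1) is my assumption:
0xED is í in that encoding, but I can't tell from here what actually did the
insert.

```python
# Illustration only: the byte values match the hex() output quoted above.
# "\u00ed" is the character í (U+00ED).
utf8_bytes = "\u00ed".encode("utf-8")      # two bytes, as in the "kev" row
latin1_bytes = "\u00ed".encode("latin-1")  # one byte, as in the "kev2" row
                                           # (Latin-1 is an assumption)

# A lone 0xED is rejected by a strict UTF-8 decoder...
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...while a lenient decoder substitutes the replacement character U+FFFD.
replaced = latin1_bytes.decode("utf-8", errors="replace")

print(utf8_bytes.hex().upper(), latin1_bytes.hex().upper(),
      strict_ok, ord(replaced))
# prints: C3AD ED False 65533
```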

When you retrieve this value SQLite emits the byte 0xED (exactly as it was
stored). Your terminal tries to interpret this as UTF-8, but since it is
not valid it instead inserts a Unicode replacement character (U+FFFD).
SQLite's unicode() function makes the same U+FFFD replacement when it
encounters an invalid encoding, which is where the 65533 comes from.
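
If you want to reproduce this without touching your real database, something
like the following sketch should do it with Python's built-in sqlite3 module.
The in-memory table just borrows your table and column names, and
CAST(X'..' AS TEXT) is simply a convenient way to get specific bytes into a
text value for the demonstration; it is not a claim about how your rows were
originally inserted.

```python
import sqlite3

# Throwaway in-memory database; the schema is a guess that mirrors the
# table and column names from the quoted session.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dir_md5sum (dir_name TEXT, dir_md5sum TEXT)")

# CAST(X'..' AS TEXT) stores the given bytes as text without validating them.
conn.execute("INSERT INTO dir_md5sum VALUES ('kev', CAST(X'C3AD' AS TEXT))")
conn.execute("INSERT INTO dir_md5sum VALUES ('kev2', CAST(X'ED' AS TEXT))")

# Only hex() and unicode() are selected, so Python never has to decode the
# invalid text value itself.
rows = conn.execute(
    "SELECT dir_name, hex(dir_md5sum), unicode(dir_md5sum) "
    "FROM dir_md5sum ORDER BY rowid"
).fetchall()
conn.close()

for row in rows:
    print(row)
# prints:
# ('kev', 'C3AD', 237)
# ('kev2', 'ED', 65533)
```

Selecting the dir_md5sum column directly from Python would fail for the
"kev2" row, because the module tries to decode text values as UTF-8 by
default.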

-Rowan
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
