On 20 December 2016 at 08:06, Kevin Youren <kevin.you...@gmail.com> wrote:

> The experiments were conducted by cut-and-paste of the í character from
> the email, hence UTF8,


"Hence UTF8" is presumptuous. There are many encodings the website could be
using to communicate í. The browser may (or may not) convert to UTF8
internally, and there's potentially a conversion at the browser/X11
boundary when you copy to clipboard. There's potentially another conversion
when you paste from X11's clipboard to your terminal.

But Linux is generally set up these days to use UTF8 everywhere, which
bypasses some of this madness - I would expect your copy/paste-based
experiments worked just fine?


> In C, I use int rather than char -
>
> FILE *pinfile = NULL;
> ...
> pinfile = fopen(argv[1],"rb");
>
> ....
>
> int ch = fgetc (pinfile);
> /* changed from char to int to allow >127 & UTF */
>

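It doesn't matter how you define ch; fgetc only ever reads a _single_ byte.
(It returns int only so that EOF can be told apart from a valid byte value.)
Treating each return value as one character is doomed to fail as soon as it
encounters UTF8, since UTF8 is a multi-byte encoding: í is stored as the two
bytes 0xC3 0xAD.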
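
To make the byte-at-a-time behaviour concrete, here is a minimal sketch. The
helper name read_bytes_fgetc is made up for illustration and is not from the
original program; it assumes the input really is UTF8.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read up to cap bytes from fp using fgetc and
 * store each one in out. Returns the number of bytes read. */
size_t read_bytes_fgetc(FILE *fp, unsigned char *out, size_t cap)
{
    assert(fp != NULL && out != NULL);

    size_t n = 0;
    int ch; /* int so that EOF is distinguishable from the byte 0xFF */

    while (n < cap && (ch = fgetc(fp)) != EOF) {
        out[n++] = (unsigned char)ch; /* one call, one byte */
    }
    return n;
}
```

Fed a file containing a single UTF8 í, this should take two fgetc calls and
yield the bytes 0xC3 and 0xAD, neither of which is the character on its own.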

If you switch to fgetwc and invoke the correct setlocale magic (see
https://www.cl.cam.ac.uk/~mgk25/unicode.html#c), this approach can
correctly decode the í from your input file. However, this is not what you
want either - at that point 'ch' will equal 237 (U+00ED), which is the
_code point_, not the bytes that were in the file.

If you ask sqlite to insert the code point in a column, it's going to go
ahead and do just that. Like I said before, it doesn't encode/decode the
data you give it.

If you avoid decoding the input data and send it straight through to
sqlite, it will probably work, i.e. read the raw bytes into a char *buf and
pass that buffer to sqlite3_bind_text.
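
A rough sketch of the "don't decode" half of that advice follows. slurp_bytes
is a hypothetical helper, and the sqlite call appears only in a comment (with
stmt standing in for a statement you have already prepared), so the block
builds without linking against sqlite.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the rest of fp into a malloc'd, NUL-terminated
 * buffer without interpreting the bytes. Stores the byte count in *len_out.
 * Returns NULL on allocation or read failure; the caller frees the result. */
char *slurp_bytes(FILE *fp, size_t *len_out)
{
    assert(fp != NULL && len_out != NULL);

    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        return NULL;
    }
    for (;;) {
        if (len + 1 >= cap) { /* keep room for data plus the final NUL */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        size_t n = fread(buf + len, 1, cap - len - 1, fp);
        if (n == 0) {
            break; /* EOF or error; checked below */
        }
        len += n;
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}

/* The buffer can then be bound as-is, for example:
 *   sqlite3_bind_text(stmt, 1, buf, (int)len, SQLITE_TRANSIENT);
 * where stmt is assumed to be an already prepared sqlite3_stmt. */
```

Passing the byte length explicitly rather than relying on the NUL terminator
is a design choice here; either should work for text without embedded NULs.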

-Rowan
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
