On 20 December 2016 at 08:06, Kevin Youren <kevin.you...@gmail.com> wrote:
> The experiments were conducted by cut-and-paste of the í character from
> the email, hence UTF8,

"Hence UTF8" is presumptuous. There are many encodings the website could be using to communicate í. The browser may (or may not) convert to UTF8 internally, and there's potentially a conversion at the browser/X11 boundary when you copy to the clipboard. There's potentially another conversion when you paste from X11's clipboard into your terminal.

But generally Linux is set up these days to use UTF8 everywhere, which bypasses some of this madness. I would expect your copy/paste based experiments worked just fine?

> In C, I use int rather than char -
>
> FILE *pinfile = NULL;
> ...
> pinfile = fopen(argv[1],"rb");
>
> ....
>
> int ch = fgetc (pinfile);
> /* changed from char to int to allow >127 & UTF */

However you declare ch, fgetc only ever reads a _single_ byte at a time (int just lets you tell EOF apart from a byte value). This approach is doomed to fail if it ever encounters UTF8, since UTF8 is a multi-byte encoding: í is stored as the two bytes 0xC3 0xAD.

If you switch to fgetwc and invoke the correct setlocale magic (see https://www.cl.cam.ac.uk/~mgk25/unicode.html#c), this approach can correctly decode the í from your input file.

However this is not what you want either. At that point 'ch' will equal 237, which is the _code point_. If you ask sqlite to insert the code point in a column, it's going to go ahead and do just that. Like I said before, it doesn't encode/decode the data you give it.

If you avoid decoding the input data and send it straight through to sqlite, it will probably work, i.e. read into a char* buf and use sqlite3_bind_text.

-Rowan
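
To make the byte-versus-code-point point concrete, here is a minimal sketch, assuming the input file holds a UTF8-encoded í (bytes 0xC3 0xAD). The helper names write_sample, count_fgetc_calls and read_raw_bytes are made up for this illustration and are not from the original program. It shows that fgetc needs two calls to consume the one character, and that reading into a char buffer keeps the bytes exactly as stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the UTF8 encoding of í (U+00ED) to path.
 * Returns 0 on success, -1 on failure. */
int write_sample(const char *path)
{
    static const unsigned char i_acute[] = { 0xC3, 0xAD };
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(i_acute, 1, sizeof i_acute, fp);
    fclose(fp);
    return written == sizeof i_acute ? 0 : -1;
}

/* Count how many fgetc calls it takes to consume the whole file.
 * Each call returns exactly one byte, whatever the encoding is. */
size_t count_fgetc_calls(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    size_t calls = 0;
    int ch; /* int, so EOF stays distinguishable from the byte 0xFF */
    while ((ch = fgetc(fp)) != EOF)
        calls++;
    fclose(fp);
    return calls;
}

/* Read up to cap-1 raw bytes into buf without decoding them, and
 * NUL-terminate the result. Returns the number of bytes read. */
size_t read_raw_bytes(const char *path, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    size_t n = fread(buf, 1, cap - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    return n;
}
```

With a sample file written by write_sample, count_fgetc_calls reports 2 and read_raw_bytes fills the buffer with 0xC3 0xAD. That untouched buffer and its byte count are what you would hand to sqlite3_bind_text, rather than the decoded value 237.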