Actually, a maximum of 4 bytes (not five) is required to encode a single valid code point in UTF-8.
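For anyone who wants to check this, here is a minimal Python sketch that encodes a few code points and prints the byte lengths. The helper name utf8_len and the sample code points are just my own picks for illustration. It also shows that U+0085 comes out as the two bytes C2 85 in UTF-8, which I believe is where the value in the original question came from.

```python
def utf8_len(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for one code point (hypothetical helper)."""
    return len(chr(code_point).encode("utf-8"))


# Illustrative samples: ASCII 'A', U+0085 (NEL), the euro sign,
# and U+10FFFF, the highest valid Unicode code point.
samples = [0x41, 0x85, 0x20AC, 0x10FFFF]

for cp in samples:
    encoded = chr(cp).encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"U+{cp:04X} -> {hex_bytes} ({len(encoded)} bytes)")
```

Note that the surrogate range U+D800 to U+DFFF is not valid on its own and Python will refuse to encode it, which is why none of the samples come from that range.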
> On Aug 8, 2017, at 2:44 AM, Jens Alfke <j...@mooseyard.com> wrote:
>
>
>> On Aug 7, 2017, at 8:29 AM, x <tam118...@hotmail.com> wrote:
>>
>> I thought I had learned enough about this string lunacy to get by but
>> finding out that the UTF8 code for the UTF16 code \u0085 is in fact \uc285
>> has tipped me over the edge. I assumed they both used the same codes but
>> UTF16 allowed some characters UTF8 didn’t have.
>
> UTF-8 is backwards-compatible with ASCII. All 7-bit bytes (00-7f) represent
> the same characters as their ASCII equivalents. Beyond that, UTF-8 uses a
> sequence of two to five bytes in the range 80-ff to encode a single Unicode
> character/code-point. (You can sort of think of this as every byte holding 7
> bits of the actual character number, with its MSB set to 1. It’s not exactly
> like that, but close.)
>
> IMHO UTF-8 is the best general purpose text encoding. Code that works with
> ASCII (real 7-bit ASCII, not the nonstandard “extended” stuff) will generally
> work with UTF-8; the main thing to watch out for tends to be breaking or
> trimming strings, because you don’t want to cut part of a multibyte sequence.
> UTF-8 is also quite compact for Roman languages (although not non-Roman ones.)
>
> 16-bit encodings used to seem like a good idea back when Unicode has fewer
> than 65,536 characters, so you could assume that one unichar = one character.
> Those days are long gone. Now dealing with UTF-16 has all the same problems
> of dealing with UTF-8 (i.e. multi-word sequences) without the benefits of
> compactness or ASCII compatibility.
>
> 32-bit encodings are just silly, unless for some reason you really really
> have to optimize for speed over size (and even then the added size may well
> blow out your CPU caches and negate the speed boost.)
>
> —Jens
>
> PS: Apparently C++11 allows Unicode string literals by putting a letter U in
> front of the initial quote. The result will be a string of wchar_t.
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users