> On Jun 10, 2017, at 11:24 AM, Richard Damon <rich...@damon-family.org> wrote:
>
> If the field was declared as char[38], and UNICODE collation, many systems
> will allocate 4 bytes per character to allow for any possible character
Maybe, but it’s a bad design! It’s a huge amount of bloat for most text that would be stored in a database (even most Asian characters fit in 16 bits.)

And worse, it doesn’t actually make working with text easier, because you can _not_ treat every 32-bit code point as a character. There are arcane Unicode rules for combining code points: an accented letter may be represented as the base letter followed by a combining accent mark, and some glyphs (including many emoji!) are composed of multiple code points joined together. So even something simple like “how many characters are in this string?” requires scanning the string, not just a simple array-length lookup.

In the end, UTF-8 almost always turns out to be the best encoding, since everything has to be treated as variable-width anyway.

—Jens

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
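(A quick sketch of the point above, in Python 3 with only the stdlib `unicodedata` module: the same user-perceived character can be one code point or several, so a fixed-width 32-bit encoding buys you nothing, while UTF-8 stays compact. The specific strings chosen here are just illustrative examples.)

```python
import unicodedata

# "é" in two canonically equivalent forms: precomposed (NFC) versus
# the base letter followed by a combining accent (NFD).
precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT

# Both normalize to the same character, so they render identically...
assert unicodedata.normalize("NFC", decomposed) == precomposed

# ...but a naive code-point count disagrees about "how many characters":
print(len(precomposed))  # 1
print(len(decomposed))   # 2

# Some emoji are whole sequences of code points joined by zero-width joiners:
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # one glyph (family)
print(len(family))       # 5 code points

# UTF-32 spends a flat 4 bytes per code point; UTF-8 is far smaller:
print(len(precomposed.encode("utf-8")))      # 2 bytes
print(len(precomposed.encode("utf-32-le")))  # 4 bytes
print(len(decomposed.encode("utf-8")))       # 3 bytes
print(len(decomposed.encode("utf-32-le")))   # 8 bytes
```

Counting user-perceived characters (grapheme clusters) correctly requires the segmentation rules of Unicode UAX #29 in either encoding, so the fixed-width representation never delivers the O(1) indexing it seems to promise.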