Igor Tandetnik <itandet...@...> writes:
>
> Niklas Bäckman <nikba...@...> wrote:
> > Columns with special characters like ("å" "ä" "å") get too short widths when
> > output.
> >
> > I guess this is due to the shell not counting actual UTF8 *characters/code
> > points* when calculating the widths, but instead only
> > counting the plain bytes in the strings, so they will seem longer until they
> > are actually printed to the console.
>
> Note that counting codepoints, while it happens to help with your particular data, won't help in general.
> Consider combining diacritics: U+00E4 (small A with diaeresis) looks the same as U+0061 U+0308 (small
> letter A + combining diaeresis) when printed on the console.

You are right, of course. The shell should count graphemes, not code points:
http://unicode.org/faq/char_combmark.html#7

I guess this probably falls outside the "lite" scope of SQLite, though, much
like case-insensitive comparison of non-ASCII characters, which it also does
not support.

Or would it be possible to write such a graphemelen(s) function in not too
many lines of C, without needing any external Unicode libraries?
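
A rough sketch of how such a graphemelen(s) could look, under some loud assumptions: the input is NUL-terminated UTF-8, and a "grapheme" is approximated as any code point that is not in one of a handful of combining-mark blocks. The helper is_combining() and its range list are hypothetical and deliberately incomplete; this is not the full Unicode grapheme cluster algorithm, only an illustration that the common cases fit in a few lines of C.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if cp lies in one of the main
** combining-mark blocks. This is NOT the complete Unicode list. */
static int is_combining(unsigned long cp)
{
    return (cp >= 0x0300 && cp <= 0x036F)   /* Combining Diacritical Marks */
        || (cp >= 0x1AB0 && cp <= 0x1AFF)   /* ... Extended */
        || (cp >= 0x1DC0 && cp <= 0x1DFF)   /* ... Supplement */
        || (cp >= 0x20D0 && cp <= 0x20FF)   /* ... for Symbols */
        || (cp >= 0xFE20 && cp <= 0xFE2F);  /* Combining Half Marks */
}

/* Approximate number of graphemes in a NUL-terminated UTF-8 string:
** decode each code point and count it unless it is a combining mark. */
size_t graphemelen(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p) {
        unsigned long cp;
        int extra;  /* number of continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else                          { cp = 0xFFFD;    extra = 0; } /* invalid lead byte */
        p++;

        /* Fold in continuation bytes; stop early on a truncated sequence. */
        while (extra > 0 && (*p & 0xC0) == 0x80) {
            cp = (cp << 6) | (*p & 0x3F);
            p++;
            extra--;
        }

        if (!is_combining(cp)) n++;
    }
    return n;
}
```

With this, graphemelen("\xC3\xA4") (U+00E4) and graphemelen("a\xCC\x88") (U+0061 U+0308) both return 1, whereas strlen() gives 2 and 3 respectively. It still gets column widths wrong for double-width East Asian characters, Hangul jamo sequences, and zero-width joiner sequences, so it is a sketch of the idea rather than something to rely on in general.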