Igor Tandetnik <itandet...@...> writes:

> 
> Niklas Bäckman <nikba...@...> wrote:
> > Columns with special characters like ("å" "ä" "å") get too short widths when
> > output.
> > 
> > I guess this is due to the shell not counting actual UTF8 *characters/code
> > points* when calculating the widths, but instead only
> > counting the plain bytes in the strings, so they will seem longer until they
> > are actually printed to the console.
> 
> Note that counting codepoints, while it happens to help with your particular
> data, won't help in general.
> Consider combining diacritics: U+00E4 (small A with diaeresis) looks the same
> as U+0061 U+0308 (small letter A + combining diaeresis) when printed on the
> console.

You are right, of course. The shell should count graphemes (grapheme clusters),
not code points.

http://unicode.org/faq/char_combmark.html#7

I guess this probably falls outside the "lite" scope of SQLite, though? Much
like how it does not support case-insensitive comparison of non-ASCII characters.

Or would it be possible to write such a graphemelen(s) function in not too many
lines of C code, without needing any external Unicode libraries?
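
To make the question concrete, here is a rough sketch of what I have in mind.
It is only an approximation, not real grapheme segmentation as defined by the
Unicode standard: it decodes UTF-8 and skips code points in a few hard-coded
combining-mark blocks. The helper is_combining() and its list of ranges are my
own assumptions and are far from complete. Double-width East Asian characters,
emoji sequences, Hangul jamo and so on are not handled, and the input is
assumed to be mostly well-formed UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if cp lies in one of a few combining-mark
 * blocks. This list is an assumption and is deliberately incomplete. */
static int is_combining(unsigned long cp)
{
    return (cp >= 0x0300 && cp <= 0x036F)   /* Combining Diacritical Marks */
        || (cp >= 0x1AB0 && cp <= 0x1AFF)   /* ... Extended */
        || (cp >= 0x1DC0 && cp <= 0x1DFF)   /* ... Supplement */
        || (cp >= 0x20D0 && cp <= 0x20FF)   /* ... for Symbols */
        || (cp >= 0xFE20 && cp <= 0xFE2F);  /* Combining Half Marks */
}

/* Approximate number of user-perceived characters in the UTF-8 string s:
 * counts every decoded code point that is not a combining mark. */
size_t graphemelen(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p) {
        unsigned long cp;
        int extra;  /* continuation bytes expected after the lead byte */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else                          { cp = 0xFFFD;    extra = 0; } /* invalid byte */
        p++;

        /* Fold in continuation bytes; stop early if the sequence is cut short. */
        while (extra > 0 && (*p & 0xC0) == 0x80) {
            cp = (cp << 6) | (*p & 0x3F);
            p++;
            extra--;
        }

        if (!is_combining(cp))
            n++;
    }
    return n;
}
```

With this, both "\xC3\xA4" (U+00E4) and "a\xCC\x88" (U+0061 U+0308) come out
as length 1, whereas strlen() reports 2 and 3 bytes respectively.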


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
