Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

Jens Alfke Mon, 19 Feb 2018 11:29:57 -0800


> On Feb 19, 2018, at 2:54 AM, Ralf Junker <ralfjun...@gmx.de> wrote:
> 
> 'です' are 2 codepoints according to
> 
>  http://www.fontspace.com/unicode/analyzer/?q=%E3%81%A7%E3%81%99 
> 
> The requested overall width is 4, so I would expect two added spaces 
> and a total length of 4.


If this is being done for the purpose of visual alignment in a monospaced font, 
it's not going to work. Both of those characters (hiragana, not kanji) are 
displayed as double-width (in macOS's Terminal at least), so their visual width 
is 4 columns, meaning there should be zero spaces of padding.

You really _cannot_ equate Unicode code-points with visual width of displayed 
text, even in a monospaced layout. Not only do terminals render some characters 
as double-width, but there are all kinds of other exceptions like zero-width 
joiners, diacritical marks, ligatures, and joined forms. As a very common 
example of the latter, many emojis are actually composed of multiple Unicode 
code-points: anything with a skin-tone modifier is at least two, and sequences 
built with zero-width joiners (family groups, for example) can run to seven or 
more.
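
To make that concrete, here is a small Python sketch that lists the code-points 
inside two emoji. The two sample sequences and the helper name describe() are 
my own choices for illustration, not anything from SQLite.

```python
import unicodedata

# Escapes are used so the example does not depend on the mail client's font.
WAVE = "\U0001F44B\U0001F3FD"  # waving hand + skin-tone modifier
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"  # four people joined by ZWJ

def describe(text):
    """Return (code point, Unicode name) pairs for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

for label, s in (("wave", WAVE), ("family", FAMILY)):
    print(label, "code points:", len(s), "UTF-8 bytes:", len(s.encode("utf-8")))
    for cp, name in describe(s):
        print("  ", cp, name)
```

Each sequence is normally drawn as a single glyph, yet the first is 2 
code-points and the second is 7, so neither count says anything about columns.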

TL;DR: If you use character (code-point) counts to visually lay out text, 
you're likely to get bad results with anything other than plain ASCII, so it's 
only marginally better than just counting bytes.
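
For comparison, a rough Python sketch of padding by display columns rather than 
by code-points. It assumes that the East Asian Width property is a good enough 
proxy for what the terminal does, which is only approximately true; 
display_width() and pad_left() are hypothetical helpers, not SQLite functions, 
and emoji joiner sequences are still counted wrongly.

```python
import unicodedata

def display_width(text):
    """Approximate terminal columns for text; a heuristic, not a full solution."""
    width = 0
    for ch in text:
        # Combining marks and format characters (e.g. ZWJ) take no column.
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # Wide and Fullwidth characters usually take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def pad_left(text, columns):
    """Right-justify text in a field of the given number of columns."""
    return " " * max(0, columns - display_width(text)) + text

s = "\u3067\u3059"  # the two characters from the original question
print(len(s.encode("utf-8")), len(s), display_width(s))  # bytes, code points, columns
print(repr(pad_left(s, 4)), repr(pad_left("ab", 4)))
```

With a field width of 4, the two wide characters get no padding at all, while 
"ab" gets two spaces, which matches what lines up on screen.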

—Jens
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
