Re: [sqlite] [Bug] Non-ASCII character is not counted in calculating column width

Yuriy M. Kaminskiy Fri, 02 Jun 2017 03:43:13 -0700

yum...@gmail.com (Yuriy M. Kaminskiy)
writes:

> Jacob Pratt <jhpratt...@gmail.com> writes:
>
>> Using .width x along with .mode columns, any non-ASCII character isn't
>> counted, causing the column to shrink by one.
>>
>> I *think* my analysis is correct, but it also might be counted multiple
>> times by taking a naïve approach and just counting the number of bytes
>> (UTF-8 has multi-byte characters).
>
> Yep, it uses bytes:
[...]
>
> And fprintf counts bytes, not characters.


Oops. I looked at outdated code. As of sqlite3 3.19.2, it *accounts* for
multi-byte characters when printing (and does not produce mojibake).

(I could not find anything about that in the release notes, though.)

However, not everywhere; this code auto-detects column width:
          if( w==0 ){
            w = strlen30(azCol[i] ? azCol[i] : "");
            if( w<10 ) w = 10;
            n = strlen30(azArg && azArg[i] ? azArg[i] : p->nullValue);
            if( w<n ) w = n;
          }
... and it counts *bytes*, not *characters*.
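
For illustration, a minimal sketch of counting characters rather than bytes,
assuming well-formed UTF-8 input. utf8_char_count() is a hypothetical helper
name of mine, not something taken from the shell source; it relies only on
the fact that UTF-8 continuation bytes always have the form 10xxxxxx:

```c
/* Hypothetical helper (not from the sqlite3 shell source): count the
** UTF-8 characters (code points) in a NUL-terminated string.
** Every byte that is NOT a continuation byte (10xxxxxx) starts a new
** character, so counting those bytes counts the characters.
** Assumes well-formed UTF-8; a NULL pointer counts as zero. */
static int utf8_char_count(const char *z){
  int n = 0;
  if( z==0 ) return 0;
  for(; *z; z++){
    if( ((unsigned char)*z & 0xC0)!=0x80 ) n++;
  }
  return n;
}
```

Used in place of the byte length in the auto-detect branch, "naïve" would
count as 5 rather than 6. It still says nothing about double-width
characters.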

(And sqlite3 still *does not* account for double-width characters).
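
To show what accounting for double-width characters could look like, here is
a rough sketch under loud assumptions: it decodes well-formed UTF-8 by hand
and treats a small, hand-picked and incomplete set of East Asian blocks as
two columns wide. This is a stand-in for wcwidth(3), not a replacement;
cp_width() and utf8_display_width() are hypothetical names, and combining or
control characters are not handled at all:

```c
/* Hypothetical helper: return 2 for code points in a few well-known East
** Asian wide blocks, else 1.  The ranges are a deliberately incomplete
** illustration; wcwidth(3) consults the full locale tables instead. */
static int cp_width(unsigned int c){
  if( (c>=0x1100 && c<=0x115F)     /* Hangul Jamo initial consonants */
   || (c>=0x3040 && c<=0x30FF)     /* Hiragana and Katakana */
   || (c>=0x4E00 && c<=0x9FFF)     /* CJK Unified Ideographs */
   || (c>=0xAC00 && c<=0xD7A3)     /* Hangul Syllables */
   || (c>=0xFF01 && c<=0xFF60) ){  /* Fullwidth forms */
    return 2;
  }
  return 1;
}

/* Hypothetical helper: number of terminal columns needed for a UTF-8
** string.  Assumes well-formed input; every code point outside the
** ranges above (including combining marks) is counted as width 1. */
static int utf8_display_width(const char *z){
  const unsigned char *p = (const unsigned char*)z;
  int w = 0;
  if( p==0 ) return 0;
  while( *p ){
    unsigned int c = *p++;
    int nExtra = 0;
    /* The lead byte tells how many continuation bytes follow. */
    if( c>=0xF0 ){ c &= 0x07; nExtra = 3; }
    else if( c>=0xE0 ){ c &= 0x0F; nExtra = 2; }
    else if( c>=0xC0 ){ c &= 0x1F; nExtra = 1; }
    /* Fold in the low 6 bits of each continuation byte. */
    while( nExtra-- > 0 && (*p & 0xC0)==0x80 ){
      c = (c<<6) | (*p++ & 0x3F);
    }
    w += cp_width(c);
  }
  return w;
}
```

With this, HIRAGANA LETTER A (3 bytes in UTF-8) comes out as 2 columns, while
"naïve" comes out as 5. A real fix would more likely convert to wide
characters and call wcwidth()/wcswidth(), which depends on the locale.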

> I'd like to also note that aside of multi-byte characters that must be
> accounted for, there are "opposite": double-width characters. E.g.
> <a5>    /x30/x42        HIRAGANA LETTER A
> utf-8 representation takes 3 bytes, but *two* column positions.
> See man wcwidth wcswidth.
>
> BTW, truncating utf-8 in the middle, as fprintf would do, can
> produce confusing mojibake.
>
> Whether or not it will be fixed (it is definitely annoying to implement),
> the current deficiencies should at least be documented.
>
> cat >>sqlite3.1
> .SH KNOWN BUGS
> .B .mode column
> does not handle double-width characters, patches welcomed.

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
