On Mon, Feb 19, 2018 at 2:54 AM, Ralf Junker <ralfjun...@gmx.de> wrote:

> On 19.02.2018 09:50, Rowan Worth wrote:
>> What is your expected answer for:
>> select length(printf ('%4s', 'です'))
> 'です' are 2 codepoints according to
>   http://www.fontspace.com/unicode/analyzer/?q=%E3%81%A7%E3%81%99
> The requested overall width is 4, so I would expect two added
> spaces and a total length of 4.
> Ralf
> PS: SQLite3 returns 2, which is less than the requested width.

Okay; but the functions in other databases weren't printf.  Because it
mimics the C function of the same name, I would expect the width to be
counted the way (v)(s)(n)printf count it, in bytes; the C printf and scanf
families unfortunately don't know about runes the way Go does.
With fprintf, I might expect it to understand the locale and UTF-8 or other
wide encodings when writing to a file opened in text mode, fopen( ..., "wt" )...
though probably not even then, since I think fprintf is vsnprintf into a
buffer which is then passed to fwrite or fputs, and those work in bytes.
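
A small C sketch of what I mean (DESU and pad4 are names made up for
illustration). With "です" spelled out as its six UTF-8 bytes, a "%4s"
width is already exceeded in bytes, so snprintf adds no padding at all,
which would explain why length() only sees the 2 original characters:

```c
#include <stdio.h>
#include <string.h>

/* "です" written as explicit UTF-8 bytes so the example does not depend on
   the source file's encoding: U+3067 = E3 81 A7, U+3059 = E3 81 99. */
static const char DESU[] = "\xE3\x81\xA7\xE3\x81\x99";

/* Hypothetical helper: format `s` with "%4s" into `buf` and return the
   number of bytes the result has. In C the field width is measured in
   bytes (chars), not in code points. */
static int pad4(char *buf, size_t bufsize, const char *s) {
    return snprintf(buf, bufsize, "%4s", s);
}
```

pad4(buf, sizeof buf, DESU) returns 6 and leaves the text unpadded, whereas
pad4(buf, sizeof buf, "ab") returns 4 and produces "  ab".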

Changing the function is bound to break things, and it wouldn't be a small
task to reimplement the C library's printf to be UTF-8 aware.

The SQL functions that are not C emulations do work in code points rather
than bytes (for the most part; they stop unnecessarily at NUL characters,
which is not SQL compliant...).
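
For comparison, counting code points instead of bytes in UTF-8 is mostly a
matter of skipping continuation bytes. A rough sketch (utf8_codepoints is a
hypothetical helper, not SQLite's actual code, and it assumes valid UTF-8);
being a C-string loop, it also stops at the first NUL:

```c
#include <stddef.h>

/* Hypothetical helper: count code points in a NUL-terminated UTF-8 string
   by counting every byte that is NOT a continuation byte (continuation
   bytes have the bit pattern 10xxxxxx). Assumes valid UTF-8. */
static size_t utf8_codepoints(const char *s) {
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p) {
        if ((*p & 0xC0) != 0x80) {   /* lead byte or ASCII: a new code point */
            ++n;
        }
    }
    return n;
}
```

utf8_codepoints("\xE3\x81\xA7\xE3\x81\x99") returns 2, while strlen() on the
same bytes returns 6.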

You could make a function that does the same job correctly, but even so
you'd have to find a UTF-8 printf. Not a lot of help, but maybe worth
mentioning:
"Just a warning, counting "characters" in Unicode data is quite a
complicated business. Besides the fact that each code point in UTF-8 is
composed of several bytes, each glyph (or "grapheme") can be composed of
several code points, and for that reason fwprintf is inadequate for
truncating Unicode data anyway -- for example you could cut off an accent
without cutting off the character it applies to. So whatever you end up
using, make sure that the meaning of the length you specify is clear to you."
 – Steve Jessop <https://stackoverflow.com/users/13005/steve-jessop>, Feb 17 '12 at 9:20
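
To make the accent example concrete: "é" can be written as "e" followed by
U+0301 COMBINING ACUTE ACCENT, i.e. the bytes 65 CC 81: three bytes, two
code points, one visible character. A sketch of plain code-point truncation
(utf8_prefix_bytes is a hypothetical helper; assumes valid UTF-8) shows the
accent getting cut off:

```c
#include <stddef.h>

/* Hypothetical helper: return how many bytes make up the first `k` code
   points of the NUL-terminated UTF-8 string `s` (or the whole string if it
   is shorter). This truncates by code point, NOT by grapheme. */
static size_t utf8_prefix_bytes(const char *s, size_t k) {
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0, seen = 0;
    while (p[i]) {
        if ((p[i] & 0xC0) != 0x80) {   /* start of a new code point */
            if (seen == k) {
                break;                 /* already have k code points */
            }
            ++seen;
        }
        ++i;
    }
    return i;
}
```

utf8_prefix_bytes("e\xCC\x81", 1) returns 1, keeping only the bare "e"; it
takes 2 code points (3 bytes) to keep the accent. Doing this properly means
segmenting by grapheme cluster, which is what those Unicode libraries are for.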

I'm not finding anything; everyone recommends doing it a different way (use
a Unicode library, which doesn't have a printf) or doing it in another
language with a proper String type or something similar.
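
For what it's worth, the "same job, but correctly" function could look
roughly like this if "correctly" means code points. This is only a sketch
under that assumption; utf8_pad_left and cp_count are hypothetical names,
not part of SQLite or libc, and the input is assumed to be valid UTF-8:

```c
#include <string.h>

/* Count code points in a NUL-terminated, valid UTF-8 string by skipping
   continuation bytes (10xxxxxx). */
static size_t cp_count(const char *s) {
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p)
        if ((*p & 0xC0) != 0x80) ++n;
    return n;
}

/* Hypothetical "%Ns" replacement: right-justify `s` in a field `width`
   code points wide, padding with ASCII spaces. Like snprintf, it writes at
   most `size` bytes including the terminating NUL and returns the byte
   length the full result would have (return >= size means truncation).
   Caveat: truncation is byte-based and may split a multi-byte sequence. */
static size_t utf8_pad_left(char *dst, size_t size, const char *s, size_t width) {
    size_t cps = cp_count(s);
    size_t pad = cps < width ? width - cps : 0;
    size_t len = strlen(s);
    if (size > 0) {
        size_t i = 0;
        for (; i < pad && i < size - 1; ++i)
            dst[i] = ' ';
        size_t room = size - 1 - i;           /* bytes left before the NUL */
        size_t copy = len < room ? len : room;
        memcpy(dst + i, s, copy);
        dst[i + copy] = '\0';
    }
    return pad + len;
}
```

utf8_pad_left(buf, sizeof buf, "\xE3\x81\xA7\xE3\x81\x99", 4) writes two
spaces followed by です and returns 8 (bytes), i.e. the total length of 4
code points Ralf expected. Per the warning above, that is still not
graphemes, and in a terminal each of those kana typically takes two columns,
so "width 4" may not line up visually either.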

> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users