Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-24 Thread Ralf Junker

On 18.02.2018 00:36, Richard Hipp wrote:


> So I'm not sure whether or not this is something that ought to be "fixed".


I want to send a big Thank You! for your efforts to enhance the printf() 
string formatter:


  http://www.sqlite.org/src/info/c883c4d33f4cd722

I saw the check-in just now as I am "catching up" from a flu. Feels much 
better now :-)


Ralf
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-20 Thread John McKown
On Tue, Feb 20, 2018 at 11:44 AM, Jens Alfke  wrote:

>
>
> > On Feb 19, 2018, at 7:49 PM, petern  wrote:
> >
> > 3. Why can't SQLite have the expected common static SQL functions for
> > getting rapid development done without external tools?
>
> Because its primary use case is as an embedded library for programs, not
> as a standalone tool or server. From that perspective, it’s wasteful for
> SQLite to include functionality that can be done as well or better by the
> program that calls it.
>

I agree. Sometimes it seems to me that people are using SQLite as if it
were a "cheap" version of "MS SQL Server". And then wanting it to have all
the "bells and whistles" of a full-fledged, multi-user, relational SQL
database. I can even somewhat understand that because it is just so easy to
install and use. Much easier than MySQL, MariaDB, or PostgreSQL (or any POS
on MS Windows).



>
> It’s also very easy to add custom SQL functions to SQLite, so if you have
> a need for these, you can write them yourself and either link them into
> your app, or build them as a library that the sqlite3 tool can load.
>
> —Jens



-- 
I have a theory that it's impossible to prove anything, but I can't prove
it.

Maranatha! <><
John McKown


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-20 Thread Jens Alfke


> On Feb 19, 2018, at 7:49 PM, petern  wrote:
> 
> 3. Why can't SQLite have the expected common static SQL functions for
> getting rapid development done without external tools?

Because its primary use case is as an embedded library for programs, not as a 
standalone tool or server. From that perspective, it’s wasteful for SQLite to 
include functionality that can be done as well or better by the program that 
calls it.

It’s also very easy to add custom SQL functions to SQLite, so if you have a 
need for these, you can write them yourself and either link them into your app, 
or build them as a library that the sqlite3 tool can load.
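
A minimal sketch of the custom-function route, using Python's standard sqlite3 module rather than the C API. The lpad() name and its pad-or-truncate behaviour are modelled loosely on PostgreSQL's function and are assumptions for illustration; SQLite itself does not ship this function.

```python
import sqlite3


def lpad(text, length, fill=" "):
    """Left-pad (or truncate) text to `length` code points."""
    if text is None or length is None:
        return None                      # NULL in, NULL out
    length = max(length, 0)
    if length <= len(text):
        return text[:length]             # too long: keep the leftmost part
    if not fill:
        return text                      # nothing to pad with
    missing = length - len(text)
    # Repeat the fill string, then cut it to exactly the missing width.
    padding = (fill * (missing // len(fill) + 1))[:missing]
    return padding + text


def connect_with_lpad(path=":memory:"):
    """Open a database and register lpad() with 2 and 3 arguments."""
    conn = sqlite3.connect(path)
    conn.create_function("lpad", 2, lpad)
    conn.create_function("lpad", 3, lpad)
    return conn


conn = connect_with_lpad()
row = conn.execute(
    "SELECT lpad('äöü', 4), length(lpad('äöü', 4, ' '))"
).fetchone()
print(row)
```

Because Python strings are sequences of code points, this pads by code points, not bytes and not display cells.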

—Jens


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-20 Thread J Decker
On Mon, Feb 19, 2018 at 7:49 PM, petern  wrote:

> There are other uses for padding strings besides user reports.  Consider
> scalar representations of computations for example. Also:
>
> 1.There was no mention of user display formatting in Ralf's original
> report.  It was a bug report about missing inverse functionality for
> padding/trimming strings.
> 2.The proposed functions fully exist in the PostgreSQL archetype.  Is
> PostgreSQL wrong?
> 3. Why can't SQLite have the expected common static SQL functions for
> getting rapid development done without external tools?
> Is the goal to reduce SQL portability and increase development effort just
> to see some representative output results?
>
> I don't think anybody is trying to create production grade displays within
> SQL but being able to produce representative output and having the expected
> nucleus of built-in SQL functions (including canonical inverses) is still a
> sensible goal.
>
>
I agree completely; however, what ARE standard SQL functions?  I went
searching and found 4 different answers.  Writing common SQL is pretty
hard, so I have always avoided any internal functions and forced the
application to do it.

LTRIM( "", 4-LENGTH(whatever)) || whatever would seem to do
the padding?

I wish there were a publicly readable SQL standard... while I understand
they need to make money too, that makes it hard to find reference material to
state, well... according to 'The Standard', SQLite is missing XXX.

Unfortunately, as seen earlier, 3 databases have 3 different functions to do
the same thing (SUBSTR is another?), so it doesn't appear there is a desire
for consensus either.


> On Mon, Feb 19, 2018 at 6:06 PM, Simon Slavin 
> wrote:
>
> > On 20 Feb 2018, at 1:38am, petern  wrote:
> >
> > > Yet even so, as Ralf pointed out, the PostgreSQL lpad() and rpad() fill
> > > with arbitrary string functionality would still be missing despite the
> > > checked in printf() being more directly equivalent to the PostgreSQL
> > > format() function.  First things first I suppose...
> > >
> > > PostgreSQL lpad() and rpad() documentation is here:
> > > https://www.postgresql.org/docs/9.5/static/functions-string.html
> >
> > The problem with string length and padding was pointed out upthread.
> > Padding strings to a length was useful in the days of fixed-width fonts.
> > We don't do that much these days.  And even if you could equip SQLite
> > with functions which did those things, to do it properly you'd need
> > routines which understood Unicode characters, combinations, accents and
> > the sort of diacritics used for Hebrew and Arabic vowels.  Without that,
> > your fancy new feature is just going to trigger hundreds of bug reports.
> >
> > String width functions nowadays take two parameters, the string (in some
> > flavour of Unicode) and a font descriptor (font, size, emphasis and other
> > options) and return the width of the string in points, taking into
> > account not only Unicode features but font features like kern hinting
> > and ligatures.  And you will find these features in your operating system.
> >
> > So please, folks, don't try to do this in a purposely tiny DBMS.  Do it
> > using OS calls, as the people who designed your OS intended.
> >
> > Simon.


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-20 Thread petern
There are other uses for padding strings besides user reports.  Consider
scalar representations of computations for example. Also:

1. There was no mention of user display formatting in Ralf's original
report.  It was a bug report about missing inverse functionality for
padding/trimming strings.
2. The proposed functions fully exist in the PostgreSQL archetype.  Is
PostgreSQL wrong?
3. Why can't SQLite have the expected common static SQL functions for
getting rapid development done without external tools?
Is the goal to reduce SQL portability and increase development effort just
to see some representative output results?

I don't think anybody is trying to create production grade displays within
SQL but being able to produce representative output and having the expected
nucleus of built-in SQL functions (including canonical inverses) is still a
sensible goal.

On Mon, Feb 19, 2018 at 6:06 PM, Simon Slavin  wrote:

> On 20 Feb 2018, at 1:38am, petern  wrote:
>
> > Yet even so, as Ralf pointed out, the PostgreSQL lpad() and rpad() fill
> > with arbitrary string functionality would still be missing despite the
> > checked in printf() being more directly equivalent to the PostgreSQL
> > format() function.  First things first I suppose...
> >
> > PostgreSQL lpad() and rpad() documentation is here:
> > https://www.postgresql.org/docs/9.5/static/functions-string.html
>
> The problem with string length and padding was pointed out upthread.
> Padding strings to a length was useful in the days of fixed-width fonts.
> We don't do that much these days.  And even if you could equip SQLite with
> functions which did those things, to do it properly you'd need routines
> which understood Unicode characters, combinations, accents and the sort of
> diacritics used for Hebrew and Arabic vowels.  Without that, your fancy new
> feature is just going to trigger hundreds of bug reports.
>
> String width functions nowadays take two parameters, the string (in some
> flavour of Unicode) and a font descriptor (font, size, emphasis and other
> options) and return the width of the string in points, taking into account
> not only Unicode features but font features like kern hinting and
> ligatures.  And you will find these features in your operating system.
>
> So please, folks, don't try to do this in a purposely tiny DBMS.  Do it
> using OS calls, as the people who designed your OS intended.
>
> Simon.


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-19 Thread Simon Slavin
On 20 Feb 2018, at 1:38am, petern  wrote:

> Yet even so, as Ralf pointed out, the PostgreSQL lpad() and rpad() fill
> with arbitrary string functionality would still be missing despite the
> checked in printf() being more directly equivalent to the PostgreSQL
> format() function.  First things first I suppose...
> 
> PostgreSQL lpad() and rpad() documentation is here:
> https://www.postgresql.org/docs/9.5/static/functions-string.html

The problem with string length and padding was pointed out upthread.  Padding 
strings to a length was useful in the days of fixed-width fonts.  We don't do 
that much these days.  And even if you could equip SQLite with functions which 
did those things, to do it properly you'd need routines which understood 
Unicode characters, combinations, accents and the sort of diacritics used for 
Hebrew and Arabic vowels.  Without that, your fancy new feature is just going to 
trigger hundreds of bug reports.

String width functions nowadays take two parameters, the string (in some 
flavour of Unicode) and a font descriptor (font, size, emphasis and other 
options) and return the width of the string in points, taking into account not 
only Unicode features but font features like kern hinting and ligatures.  And 
you will find these features in your operating system.

So please, folks, don't try to do this in a purposely tiny DBMS.  Do it using 
OS calls, as the people who designed your OS intended.

Simon.


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-19 Thread J Decker
On Mon, Feb 19, 2018 at 5:38 PM, petern  wrote:

> FYI.  See http://www.sqlite.org/src/timeline for the equivalent DRH
> checkins:  http://www.sqlite.org/src/info/c883c4d33f4cd722
> Hopefully that branch will make a forthcoming trunk merge.   [Printing
> explicit nul terminator by formatting an interesting twist.]
>
> @DRH
printf( "whatever%ctest", 0 ); should result in that character being in the
string:

char buf[256];
int length = snprintf( buf, sizeof buf, "whatever%ctest", 0 );

length == 13, while, yes, applying strlen() to the same buffer will result in
only 8 as the length.
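
The same arithmetic can be checked without a C compiler. This is only a sketch that mimics the C behaviour with Python bytes formatting: len() stands in for the value snprintf() returns, and the position of the first NUL byte stands in for strlen().

```python
# %c with argument 0 puts a NUL byte into the formatted result.
buf = b"whatever%ctest" % 0

formatted_length = len(buf)      # all bytes written, NUL included
c_string_length = buf.index(0)   # bytes before the first NUL, as strlen() counts

print(formatted_length, c_string_length)
```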


> Yet even so, as Ralf pointed out, the PostgreSQL lpad() and rpad() fill
> with arbitrary string functionality would still be missing despite the
> checked in printf() being more directly equivalent to the PostgreSQL
> format() function.  First things first I suppose...
>
> PostgreSQL lpad() and rpad() documentation is here:
> https://www.postgresql.org/docs/9.5/static/functions-string.html
>
> Peter
>
> On Mon, Feb 19, 2018 at 4:38 PM, Cezary H. Noweta 
> wrote:
>
> > Hello,
> >
> > On 2018-02-17 18:39, Ralf Junker wrote:
> >
> >> Example SQL:
> >>
> >> select
> >>length(printf ('%4s', 'abc')),
> >>length(printf ('%4s', 'äöü')),
> >>length(printf ('%-4s', 'abc')),
> >>length(printf ('%-4s', 'äöü'))
> >>
> >> Output is 4, 3, 4, 3. Padding seems to take into account UTF-8 bytes
> >> instead of UTF-8 code points.
> >>
> >> Should padding not work on code points and output 4 in all cases as
> >> requested?
> >>
> >
> > If you are interested in a patch extending a functionality of ``printf()''
> > then http://sqlite.chncc.eu/utf8printf/. Adding ``l'' length modifier
> > makes width/precision specifications being treated as numbers of UTF-8
> > chars -- not bytes. ``SELECT length(printf ('%4ls', 'äöü'));'' will give 4.
> >
> > -- best regards
> >
> > Cezary H. Noweta


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-19 Thread petern
FYI.  See http://www.sqlite.org/src/timeline for the equivalent DRH
checkins:  http://www.sqlite.org/src/info/c883c4d33f4cd722
Hopefully that branch will make a forthcoming trunk merge.   [Printing an
explicit nul terminator by formatting is an interesting twist.]

Yet even so, as Ralf pointed out, the PostgreSQL lpad() and rpad() fill
with arbitrary string functionality would still be missing despite the
checked in printf() being more directly equivalent to the PostgreSQL
format() function.  First things first I suppose...

PostgreSQL lpad() and rpad() documentation is here:
https://www.postgresql.org/docs/9.5/static/functions-string.html

Peter

On Mon, Feb 19, 2018 at 4:38 PM, Cezary H. Noweta 
wrote:

> Hello,
>
> On 2018-02-17 18:39, Ralf Junker wrote:
>
>> Example SQL:
>>
>> select
>>length(printf ('%4s', 'abc')),
>>length(printf ('%4s', 'äöü')),
>>length(printf ('%-4s', 'abc')),
>>length(printf ('%-4s', 'äöü'))
>>
>> Output is 4, 3, 4, 3. Padding seems to take into account UTF-8 bytes
>> instead of UTF-8 code points.
>>
>> Should padding not work on code points and output 4 in all cases as
>> requested?
>>
>
> If you are interested in a patch extending a functionality of ``printf()''
> then http://sqlite.chncc.eu/utf8printf/. Adding ``l'' length modifier
> makes width/precision specifications being treated as numbers of UTF-8
> chars -- not bytes. ``SELECT length(printf ('%4ls', 'äöü'));'' will give 4.
>
> -- best regards
>
> Cezary H. Noweta


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-19 Thread Cezary H. Noweta

Hello,

On 2018-02-17 18:39, Ralf Junker wrote:

> Example SQL:
>
> select
>    length(printf ('%4s', 'abc')),
>    length(printf ('%4s', 'äöü')),
>    length(printf ('%-4s', 'abc')),
>    length(printf ('%-4s', 'äöü'))
>
> Output is 4, 3, 4, 3. Padding seems to take into account UTF-8 bytes
> instead of UTF-8 code points.
>
> Should padding not work on code points and output 4 in all cases as
> requested?


If you are interested in a patch extending the functionality of 
``printf()'', see http://sqlite.chncc.eu/utf8printf/. Adding the ``l'' 
length modifier makes width/precision specifications be treated as 
numbers of UTF-8 chars -- not bytes. ``SELECT length(printf ('%4ls', 
'äöü'));'' will give 4.


-- best regards

Cezary H. Noweta


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-19 Thread Keith Medcalf

Should not your application just retrieve the UTF-8 text and format it for 
display to the user?  User <-> Software formatting (and input/output diddling 
of any type) should only be done ONCE (on INPUT from the user or on OUTPUT to 
the user) as close to the User as possible and should *NEVER EVER* be done as 
an intermediate step that is used for any other purpose (other than that 
originating from or terminating with a user).

It is neither possible nor expected for a "Data Storage System" to know about 
the local foibles of the user -- that is an application programming (UI) issue. 
 "Data Storage Systems" store data in a user-foible-free state.  Date/Time are 
UTC ISO8601, text (encoded) is just a bunch of character units stored 
side-by-each, blobs are just a sequence of bytes stored side-by-each, integers 
are, well, integers in binary format; and, floating point is stored in floating 
point format (as an approximation to a value).

Placement of commas/decimal points, display precision of floating point 
numbers, formatting of dates and "bag-o-bytes" (encoded text or blobs) are UI 
issues and not properly a part of the "Data Storage System".

That SQLite contains a "printf" function is quaint, but it is merely quaint and 
should not be expected to provide the same capabilities as a "proper" UI.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Ralf Junker
>Sent: Saturday, 17 February, 2018 10:40
>To: sqlite-users@mailinglists.sqlite.org
>Subject: [sqlite] printf() problem padding multi-byte UTF-8 code
>points
>
>Example SQL:
>
>select
>   length(printf ('%4s', 'abc')),
>   length(printf ('%4s', 'äöü')),
>   length(printf ('%-4s', 'abc')),
>   length(printf ('%-4s', 'äöü'))
>
>Output is 4, 3, 4, 3. Padding seems to take into account UTF-8 bytes
>instead of UTF-8 code points.
>
>Should padding not work on code points and output 4 in all cases as
>requested?
>
>Ralf


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-19 Thread Jens Alfke


> On Feb 19, 2018, at 2:54 AM, Ralf Junker  wrote:
> 
> 'です' are 2 codepoints according to
> 
>  http://www.fontspace.com/unicode/analyzer/?q=%E3%81%A7%E3%81%99 
> 
> 
> The requested overall width is 4, so I would expect two added spaces 
> and a total length of 4.

If this is being done for the purpose of visual alignment in a monospaced font, 
it's not going to work. Both of those Kanji(?) characters are displayed as 
double-width (in macOS's Terminal at least), so their visual width is 4 spaces, 
meaning there should be zero spaces of padding.

You really _cannot_ equate Unicode code-points with visual width of displayed 
text, even in a monospaced layout. Not only do terminals render some characters 
as double-width, but there are all kinds of other exceptions like zero-width 
joiners, diacritical marks, ligatures, and joined forms. As a very common 
example of the latter, many emojis — e.g. all the faces with multiple skin 
tones — are actually composed of multiple (up to five or six) Unicode 
code-points.

TL;DR: If you use character (code-point) counts to visually lay out text, 
you're likely to get bad results with anything other than plain ASCII, so it's 
only marginally better than just counting bytes.
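
A small sketch of the point about composed characters. The sample strings are chosen here for illustration; each one is normally rendered as a single user-perceived character, yet the code-point and byte counts differ widely.

```python
samples = {
    "precomposed e-acute": "\u00e9",
    "e + combining acute": "e\u0301",
    "thumbs up + skin tone": "\U0001F44D\U0001F3FD",
    "family joined with ZWJ": "\U0001F468\u200D\U0001F469\u200D\U0001F467",
}


def measures(text):
    """Return (code points, UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


for label, text in samples.items():
    print(label, measures(text))
```

Padding any of these to "4 characters" by counting code points would add between 0 and 3 spaces, even though each occupies one visual slot.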

—Jens


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-19 Thread petern
As d3ck0r suggested, adding a byte_length() function would enable padding
with spaces [but not general padding with arbitrary characters as lpad() and
rpad() afford].

WITH points(p) AS (VALUES ('abc'), ('äöü'), ('です'))
,format(f) AS (VALUES ('%*s'), ('%-*s'))
,pad AS (SELECT p, f, printf(f,byte_length(p)+(4-length(p)),p)pad FROM
points CROSS JOIN format)
SELECT p,f,pad,length(pad)len FROM pad;

'p','f','pad','len'
'abc','%*s',' abc',4
'abc','%-*s','abc ',4
'äöü','%*s',' äöü',4
'äöü','%-*s','äöü ',4
'です','%*s','  です',4
'です','%-*s','です  ',4

A new byte_length() function is a great idea, except for the matter of getting
action on publishing it and the requisite help page entry.
I recently asked to add 1 protective source line in the eval() function
against a segmentation fault but got neither action nor reply.
Experience suggests you will have to add the 3 source lines to your local
copy of SQLite if you must pad strings containing high code points:

/* SQL function byte_length(X): the size of X in bytes. */
static void byte_length(sqlite3_context *context, int argc, sqlite3_value **argv) {
  sqlite3_result_int(context, sqlite3_value_bytes(argv[0]));
}
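
For trying the idea without rebuilding SQLite, here is a rough equivalent registered from Python's standard sqlite3 module. It is a sketch rather than the C function above: returning NULL for NULL input and measuring numbers by their text form are choices made here for illustration.

```python
import sqlite3


def byte_length(value):
    """Size in bytes of a text value's UTF-8 encoding, or of a blob."""
    if value is None:
        return None                           # NULL in, NULL out (a choice made here)
    if isinstance(value, bytes):
        return len(value)
    return len(str(value).encode("utf-8"))    # text, or a number's text form


conn = sqlite3.connect(":memory:")
conn.create_function("byte_length", 1, byte_length)

rows = conn.execute(
    "SELECT p, length(p), byte_length(p) "
    "FROM (SELECT 'abc' AS p UNION ALL SELECT 'äöü' UNION ALL SELECT 'です')"
).fetchall()
print(rows)
```

The difference between the two counts for each row is exactly the adjustment the query above feeds into the printf() width.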

Peter





On Mon, Feb 19, 2018 at 12:43 AM, Ralf Junker  wrote:

> On 18.02.2018 00:36, Richard Hipp wrote:
>
> The current behavior of the printf() function in SQLite, goofy though
>> it may be, exactly mirrors the behavior of the printf() C function in
>> the standard library in this regard.
>>
>
> SQLite3 is not C. SQLite3 text storage is always Unicode. Thus SQL text
> processing functions should work on Unicode. The current implementation
> of the SQLite3 SQL printf() can not reliably be used for string padding.
> And there is no simple alternative, AFAICS.
>
> PostgreSQL returns 4 in all cases:
>
> select
>length(format ('%4s', 'abc')),
>length(format ('%4s', 'äöü')),
>length(format ('%-4s', 'abc')),
>length(format ('%-4s', 'äöü'))
>
> MySQL has lpad() and rpad() to achieve the same and also returns 4 in
> all cases:
>
> select
>length(lpad ('abc', 4, ' ')),
>length(lpad ('äöü', 4, ' ')),
>length(rpad ('abc', 4, ' ')),
>length(rpad ('äöü', 4, ' '))
>
> I strongly believe that SQLite3 should follow suit.
>
> Ralf


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-19 Thread J Decker
On Mon, Feb 19, 2018 at 2:54 AM, Ralf Junker  wrote:

> On 19.02.2018 09:50, Rowan Worth wrote:
>
> What is your expected answer for:
>>
>> select length(printf ('%4s', 'です'))
>>
>
> 'です' are 2 codepoints according to
>
>   http://www.fontspace.com/unicode/analyzer/?q=%E3%81%A7%E3%81%99
>
> The requested overall width is 4, so I would expect two added
> spaces and a total length of 4.
>
> Ralf
>
> PS: SQLite3 returns 2, which is less than the requested width.


Okay; but the functions in other databases weren't printf.  Because it is a
mimic of the C function of the same name, I would expect the count to be in
bytes...
(v)(s)(n)printf and sscanf unfortunately don't know about runes the way Go does.
With fprintf, though, I might expect it to understand locale and UTF-8 or
other wide encodings when writing to a fopen( ..., 't' ) type file...
(probably not even then, since I think fprintf is vsnprintf to a buffer which
is then passed to fwrite or fputs, at which point it's probably bytes).

Changing the function is bound to break things, and it wouldn't be a small
task to reimplement a C library as UTF-8.

The SQL functions (that are not C emulations) do work in code points and not
bytes (for the most part; they break unnecessarily on NUL characters, which
is not SQL compliant).

One could make a function to do the same job, but correctly; even so, you'd
have to find a UTF-8 printf;
https://stackoverflow.com/questions/9325487/looking-for-utf8-aware-formatting-functions-like-printf-etc
is not a lot of help, but maybe worth mentioning:
" Just a warning, counting "characters" in Unicode data is quite a
complicated business. Besides the fact that each code point in UTF-8 is
composed of several bytes, each glyph (or "grapheme") can be composed of
several code points, and for that reason fwprintf is inadequate for
truncating Unicode data anyway -- for example you could cut off an accent
without cutting off the character it applies to. So whatever you end up
using, make sure that the meaning of the length you specify is clear to you.
 – Steve Jessop, Feb 17 '12 at 9:20 "

I'm not finding anything; everyone recommends a different way to do it
(use a Unicode library, which doesn't have a printf) or doing it in another
language (use a String type or something).
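
The warning quoted above about cutting off an accent can be reproduced in a few lines. This is a sketch with a sample string picked for illustration; it also shows that normalizing to NFC first helps in this particular case.

```python
import unicodedata

decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Truncating to 4 code points keeps the 'e' but drops its accent.
truncated = decomposed[:4]

# NFC merges 'e' + accent into the single code point U+00E9,
# so the same truncation keeps the accented letter intact.
composed = unicodedata.normalize("NFC", decomposed)
truncated_nfc = composed[:4]

print(repr(truncated), repr(truncated_nfc))
```

Normalization only helps when a precomposed form exists; sequences without one still need grapheme-aware handling from a Unicode library.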











Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-19 Thread J Decker
On Mon, Feb 19, 2018 at 3:21 AM, Cezary H. Noweta 
wrote:

> Hello,
>
> On 2018-02-18 00:36, Richard Hipp wrote:
>
>> The current behavior of the printf() function in SQLite, goofy though
>> it may be, exactly mirrors the behavior of the printf() C function in
>> the standard library in this regard.
>>
>
> So I'm not sure whether or not this is something that ought to be "fixed".
>>
>
> For the sake of sanity, such exception would be considered. I.e.
> ``length'' specification could mean number of ``multibyte characters'' --
> not ``characters''. A C programmer has a chance to put all his buffer,
> especially that there are no special provisions on multibyte characters in
> the buffer (i.e. it must not begin nor end with an initial shift state):
> for ( i = 0; len > i; i += 5 ) printf("%-5.5s", [i]); -- a bit non-sense
> but illustrates the problem.
>
> On the other hand, SQLite's SQL has no access to memory buffers. In such
> case, the C standard handles the situation (look at the end of ``s''
> conversion specifier together with ``l'' flag): ``In no case is a partial
> multibyte character written.''.
>
> Is there somebody who things about a byte content of buffers, when he is
> writing a software at a SQL level?

Everyone dealing with padding/precision using printf()?


>
>
> -- best regards
>
> Cezary H. Noweta


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-19 Thread Cezary H. Noweta

Hello,

On 2018-02-18 00:36, Richard Hipp wrote:

> The current behavior of the printf() function in SQLite, goofy though
> it may be, exactly mirrors the behavior of the printf() C function in
> the standard library in this regard.
>
> So I'm not sure whether or not this is something that ought to be "fixed".


For the sake of sanity, such an exception could be considered. I.e. the 
``length'' specification could mean the number of ``multibyte characters'' 
-- not ``characters''. A C programmer has a chance to output his whole 
buffer, especially since there are no special provisions on multibyte 
characters in the buffer (i.e. it must not begin nor end with an initial 
shift state): for ( i = 0; len > i; i += 5 ) printf("%-5.5s", [i]); -- 
a bit nonsensical but illustrates the problem.


On the other hand, SQLite's SQL has no access to memory buffers. In such 
case, the C standard handles the situation (look at the end of ``s'' 
conversion specifier together with ``l'' flag): ``In no case is a 
partial multibyte character written.''.
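
As an illustration of the "no partial multibyte character" rule, here is a minimal sketch of a byte-limited truncation for UTF-8. The helper name truncate_utf8 is hypothetical; it is not part of SQLite or of the C library.

```python
def truncate_utf8(text, max_bytes):
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")[:max_bytes]
    # The byte cut may leave an incomplete sequence at the end;
    # errors="ignore" drops it instead of raising an error.
    return encoded.decode("utf-8", errors="ignore")


for limit in (2, 3, 4):
    print(limit, repr(truncate_utf8("äöü", limit)))
```

With a 3-byte limit the result is one 2-byte character and the leftover byte is discarded, so the output can be shorter than the limit.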


Is there anybody who thinks about the byte content of buffers when he is 
writing software at the SQL level?


-- best regards

Cezary H. Noweta


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-19 Thread Ralf Junker

On 19.02.2018 09:50, Rowan Worth wrote:


> What is your expected answer for:
>
> select length(printf ('%4s', 'です'))


'です' are 2 codepoints according to

  http://www.fontspace.com/unicode/analyzer/?q=%E3%81%A7%E3%81%99

The requested overall width is 4, so I would expect two added 
spaces and a total length of 4.


Ralf

PS: SQLite3 returns 2, which is less than the requested width.


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-19 Thread Rowan Worth
What is your expected answer for:

select length(printf ('%4s', 'です'))

-Rowan

On 18 February 2018 at 01:39, Ralf Junker  wrote:

> Example SQL:
>
> select
>   length(printf ('%4s', 'abc')),
>   length(printf ('%4s', 'äöü')),
>   length(printf ('%-4s', 'abc')),
>   length(printf ('%-4s', 'äöü'))
>
> Output is 4, 3, 4, 3. Padding seems to take into account UTF-8 bytes
> instead of UTF-8 code points.
>
> Should padding not work on code points and output 4 in all cases as
> requested?
>
> Ralf


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-19 Thread Ralf Junker

On 18.02.2018 00:36, Richard Hipp wrote:


> The current behavior of the printf() function in SQLite, goofy though
> it may be, exactly mirrors the behavior of the printf() C function in
> the standard library in this regard.


SQLite3 is not C. SQLite3 text storage is always Unicode. Thus SQL text
processing functions should work on Unicode. The current implementation
of the SQLite3 SQL printf() can not reliably be used for string padding.
And there is no simple alternative, AFAICS.

PostgreSQL returns 4 in all cases:

select
   length(format ('%4s', 'abc')),
   length(format ('%4s', 'äöü')),
   length(format ('%-4s', 'abc')),
   length(format ('%-4s', 'äöü'))

MySQL has lpad() and rpad() to achieve the same and also returns 4 in
all cases:

select
   length(lpad ('abc', 4, ' ')),
   length(lpad ('äöü', 4, ' ')),
   length(rpad ('abc', 4, ' ')),
   length(rpad ('äöü', 4, ' '))

I strongly believe that SQLite3 should follow suit.

Ralf


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-17 Thread Dominique Pellé
Richard Hipp  wrote:

> On 2/17/18, Ralf Junker  wrote:
>> Example SQL:
>>
>> select
>>length(printf ('%4s', 'abc')),
>>length(printf ('%4s', 'äöü')),
>>length(printf ('%-4s', 'abc')),
>>length(printf ('%-4s', 'äöü'))
>>
>> Output is 4, 3, 4, 3. Padding seems to take into account UTF-8 bytes
>> instead of UTF-8 code points.
>>
>> Should padding not work on code points and output 4 in all cases as
>> requested?
>
> The current behavior of the printf() function in SQLite, goofy though
> it may be, exactly mirrors the behavior of the printf() C function in
> the standard library in this regard.
>
> So I'm not sure whether or not this is something that ought to be "fixed".


For what it's worth, this is what bash does, which looks
consistent with SQLite:

$ printf '[%4s]\n' 'abc'
[ abc]
$ printf '[%4s]\n' 'äöü'
[äöü]
$ printf '[%-4s]\n' 'abc'
[abc ]
$ printf '[%-4s]\n' 'äöü'
[äöü]

Perl does the same:

$ perl -e 'printf("[%4s]\n", "äöü")'
[äöü]

Vim's printf() function does the same, but Vim also
has a more convenient %S that is not present in the C printf();
see :help printf()

  %s      string
  %6S     string right-aligned in 6 display cells
  %6s     string right-aligned in 6 bytes
  %.9s    string truncated to 9 bytes

:echo printf('[%4s]', 'äöü')
[äöü]
:echo printf('[%4S]', 'äöü')
[ äöü]

Perhaps SQLite could add %S along those lines.
After all, SQLite already added "%q", "%Q", "%w"
and "%z" which are not present in the C printf().

Vim's %S counts display cells, not code points. East Asian
characters are generally twice as wide as Latin characters
on screen, so each takes 2 cells. Vim also provides
functions that return a string's length in bytes (strlen()),
in display cells (strwidth()), and in characters
(strchars()):

:echo strlen('äöü')
6
:echo strwidth('äöü')
3
:echo strchars('äöü')
3

With a more interesting string containing
East Asian characters:

:echo strlen('äöü中文')
12
:echo strwidth('äöü中文')
7
:echo strchars('äöü中文')
5
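
For comparison outside Vim, the three measures can be approximated
with a few lines of Python. This is only a rough sketch: the helper
names are made up for illustration, and the cell count simply treats
East Asian Wide and Fullwidth characters as 2 cells and skips
combining marks, which is an assumption and not how Vim computes
strwidth():

```python
import unicodedata

def byte_length(s):
    # Number of bytes in the UTF-8 encoding (compare Vim's strlen()).
    return len(s.encode("utf-8"))

def char_count(s):
    # Number of Unicode code points (compare Vim's strchars()).
    return len(s)

def display_cells(s):
    # Approximate number of terminal cells (compare Vim's strwidth()).
    cells = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks share the cell of their base
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        cells += 2 if wide else 1
    return cells

def pad_cells(s, width):
    # Right-align s in `width` display cells, similar in spirit to %4S.
    return " " * max(0, width - display_cells(s)) + s

text = "äöü中文"
print(byte_length(text), char_count(text), display_cells(text))  # 12 5 7
print("[" + pad_cells("äöü", 4) + "]")  # [ äöü]
```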

Regards
Dominique


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-17 Thread Cezary H. Noweta

Hello,

On 2018-02-18 01:46, Peter Da Silva wrote:

> Printf's handling of unicode is inconsistent in other ways, too. I suspect that
> there's still undefined behavior floating around in there too. Even wprintf
> isn't entirely unsurprising:

The two examples you supplied have their names swapped, and both
confirm that the behavior is unsurprising:

> LANG=en_US.UTF-8

OK, so your native environment locale is ``UTF-8''.

> % cat localized.c

Why is that program named ``localized'' if...

> [...]
> int main() {
> wprintf (L"'%4ls'\n", L"äöü");

... you are using the "C" locale for LC_CTYPE? The behavior is entirely
unsurprising: there is no conversion from L"äöü" under the "C" LC_CTYPE.

> [...]
> % cat delocalized.c

Why is that program named ``delocalized'' if...

> [...]
> setlocale(LC_ALL, "");

... you are using the native environment locale (``UTF-8'') for LC_CTYPE?
The behavior is entirely unsurprising: L"äöü" is converted using the
"UTF-8" LC_CTYPE.


-- best regards

Cezary H. Noweta


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-17 Thread Peter Da Silva
On 2018-02-17, at 17:36, Richard Hipp wrote:
> The current behavior of the printf() function in SQLite, goofy though
> it may be, exactly mirrors the behavior of the printf() C function in
> the standard library in this regard.
> 
> So I'm not sure whether or not this is something that ought to be "fixed".

Printf's handling of unicode is inconsistent in other ways, too. I suspect that 
there's still undefined behavior floating around in there too. Even wprintf 
isn't entirely unsurprising:

% env
...
LANG=en_US.UTF-8
...
% cat localized.c
#include <stdio.h>
#include <wchar.h>

int main() {
    wprintf (L"'%4ls'\n", L"äöü");
}
% cc localized.c
% ./a.out
' ???'
% cat delocalized.c
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main() {
    setlocale(LC_ALL, "");
    wprintf (L"'%4ls'\n", L"äöü");
}
% cc delocalized.c
% ./a.out
' äöü'
% uname -a
Darwin Stonehenge.local 16.7.0 Darwin Kernel Version 16.7.0: Thu Jan 11 
22:59:40 PST 2018; root:xnu-3789.73.8~1/RELEASE_X86_64 x86_64



Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-17 Thread J Decker
On Sat, Feb 17, 2018 at 3:36 PM, Richard Hipp wrote:

> On 2/17/18, Ralf Junker wrote:
> > Example SQL:
> >
> > select
> >   length(printf ('%4s', 'abc')),
> >   length(printf ('%4s', 'äöü')),
> >   length(printf ('%-4s', 'abc')),
> >   length(printf ('%-4s', 'äöü'))
> >
> > Output is 4, 3, 4, 3. Padding seems to take into account UTF-8 bytes
> > instead of UTF-8 code points.
> >
> > Should padding not work on code points and output 4 in all cases as
> > requested?
>
> The current behavior of the printf() function in SQLite, goofy though
> it may be, exactly mirrors the behavior of the printf() C function in
> the standard library in this regard.
>
> So I'm not sure whether or not this is something that ought to be "fixed".
>
The length() SQL function and other character functions (rtrim/ltrim)
attempt to deal with code points, not bytes...

Maybe add a function, something like `u8length( string, count )`, that
returns the number of bytes needed for count characters of the string,
which could then be passed to
printf( '%-*s',  u8length( 'äöü' , 4 ),  'äöü' )
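
A rough sketch of how such a helper might behave, written in Python
only to illustrate the arithmetic. u8length does not exist in SQLite;
the interpretation below (byte width that makes byte-oriented padding
come out at count code points) is an assumption, and the second
function merely emulates what a byte-counting "%-*s" would do:

```python
def u8length(text, count):
    # Hypothetical helper: byte width to request so that byte-oriented
    # padding produces a result that is `count` code points long.
    n_chars = len(text)
    n_bytes = len(text.encode("utf-8"))
    return n_bytes + max(0, count - n_chars)

def left_justify_bytes(text, byte_width):
    # Emulates a byte-counting "%-*s": pad with spaces to byte_width bytes.
    raw = text.encode("utf-8")
    return raw.ljust(byte_width, b" ").decode("utf-8")

naive = left_justify_bytes("äöü", 4)                    # 6 bytes already
fixed = left_justify_bytes("äöü", u8length("äöü", 4))  # width becomes 7
print(len(naive), len(fixed))  # 3 4
```

For plain ASCII input the helper returns count unchanged, so existing
format strings would keep their meaning.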



> --
> D. Richard Hipp
> d...@sqlite.org


Re: [sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-17 Thread Richard Hipp
On 2/17/18, Ralf Junker wrote:
> Example SQL:
>
> select
>   length(printf ('%4s', 'abc')),
>   length(printf ('%4s', 'äöü')),
>   length(printf ('%-4s', 'abc')),
>   length(printf ('%-4s', 'äöü'))
>
> Output is 4, 3, 4, 3. Padding seems to take into account UTF-8 bytes
> instead of UTF-8 code points.
>
> Should padding not work on code points and output 4 in all cases as
> requested?

The current behavior of the printf() function in SQLite, goofy though
it may be, exactly mirrors the behavior of the printf() C function in
the standard library in this regard.

So I'm not sure whether or not this is something that ought to be "fixed".
-- 
D. Richard Hipp
d...@sqlite.org


[sqlite] printf() problem padding multi-byte UTF-8 code points

2018-02-17 Thread Ralf Junker

Example SQL:

select
  length(printf ('%4s', 'abc')),
  length(printf ('%4s', 'äöü')),
  length(printf ('%-4s', 'abc')),
  length(printf ('%-4s', 'äöü'))

Output is 4, 3, 4, 3. Padding seems to take into account UTF-8 bytes 
instead of UTF-8 code points.


Should padding not work on code points and output 4 in all cases as 
requested?
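
To make the arithmetic behind 4, 3, 4, 3 explicit, here is a small
illustration in Python rather than SQL. It is only a sketch of the
suspected cause: formatting a bytes object stands in for padding that
counts UTF-8 bytes, and formatting a str stands in for padding that
counts code points.

```python
text = "äöü"
raw = text.encode("utf-8")
print(len(text), len(raw))  # 3 6

# Byte-oriented padding: the value is already 6 bytes wide, which is
# more than the requested 4, so no space is added.
padded_bytes = b"%4s" % raw
print(len(padded_bytes.decode("utf-8")))  # 3

# Code-point-oriented padding: 3 characters, so one space is added.
padded_chars = "%4s" % text
print(len(padded_chars))  # 4

# ASCII text gives the same answer either way.
print(len(b"%4s" % b"abc"), len("%4s" % "abc"))  # 4 4
```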


Ralf