I dare say that crunch_str already calculates the width correctly. If I do
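void hello_main(void)
{
  // "lll" followed by two combining marks (U+0334, U+0317) and a newline
  char* str = "lll\xcc\xb4\xcc\x97\n";
  char* end = str;
  int width;

  // Print up to 3 columns to stdout
  width = crunch_str(&end, 3, stdout, 0,0);
  xprintf("\nwidth: %d\n",width);

  // Measure only (no output), then report how many bytes were consumed
  end = str;
  width = crunch_str(&end, 3, 0, 0,0);
  xprintf("width: %d\n",width);
  xprintf("len: %d\n",(int)(end-str));
}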
I get this output:
jarno@Snow:~/work/src/toybox/toybox$ ./hello
lll̴̗
width: 3
width: 3
len: 7
And if I use xterm, the combining chars are rendered on top of the last
ASCII char, as they are supposed to be.
The crunch_str loop only stops early when this check triggers:
if (width-columns<col) break;
With width 3 and columns already at 3, a combining char has col == 0, and
since 3 - 3 < 0 is false we continue until col is something other than 0.
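
To make that argument easy to check without building toybox, here is a minimal standalone sketch of the same break condition. It is not the toybox code: decode_utf8(), simple_width() and count_fit() are hypothetical names standing in for utf8towc(), wcwidth() and crunch_str(). It assumes only U+0300..U+036F are zero width and treats everything else, including the newline, as one column. It does no escape handling and prints nothing.

```c
#include <stddef.h>

/* Hypothetical stand-in for utf8towc(): decodes one UTF-8 sequence of up to
 * 3 bytes (enough for this example). Returns bytes consumed, or -1. */
static int decode_utf8(unsigned *wc, const char *s)
{
  unsigned char c = (unsigned char)*s;

  if (c < 0x80) { *wc = c; return 1; }
  if ((c & 0xe0) == 0xc0 && (s[1] & 0xc0) == 0x80) {
    *wc = ((c & 0x1fu) << 6) | (s[1] & 0x3f);
    return 2;
  }
  if ((c & 0xf0) == 0xe0 && (s[1] & 0xc0) == 0x80 && (s[2] & 0xc0) == 0x80) {
    *wc = ((c & 0x0fu) << 12) | ((s[1] & 0x3fu) << 6) | (s[2] & 0x3f);
    return 3;
  }
  return -1;
}

/* Hypothetical stand-in for wcwidth(): assumes only the combining range
 * U+0300..U+036F is zero width, everything else takes one column. */
static int simple_width(unsigned wc)
{
  return (wc >= 0x300 && wc <= 0x36f) ? 0 : 1;
}

/* Returns the columns used by the part of str that fits in width columns,
 * and stores the number of bytes consumed in *len. */
int count_fit(const char *str, int width, size_t *len)
{
  const char *end = str;
  int columns = 0, bytes, col;
  unsigned wc;

  while (*end) {
    if ((bytes = decode_utf8(&wc, end)) < 0) break;
    col = simple_width(wc);
    /* Same test as in crunch_str: a zero width char never trips this,
     * even when columns has already reached width. */
    if (width - columns < col) break;
    columns += col;
    end += bytes;
  }
  *len = (size_t)(end - str);

  return columns;
}
```

Calling count_fit("lll\xcc\xb4\xcc\x97\n", 3, &len) in this sketch returns 3 and sets len to 7, matching the output above. With a width of 0 and a string that starts with combining chars, it consumes just those combining bytes and returns 0.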
-Jarno
On Thu, Oct 17, 2019 at 1:38 AM Rob Landley <[email protected]> wrote:
>
> On 9/19/19 2:52 PM, Jarno Mäkipää wrote:
> > Actually I think that the current crunch_str prints trailing zero-width
> > combining chars just fine?
> >
> > since when width==columns, width-columns is still >= 0
> >
> > .................................................
> > for (end = start = *str; *end; columns += col, end += bytes) {
> >   wchar_t wc;
> >
> >   if ((bytes = utf8towc(&wc, end, 4))>0 && (col = wcwidth(wc))>=0) {
> >     if (!escmore || wc>255 || !strchr(escmore, wc)) {
> >       if (width-columns<col) break; // <-- col is 0 for U+0300..U+036F
> >       if (out) fwrite(end, bytes, 1, out);
> >
> >       continue;
> >     }
> >   }
> > ......................
> >
>
> The problem is when you ask it how many bytes of input will fit in a given
> number of columns (or to print up to this many columns), it doesn't give the
> trailing combining characters. It stops right after the last printing
> character that fits.
>
> And the callers need to be adjusted to ask "how many bytes will fit into 0
> chars" so they can add combining characters to an existing column when
> characters are coming in incrementally (from some outside source you're
> reading in chunks, or which is delivering individual characters like serial
> ports), and they've filled up the space _but_ there may be more combining
> characters coming in in future that attach to the existing space.
>
> > And yeah UTF-8 is good because it was originally written on a napkin at
> > the dinner table by Ken Thompson and Rob Pike. Unicode on the other
> > hand... not written on a napkin.
>
> Unicode is insane. When characters come in incrementally, you have to redraw
> the same glyph repeatedly because you _can't_ know when you're done until you
> overshoot.
>
> Rob
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net