Christian Weisgerber wrote:
> Ted Unangst:
>
> > --- ul.c	10 Oct 2015 16:15:03 -0000	1.19
> > +++ ul.c	23 Oct 2015 10:29:43 -0000
> > @@ -241,6 +241,8 @@ mfilter(FILE *f)
> >  			obuf[col].c_mode |= BOLD|mode;
> >  		else
> >  			obuf[col].c_mode = mode;
> > +		if ((c & (0x80 | 0x40)) == 0x80 && col > 0)
> > +			obuf[col].c_mode = obuf[col - 1].c_mode;
> >  		col++;
> >  		if (col > maxcol)
> >  			maxcol = col;
>
> That doesn't quite work.  Check out this:
>
> mandoc /usr/share/man/man1/ksh.1 | sed -n 1185,1190p | ul
so the ksh.1 example works with the diff below. i'm not sure how far down
this road we need to travel, but i figure it's worth a little exploration.

note that i don't think this correctly handles the case of one character,
a backspace, and then a different character, though with some care it can
asymptotically approach correct.

Index: ul.c
===================================================================
RCS file: /cvs/src/usr.bin/ul/ul.c,v
retrieving revision 1.19
diff -u -p -r1.19 ul.c
--- ul.c	10 Oct 2015 16:15:03 -0000	1.19
+++ ul.c	23 Oct 2015 13:31:45 -0000
@@ -151,6 +151,12 @@ main(int argc, char *argv[])
 	exit(0);
 }
 
+int
+isu8cont(unsigned char c)
+{
+	return (c & (0x80 | 0x40)) == 0x80;
+}
+
 void
 mfilter(FILE *f)
 {
@@ -158,8 +164,11 @@ mfilter(FILE *f)
 
 	while ((c = getc(f)) != EOF && col < MAXBUF) switch(c) {
 	case '\b':
-		if (col > 0)
+		while (col > 0) {
 			col--;
+			if (!isu8cont(obuf[col].c_char))
+				break;
+		}
 		continue;
 	case '\t':
 		col = (col+8) & ~07;
@@ -241,6 +250,8 @@ mfilter(FILE *f)
 			obuf[col].c_mode |= BOLD|mode;
 		else
 			obuf[col].c_mode = mode;
+		if ((c & (0x80 | 0x40)) == 0x80 && col > 0)
+			obuf[col].c_mode = obuf[col - 1].c_mode;
 		col++;
 		if (col > maxcol)
 			maxcol = col;