Re: does anoybody use ul?

2015-10-23 Thread Nicholas Marriott
Well, it does work:

printf 'A\bA_\bB'|ul

I still think it is not useful, I say kill it.
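
The printf example above relies on the overstrike convention: a character, a backspace, and the same character again is read as bold, while an underscore, a backspace, and a character is read as underlined. The following is a minimal sketch of an encoder for that convention, assuming ASCII input only; the names overstrike_bold and overstrike_underline are hypothetical and are not part of ul(1).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: write the overstrike form of the ASCII string
 * `in` into `out`.  Each input byte becomes three output bytes, so
 * `out` needs room for 3 * strlen(in) + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL.
 */
static size_t
overstrike(const char *in, char *out, int underline)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        out[n++] = underline ? '_' : *in;  /* first strike */
        out[n++] = '\b';                   /* back up one column */
        out[n++] = *in;                    /* second strike */
    }
    out[n] = '\0';
    return n;
}

/* "B" becomes "_\bB", which ul(1) is expected to underline. */
size_t
overstrike_underline(const char *in, char *out)
{
    return overstrike(in, out, 1);
}

/* "A" becomes "A\bA", which ul(1) is expected to show as bold. */
size_t
overstrike_bold(const char *in, char *out)
{
    return overstrike(in, out, 0);
}
```

Writing the output of overstrike_bold("A", buf) followed by overstrike_underline("B", buf) to standard output and piping it through ul should be equivalent to the printf command above.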



On Fri, Oct 23, 2015 at 03:47:56AM -0400, Ted Unangst wrote:
> ul appears somewhat useless for its intended purpose.
> 
> echo _xxx_ | ul does not result in underlined text in an xterm, so I doubt
> many people are using this.
> 
> Unlike, say, mandoc, it can't output Greek letters. I also imagine most people
> have moved on to some form of markdown for their other text markup needs.
> 
> Will anyone miss it?
> 
> 
> Index: Makefile
> ===
> RCS file: /cvs/src/usr.bin/Makefile,v
> retrieving revision 1.153
> diff -u -p -r1.153 Makefile
> --- Makefile  16 Jul 2015 20:50:40 -  1.153
> +++ Makefile  23 Oct 2015 07:43:25 -
> @@ -25,7 +25,7 @@ SUBDIR= apply arch at aucat audioctl awk
>   sort spell split sqlite3 ssh stat su systat \
>   tail talk tcpbench tee telnet tftp tic time \
>   tmux top touch tput tr true tset tsort tty usbhidaction usbhidctl \
> - ul uname unexpand unifdef uniq units \
> + uname unexpand unifdef uniq units \
>   unvis users uudecode uuencode vacation vi vis vmstat w wall wc \
>   what which who whois write x99token xargs xinstall \
>   yacc yes
> 



Re: does anoybody use ul?

2015-10-23 Thread Ted Unangst
Nicholas Marriott wrote:
> Well, it does work:
> 
> printf 'A\bA_\bB'|ul
> 
> I still think it is not useful, I say kill it.

Oh! Is that how you use it? The man page doesn't explain, apparently expecting
that everybody just knows there's only one true way to mark up text.



Re: does anoybody use ul?

2015-10-23 Thread Alessandro DE LAURENZIS
Hi Ted,

On Fri 23/10/2015 03:47, Ted Unangst wrote:
> ul appears somewhat useless for its intended purpose.
> 
> echo _xxx_ | ul does not result in underlined text in an xterm, so I doubt
> many people are using this.
[...]

I don't use it anymore, but some time ago I needed to quickly highlight
some text in a couple of scripts and found out (see [0]) that ul(1)
requires a rather obscure sequence of backspaces and underscores to
work properly; just try:

echo $'x\b_x\b_x\b_' | ul

which correctly underlines the "xxx" string in xterm.

I'm not against the removal, this is only to say that the command
actually works.

Cheers

[0]: http://unix.stackexchange.com/questions/3044/how-to-use-the-ul-command-line-utility

-- 
Alessandro DE LAURENZIS
[mailto:just22@gmail.com]
LinkedIn: http://it.linkedin.com/in/delaurenzis



Re: does anoybody use ul?

2015-10-23 Thread Stuart Henderson
On 2015/10/23 11:52, Ingo Schwarze wrote:
> I didn't use it so far, but now that you made me look at it, i'm
> likely to start using it almost daily.  I often felt unhappy that
> my gmdiff tool (for comparing groff and mandoc output) does not
> show bold and underline fonts and i always had to pipe the gmdiff
> output to less, even if it was short.  Calling ul(1) at the end of
> the gmdiff script elegantly fixes that problem.

This made me try ul(1) with gmdiff, which led me to trying it with an
incomplete filename (I used tab-completion and didn't notice that there
were multiple files). Found this:

$ mandoc nonexistent
mandoc: nonexistent: ERROR: No such file or directory
Segmentation fault 



Re: does anoybody use ul?

2015-10-23 Thread Ted Unangst
Stefan Sperling wrote:
> On Fri, Oct 23, 2015 at 05:50:53AM -0400, Ted Unangst wrote:
> > well, it doesn't work with utf-8 because it tries to underline only half the
> > character. i'm aiming for the "quick fix"...
> 
> Why not at least try a kind of better fix to see how it would work?

Is this "better"? It copies the state of the previous letter to the next if
it's a utf-8 continuation byte.

FreeBSD has full blown multibyte support as well, but it requires switching
everything to wchar_t. Which way are we going here?


Index: ul.c
===
RCS file: /cvs/src/usr.bin/ul/ul.c,v
retrieving revision 1.19
diff -u -p -r1.19 ul.c
--- ul.c	10 Oct 2015 16:15:03 -  1.19
+++ ul.c	23 Oct 2015 10:29:43 -
@@ -241,6 +241,8 @@ mfilter(FILE *f)
 			obuf[col].c_mode |= BOLD|mode;
 		else
 			obuf[col].c_mode = mode;
+		if ((c & (0x80 | 0x40)) == 0x80 && col > 0)
+			obuf[col].c_mode = obuf[col - 1].c_mode;
 		col++;
 		if (col > maxcol)
 			maxcol = col;
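
To illustrate the idea in isolation: UTF-8 continuation bytes have the bit pattern 10xxxxxx, so a byte matching that pattern can inherit the attributes of the byte stored before it. The sketch below is a simplified stand-alone version, not the ul.c code; struct cell, put_byte, and the flag values are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define UNDERL 0x01  /* assumed flag values, for illustration only */
#define BOLD   0x02

struct cell {
    char c_mode;  /* attribute bits for this byte */
    char c_char;  /* the byte itself */
};

/* True for UTF-8 continuation bytes, which look like 10xxxxxx. */
static int
is_continuation(int c)
{
    return (c & (0x80 | 0x40)) == 0x80;
}

/*
 * Store byte `c` with attribute `mode` at index `col`.  A continuation
 * byte inherits the mode of the byte before it, so every byte of a
 * multibyte character ends up with the same attributes.
 * Returns the next column index.
 */
size_t
put_byte(struct cell *buf, size_t col, int c, char mode)
{
    assert(buf != NULL);
    buf[col].c_char = (char)c;
    buf[col].c_mode = mode;
    if (col > 0 && is_continuation(c))
        buf[col].c_mode = buf[col - 1].c_mode;
    return col + 1;
}
```

For example, storing 0xC3 with UNDERL and then 0xA9 with no mode leaves both bytes of the two-byte character marked as underlined.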



Re: does anoybody use ul?

2015-10-23 Thread Christian Weisgerber
Ted Unangst:

> > mandoc /usr/share/man/man1/ksh.1 | sed -n 1185,1190p | ul
> 
> so that works with the diff below. i'm not sure how far down this road we need
> to travel, but i figure it's worth a little exploration.

That works so far.
Next problem: tabs.

(I think it's reasonable to ignore combining characters in ul...)

-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: does anoybody use ul?

2015-10-23 Thread Ted Unangst
Christian Weisgerber wrote:
> On 2015-10-23, "Ted Unangst"  wrote:
> 
> > ul appears somewhat useless for its intended purpose.
> 
> mandoc /usr/share/man/man1/ls.1 | ul
> 
> Works fine.  Of course that functionality has been incorporated
> into more/less decades ago.
> 
> > Will anyone miss it?
> 
> Probably not, but what's the benefit of deleting it?

well, it doesn't work with utf-8 because it tries to underline only half the
character. i'm aiming for the "quick fix"...



Re: does anoybody use ul?

2015-10-23 Thread Ingo Schwarze
Hi Ted,

Ted Unangst wrote on Fri, Oct 23, 2015 at 03:47:56AM -0400:

> ul appears somewhat useless for its intended purpose.
> 
> echo _xxx_ | ul does not result in underlined text in an xterm,
> so I doubt many people are using this.
> 
> Unlike, say, mandoc, it can't output Greek letters.
> I also imagine most people have moved on to some form
> of markdown for their other text markup needs.

Your sentence sounds a bit like "mandoc can do the same, just better".
That statement would be totally misleading.

Experimenting a bit with ul(1), i just learnt something new:
Actually, mandoc(1) *output* is exactly what ul(1) expects as *input*.

Compare:

  $ echo '.Fl o Ar arg' | mandoc -mdoc

On the terminal, you see no underlining and no bold face.

  $ echo '.Fl o Ar arg' | mandoc -mdoc | ul

Now, the "-o" is bold and the "arg" is underlined.

  $ echo '.Fl o Ar arg' | mandoc -mdoc | less
  $ echo '.Fl o Ar arg' | mandoc -mdoc -l

Again, both of these do show bold and underlined text on the terminal
because less(1) includes functionality similar to ul(1).
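
As a rough illustration of what a filter such as ul(1) or less(1) has to do with this input, the sketch below decodes overstrike text into visible characters plus attribute bits. It is a simplified assumption-laden example, not the ul.c logic: it handles ASCII only, at most one overstrike per column, and the name decode_overstrike and the flag values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define UNDERL 0x01  /* assumed flag values, for illustration only */
#define BOLD   0x02

/*
 * Decode ASCII overstrike text.  For each visible character, write the
 * character to `chars` and its attribute bits to `modes`; both arrays
 * need room for strlen(in) entries.
 * Returns the number of visible characters.
 */
size_t
decode_overstrike(const char *in, char *chars, char *modes)
{
    size_t n = 0;

    assert(in != NULL && chars != NULL && modes != NULL);
    while (*in != '\0') {
        if (in[1] == '\b' && in[2] != '\0') {
            /* X \b Y: two strikes in the same column */
            chars[n] = in[2];
            if (in[0] == '_')
                modes[n] = UNDERL;
            else if (in[0] == in[2])
                modes[n] = BOLD;
            else
                modes[n] = 0;  /* not a valid bold/underline encoding */
            in += 3;
        } else {
            chars[n] = *in++;
            modes[n] = 0;
        }
        n++;
    }
    return n;
}
```

A real terminal filter would then emit the terminal's own start and end sequences for each run of bold or underlined characters.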

> Will anyone miss it?

I didn't use it so far, but now that you made me look at it, i'm
likely to start using it almost daily.  I often felt unhappy that
my gmdiff tool (for comparing groff and mandoc output) does not
show bold and underline fonts and i always had to pipe the gmdiff
output to less, even if it was short.  Calling ul(1) at the end of
the gmdiff script elegantly fixes that problem.

Unless people strongly insist on killing the ul(1) utility, i'm
planning to fix the ul(1) manual page.  It does indeed lack essential
information; i had to RTFS to understand the purpose of the utility.

Thanks for the pointer,
  Ingo



Re: does anoybody use ul?

2015-10-23 Thread Christian Weisgerber
Ted Unangst:

> --- ul.c  10 Oct 2015 16:15:03 -  1.19
> +++ ul.c  23 Oct 2015 10:29:43 -
> @@ -241,6 +241,8 @@ mfilter(FILE *f)
>   obuf[col].c_mode |= BOLD|mode;
>   else
>   obuf[col].c_mode = mode;
> + if ((c & (0x80 | 0x40)) == 0x80 && col > 0)
> + obuf[col].c_mode = obuf[col - 1].c_mode;
>   col++;
>   if (col > maxcol)
>   maxcol = col;

That doesn't quite work.  Check out this:

mandoc /usr/share/man/man1/ksh.1 | sed -n 1185,1190p | ul

-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: does anoybody use ul?

2015-10-23 Thread Ted Unangst
Christian Weisgerber wrote:
> Ted Unangst:
> 
> > > --- ul.c	10 Oct 2015 16:15:03 -  1.19
> > > +++ ul.c	23 Oct 2015 10:29:43 -
> > > @@ -241,6 +241,8 @@ mfilter(FILE *f)
> > >  			obuf[col].c_mode |= BOLD|mode;
> > >  		else
> > >  			obuf[col].c_mode = mode;
> > > +		if ((c & (0x80 | 0x40)) == 0x80 && col > 0)
> > > +			obuf[col].c_mode = obuf[col - 1].c_mode;
> > >  		col++;
> > >  		if (col > maxcol)
> > >  			maxcol = col;
> 
> That doesn't quite work.  Check out this:
> 
> mandoc /usr/share/man/man1/ksh.1 | sed -n 1185,1190p | ul

so that works with the diff below. i'm not sure how far down this road we need
to travel, but i figure it's worth a little exploration.

note that i don't think this handles the case of one character, backspace, a
different character correctly, though it can asymptotically approach
correct with some care.

Index: ul.c
===
RCS file: /cvs/src/usr.bin/ul/ul.c,v
retrieving revision 1.19
diff -u -p -r1.19 ul.c
--- ul.c	10 Oct 2015 16:15:03 -  1.19
+++ ul.c	23 Oct 2015 13:31:45 -
@@ -151,6 +151,12 @@ main(int argc, char *argv[])
 	exit(0);
 }
 
+int
+isu8cont(unsigned char c)
+{
+	return (c & (0x80 | 0x40)) == 0x80;
+}
+
 void
 mfilter(FILE *f)
 {
@@ -158,8 +164,11 @@ mfilter(FILE *f)
 
 	while ((c = getc(f)) != EOF && col < MAXBUF) switch(c) {
 	case '\b':
-		if (col > 0)
+		while (col > 0) {
 			col--;
+			if (!isu8cont(obuf[col].c_char))
+				break;
+		}
 		continue;
 	case '\t':
 		col = (col+8) & ~07;
@@ -241,6 +250,8 @@ mfilter(FILE *f)
 			obuf[col].c_mode |= BOLD|mode;
 		else
 			obuf[col].c_mode = mode;
+		if ((c & (0x80 | 0x40)) == 0x80 && col > 0)
+			obuf[col].c_mode = obuf[col - 1].c_mode;
 		col++;
 		if (col > maxcol)
 			maxcol = col;
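
The backspace change can be looked at on its own: instead of stepping back a single byte, the index keeps moving left until it lands on a byte that is not a UTF-8 continuation byte, which is the first byte of the previous character. The following stand-alone sketch assumes a plain byte buffer rather than ul's obuf array; the function name backspace is chosen for the example only.

```c
#include <assert.h>
#include <stddef.h>

/* True for UTF-8 continuation bytes (10xxxxxx). */
static int
is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/*
 * Move `col` back by one character in `buf`: step back one byte, then
 * keep stepping while the byte landed on is a continuation byte, so the
 * index ends up on the first byte of the previous character.
 * At column 0 the index stays where it is.
 */
size_t
backspace(const char *buf, size_t col)
{
    assert(buf != NULL);
    while (col > 0) {
        col--;
        if (!is_continuation((unsigned char)buf[col]))
            break;
    }
    return col;
}
```

With the three bytes 'a', 0xC3, 0xA9 in the buffer, a backspace from index 3 lands on index 1 (the start of the two-byte character) rather than on index 2 (its second byte).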



Re: does anoybody use ul?

2015-10-23 Thread Theo de Raadt
> > From: "Ted Unangst" 
> > Date: Fri, 23 Oct 2015 03:47:56 -0400
> > 
> > ul appears somewhat useless for its intended purpose.
> > 
> > echo _xxx_ | ul does not result in underlined text in an xterm, so I doubt
> > many people are using this.
> > 
> > Unlike, say, mandoc, it can't output Greek letters. I also imagine most people
> > have moved on to some form of markdown for their other text markup needs.
> > 
> > Will anyone miss it?
> 
> I doubt anybody ever uses it directly, but I've seen it used in
> scripts.  And even though POSIX doesn't mention it, it is present on
> pretty much any UNIX-like OS.  I think removing it would be wrong.

printf '_\bc_\bh_\ba_\bn_\bg_\be_\b _\bi_\bs_\b _\bb_\ba_\bd_\b!\b' | ul

:-)



Re: does anoybody use ul?

2015-10-23 Thread Stefan Sperling
On Fri, Oct 23, 2015 at 06:32:32AM -0400, Ted Unangst wrote:
> Stefan Sperling wrote:
> > On Fri, Oct 23, 2015 at 05:50:53AM -0400, Ted Unangst wrote:
> > > well, it doesn't work with utf-8 because it tries to underline only half the
> > > character. i'm aiming for the "quick fix"...
> > 
> > Why not at least try a kind of better fix to see how it would work?
> 
> Is this "better"? It copies the state of the previous letter to the next if
> it's a utf-8 continuation byte.

Nice trick.
Could the application define a macro with a meaningful name for this?

> FreeBSD has full blown multibyte support as well, but it requires switching
> everything to wchar_t. Which way are we going here?

I think we're trying to avoid wchar_t if possible.
 
> Index: ul.c
> ===
> RCS file: /cvs/src/usr.bin/ul/ul.c,v
> retrieving revision 1.19
> diff -u -p -r1.19 ul.c
> --- ul.c  10 Oct 2015 16:15:03 -  1.19
> +++ ul.c  23 Oct 2015 10:29:43 -
> @@ -241,6 +241,8 @@ mfilter(FILE *f)
>   obuf[col].c_mode |= BOLD|mode;
>   else
>   obuf[col].c_mode = mode;
> + if ((c & (0x80 | 0x40)) == 0x80 && col > 0)
> + obuf[col].c_mode = obuf[col - 1].c_mode;
>   col++;
>   if (col > maxcol)
>   maxcol = col;



Re: does anoybody use ul?

2015-10-23 Thread Mark Kettenis
> From: "Ted Unangst" 
> Date: Fri, 23 Oct 2015 03:47:56 -0400
> 
> ul appears somewhat useless for its intended purpose.
> 
> echo _xxx_ | ul does not result in underlined text in an xterm, so I doubt
> many people are using this.
> 
> Unlike, say, mandoc, it can't output Greek letters. I also imagine most people
> have moved on to some form of markdown for their other text markup needs.
> 
> Will anyone miss it?

I doubt anybody ever uses it directly, but I've seen it used in
scripts.  And even though POSIX doesn't mention it, it is present on
pretty much any UNIX-like OS.  I think removing it would be wrong.

> Index: Makefile
> ===
> RCS file: /cvs/src/usr.bin/Makefile,v
> retrieving revision 1.153
> diff -u -p -r1.153 Makefile
> --- Makefile  16 Jul 2015 20:50:40 -  1.153
> +++ Makefile  23 Oct 2015 07:43:25 -
> @@ -25,7 +25,7 @@ SUBDIR= apply arch at aucat audioctl awk
>   sort spell split sqlite3 ssh stat su systat \
>   tail talk tcpbench tee telnet tftp tic time \
>   tmux top touch tput tr true tset tsort tty usbhidaction usbhidctl \
> - ul uname unexpand unifdef uniq units \
> + uname unexpand unifdef uniq units \
>   unvis users uudecode uuencode vacation vi vis vmstat w wall wc \
>   what which who whois write x99token xargs xinstall \
>   yacc yes
> 
> 



Re: does anoybody use ul?

2015-10-23 Thread Christian Weisgerber
Ted Unangst:

> well, it doesn't work with utf-8 because it tries to underline only half the
> character. i'm aiming for the "quick fix"...

FreeBSD has imported a fix for this, r132858.

-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: does anoybody use ul?

2015-10-23 Thread Stefan Sperling
On Fri, Oct 23, 2015 at 05:50:53AM -0400, Ted Unangst wrote:
> well, it doesn't work with utf-8 because it tries to underline only half the
> character. i'm aiming for the "quick fix"...

Why not at least try a kind of better fix to see how it would work?



Re: does anoybody use ul?

2015-10-23 Thread Christian Weisgerber
On 2015-10-23, "Ted Unangst"  wrote:

> ul appears somewhat useless for its intended purpose.

mandoc /usr/share/man/man1/ls.1 | ul

Works fine.  Of course that functionality has been incorporated
into more/less decades ago.

> Will anyone miss it?

Probably not, but what's the benefit of deleting it?
*shrug*

-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: does anoybody use ul?

2015-10-23 Thread Nicholas Marriott
On Fri, Oct 23, 2015 at 05:11:42AM -0400, Ted Unangst wrote:
> Nicholas Marriott wrote:
> > Well, it does work:
> > 
> > printf 'A\bA_\bB'|ul
> > 
> > I still think it is not useful, I say kill it.
> 
> Oh! Is that how you use it? The man page doesn't explain, apparently expecting
> that everybody just knows there's only one true way to mark up text.

It's the same markup groff/mandoc/less/etc use:

mandoc /usr/src/usr.bin/ul/ul.1|ul

(Although I didn't remember that and did spend a while looking at the
code trying to figure it out.)

But I don't think ul is terribly useful, you can do it with "more -e".



Re: does anoybody use ul?

2015-10-23 Thread Ingo Schwarze
Hi Stuart,

Stuart Henderson wrote on Fri, Oct 23, 2015 at 11:28:35AM +0100:
> On 2015/10/23 11:52, Ingo Schwarze wrote:

>> I didn't use it so far, but now that you made me look at it, i'm
>> likely to start using it almost daily.  I often felt unhappy that
>> my gmdiff tool (for comparing groff and mandoc output) does not
>> show bold and underline fonts and i always had to pipe the gmdiff
>> output to less, even if it was short.  Calling ul(1) at the end of
>> the gmdiff script elegantly fixes that problem.

> This made me try ul(1) with gmdiff, which led me to trying it with an
> incomplete filename (I used tab-completion and didn't notice that there
> were multiple files). Found this:
> 
> $ mandoc nonexistent
> mandoc: nonexistent: ERROR: No such file or directory
> Segmentation fault 

Reported by czarkoff@ yesterday, and fixed yesterday.

Sorry for the temporary regression,
  Ingo



Re: does anoybody use ul?

2015-10-23 Thread Vadim Zhukov
2015-10-23 15:38 GMT+02:00 Ted Unangst :
> Christian Weisgerber wrote:
>> Ted Unangst:
>>
>> > --- ul.c	10 Oct 2015 16:15:03 -  1.19
>> > +++ ul.c	23 Oct 2015 10:29:43 -
>> > @@ -241,6 +241,8 @@ mfilter(FILE *f)
>> >  			obuf[col].c_mode |= BOLD|mode;
>> >  		else
>> >  			obuf[col].c_mode = mode;
>> > +		if ((c & (0x80 | 0x40)) == 0x80 && col > 0)
>> > +			obuf[col].c_mode = obuf[col - 1].c_mode;
>> >  		col++;
>> >  		if (col > maxcol)
>> >  			maxcol = col;
>>
>> That doesn't quite work.  Check out this:
>>
>> mandoc /usr/share/man/man1/ksh.1 | sed -n 1185,1190p | ul
>
> so that works with the diff below. i'm not sure how far down this road we need
> to travel, but i figure it's worth a little exploration.
>
> note that i don't think this handles the case of one character, backspace, a
> different character correctly, though it can asymptotically approach
> correct with some care.
>
> Index: ul.c
> ===
> RCS file: /cvs/src/usr.bin/ul/ul.c,v
> retrieving revision 1.19
> diff -u -p -r1.19 ul.c
> --- ul.c	10 Oct 2015 16:15:03 -  1.19
> +++ ul.c	23 Oct 2015 13:31:45 -
> @@ -151,6 +151,12 @@ main(int argc, char *argv[])
> exit(0);
>  }
>
> +int
> +isu8cont(unsigned char c)
> +{
> +   return (c & (0x80 | 0x40)) == 0x80;
> +}
> +
>  void
>  mfilter(FILE *f)
>  {
> @@ -158,8 +164,11 @@ mfilter(FILE *f)
>
> while ((c = getc(f)) != EOF && col < MAXBUF) switch(c) {
> case '\b':
> -   if (col > 0)
> +   while (col > 0) {
> col--;
> +   if (!isu8cont(obuf[col].c_char))
> +   break;

Should this check also be run in case of non-UTF-8 locale (read: "C" one)?

> +   }
> continue;
> case '\t':
> col = (col+8) & ~07;
> @@ -241,6 +250,8 @@ mfilter(FILE *f)
> obuf[col].c_mode |= BOLD|mode;
> else
> obuf[col].c_mode = mode;
> +   if ((c & (0x80 | 0x40)) == 0x80 && col > 0)
> +   obuf[col].c_mode = obuf[col - 1].c_mode;
> col++;
> if (col > maxcol)
> maxcol = col;
>

--
  WBR,
  Vadim Zhukov



Re: does anoybody use ul?

2015-10-23 Thread Christian Weisgerber
Ingo Schwarze:

>  - The FreeBSD change with wchar_t (+70 -44 lines) seems
>like overkill to me.

Wait until you've added double-width characters and correct tab
handling, which the FreeBSD code supports.

-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: does anoybody use ul?

2015-10-23 Thread Ingo Schwarze
Hi Christian,

Christian Weisgerber wrote on Fri, Oct 23, 2015 at 11:26:00PM +0200:
> Ingo Schwarze:
 
>>  - The FreeBSD change with wchar_t (+70 -44 lines) seems
>>like overkill to me.

> Wait until you've added double-width characters

I tested double-width characters (both bold and underlined),
they work just fine.

> and correct tab handling, which the FreeBSD code supports.

What do you consider broken with respect to tabs?

Yours,
  Ingo



Re: does anoybody use ul?

2015-10-23 Thread Vadim Zhukov
2015-10-23 23:00 GMT+02:00 Ingo Schwarze :
> Hi Ted,
>
> Ted Unangst wrote on Fri, Oct 23, 2015 at 09:38:22AM -0400:
>
>> so that works with the diff below.
>
> I agree with the direction for this kind of tool, at least for now.
> However, your diff has a few issues, so i improved it, see below.
>
> Any OKs or vetos?
>
> Ted, in case you want to commit, the version below is obviously
> OK schwarze@.
>
>> i'm not sure how far down this road we need
>> to travel, but i figure it's worth a little exploration.
>
> I think making any valid sequence of single-codepoint characters
> work is reasonable, in particular if it just takes 15 lines
> of additional code in a utility of 500 lines.
>
>
> Changes with respect to tedu@'s version:
>
>  * chunk 151 and chunk 158: unchanged
>  * chunk 211: new chunk
>Required for the sequence underscore, backspace, multibyte character:
>Mark all the bytes underlined, not just the first one, or the
>multibyte character will be broken.
>  * chunk 237 part 1: new change
>Required such that bytes with the high bit set compare equal
>even on signed char architectures.
>  * chunk 237 part 2: style tweak
>Actually use the shiny new isu8cont() function,
>do not inline a copy of its code.
>
>
> Aspects not solved and other comments:
>
>  - The new code runs always.
>In a POSIX locale, text files are not supposed to contain bytes
>with the high bit set, so it is undefined in the first place
>what ul(1) should do.  Of course, we could artificially add yet
>more code (heavy-weight code with setlocale(3) and nl_langinfo(3),
>actually) to gratuitously mess the file up, but i consider it
>more useful to treat UTF-8 gracefully even when the locale is
>not set, such that ul(1) output is predictable independently of
>the user's locale.

Well, we could create another helper, like 'isutf8on()', and just call
it when needed. So the actual logic won't be hurt too much... I won't
insist hard on that, though.
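
A helper along those lines might look like the sketch below. This is only an assumption about how such a check could be written with nl_langinfo(3); it presumes the program has called setlocale(LC_CTYPE, "") at startup and that the platform reports the codeset as exactly "UTF-8", which is not guaranteed everywhere. The split into codeset_is_utf8 is done for the example so the string comparison can be checked without depending on the environment.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/*
 * Does a codeset name, as returned by nl_langinfo(CODESET), mean UTF-8?
 * Assumption: the platform spells it exactly "UTF-8".
 */
int
codeset_is_utf8(const char *codeset)
{
    assert(codeset != NULL);
    return strcmp(codeset, "UTF-8") == 0;
}

/*
 * Hypothetical isutf8on(): report whether the current LC_CTYPE locale
 * uses UTF-8.  The caller is assumed to have run
 * setlocale(LC_CTYPE, "") once at startup; without that call the
 * program stays in the "C" locale.
 */
int
isutf8on(void)
{
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

The continuation-byte checks in the diff could then be guarded by a single call to isutf8on() made once at startup, at the cost of making the output depend on the user's locale.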

>  - character, backspace, different character
>This is not valid backspace encoding for bold or italic,
>so ul(1) is not supposed to handle it.  But at least, it
>no longer produces invalid UTF-8 even in that case.
>  - The FreeBSD change with wchar_t (+70 -44 lines) seems
>like overkill to me.
>  - Nothing changes with respect to tabs.
>To ul(1), tabs just mean "add enough blanks to advance to the
>next character position that is a multiple of eight".  A backspace
>will then remove the last one of them.  The usefulness of this
>feature may be argued, but that's unrelated to UTF-8.
>
>
> Index: ul.c
> ===
> RCS file: /cvs/src/usr.bin/ul/ul.c,v
> retrieving revision 1.19
> diff -u -p -r1.19 ul.c
> --- ul.c	10 Oct 2015 16:15:03 -  1.19
> +++ ul.c	23 Oct 2015 20:19:17 -
> @@ -151,6 +151,12 @@ main(int argc, char *argv[])
> exit(0);
>  }
>
> +int
> +isu8cont(unsigned char c)
> +{
> +   return (c & (0x80 | 0x40)) == 0x80;
> +}
> +
>  void
>  mfilter(FILE *f)
>  {
> @@ -158,8 +164,11 @@ mfilter(FILE *f)
>
> while ((c = getc(f)) != EOF && col < MAXBUF) switch(c) {
> case '\b':
> -   if (col > 0)
> +   while (col > 0) {
> col--;
> +   if (!isu8cont(obuf[col].c_char))
> +   break;
> +   }
> continue;
> case '\t':
> col = (col+8) & ~07;
> @@ -211,9 +220,13 @@ mfilter(FILE *f)
> continue;
>
> case '_':
> -   if (obuf[col].c_char)
> +   if (obuf[col].c_char != '\0') {
> obuf[col].c_mode |= UNDERL | mode;
> -   else
> +   if (obuf[col].c_char & 0x80)
> +   while (col < maxcol &&
> +   isu8cont(obuf[col+1].c_char))
> +   obuf[++col].c_mode |= UNDERL | mode;
> +   } else
> obuf[col].c_char = '_';

Shouldn't the last part be something like this instead?

	else {
		while (col < maxcol && isu8cont(obuf[col+1].c_char))
			obuf[++col].c_mode |= UNDERL | mode;
		obuf[col].c_char = '_';
	}

I could easily be wrong; I'm still trying to understand the intention behind
the original code...

> /* FALLTHROUGH */
> case ' ':
> @@ -237,10 +250,12 @@ mfilter(FILE *f)
> } else if (obuf[col].c_char == '_') {
> obuf[col].c_char = c;
> obuf[col].c_mode |= UNDERL|mode;
> -   } else if (obuf[col].c_char == c)
> +   } else if (obuf[col].c_char == (char)c)

Did you actually want "(unsigned char)c" here?

> obuf[col].c_mode |= BOLD|mode;
> else
> 

Re: does anoybody use ul?

2015-10-23 Thread Ingo Schwarze
Hi Ted,

Ted Unangst wrote on Fri, Oct 23, 2015 at 09:38:22AM -0400:

> so that works with the diff below.

I agree with the direction for this kind of tool, at least for now.
However, your diff has a few issues, so i improved it, see below.

Any OKs or vetos?

Ted, in case you want to commit, the version below is obviously
OK schwarze@.

> i'm not sure how far down this road we need
> to travel, but i figure it's worth a little exploration.

I think making any valid sequence of single-codepoint characters
work is reasonable, in particular if it just takes 15 lines
of additional code in a utility of 500 lines.


Changes with respect to tedu@'s version:

 * chunk 151 and chunk 158: unchanged
 * chunk 211: new chunk
   Required for the sequence underscore, backspace, multibyte character:
   Mark all the bytes underlined, not just the first one, or the
   multibyte character will be broken.
 * chunk 237 part 1: new change
   Required such that bytes with the high bit set compare equal
   even on signed char architectures.
 * chunk 237 part 2: style tweak
   Actually use the shiny new isu8cont() function,
   do not inline a copy of its code.
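
The signed char point in chunk 237 part 1 can be shown in a few lines. getc(3) returns a byte as an int in the range 0..255, but where char is signed, a stored byte with the high bit set is promoted to a negative int, so a plain `stored == c` never matches for such bytes. The sketch below uses the same cast as the diff; the function name same_byte is hypothetical.

```c
#include <assert.h>

/*
 * Compare a byte stored in a char with a value returned by getc(3).
 * Casting `c` to char first makes both sides go through the same
 * conversion, so bytes with the high bit set compare equal whether
 * char is signed or unsigned on the platform.
 */
int
same_byte(char stored, int c)
{
    assert(c >= 0 && c <= 255);
    return stored == (char)c;
}
```

Casting only `c` to unsigned char would not help where char is signed, because the stored byte would still be promoted to a negative value before the comparison.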


Aspects not solved and other comments:

 - The new code runs always.
   In a POSIX locale, text files are not supposed to contain bytes
   with the high bit set, so it is undefined in the first place
   what ul(1) should do.  Of course, we could artificially add yet
   more code (heavy-weight code with setlocale(3) and nl_langinfo(3),
   actually) to gratuitously mess the file up, but i consider it
   more useful to treat UTF-8 gracefully even when the locale is
   not set, such that ul(1) output is predictable independently of
   the user's locale.
 - character, backspace, different character
   This is not valid backspace encoding for bold or italic,
   so ul(1) is not supposed to handle it.  But at least, it
   no longer produces invalid UTF-8 even in that case.
 - The FreeBSD change with wchar_t (+70 -44 lines) seems
   like overkill to me.
 - Nothing changes with respect to tabs.
   To ul(1), tabs just mean "add enough blanks to advance to the
   next character position that is a multiple of eight".  A backspace
   will then remove the last one of them.  The usefulness of this
   feature may be argued, but that's unrelated to UTF-8.
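
The tab rule described in the last item corresponds to the expression `col = (col+8) & ~07;` in the '\t' case of the diff. As a small stand-alone illustration (the function name next_tab_stop is made up for the example):

```c
#include <assert.h>

/*
 * Return the next column that is a multiple of eight and strictly
 * greater than `col`.  Adding 8 and clearing the low three bits rounds
 * down to a multiple of eight after moving past the current stop.
 */
int
next_tab_stop(int col)
{
    assert(col >= 0);
    return (col + 8) & ~07;
}
```

Columns 0 through 7 all advance to column 8, and column 8 advances to column 16. Note that this counts buffer positions, so with multibyte characters the positions are bytes rather than display columns.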


Index: ul.c
===
RCS file: /cvs/src/usr.bin/ul/ul.c,v
retrieving revision 1.19
diff -u -p -r1.19 ul.c
--- ul.c	10 Oct 2015 16:15:03 -  1.19
+++ ul.c	23 Oct 2015 20:19:17 -
@@ -151,6 +151,12 @@ main(int argc, char *argv[])
 	exit(0);
 }
 
+int
+isu8cont(unsigned char c)
+{
+	return (c & (0x80 | 0x40)) == 0x80;
+}
+
 void
 mfilter(FILE *f)
 {
@@ -158,8 +164,11 @@ mfilter(FILE *f)
 
 	while ((c = getc(f)) != EOF && col < MAXBUF) switch(c) {
 	case '\b':
-		if (col > 0)
+		while (col > 0) {
 			col--;
+			if (!isu8cont(obuf[col].c_char))
+				break;
+		}
 		continue;
 	case '\t':
 		col = (col+8) & ~07;
@@ -211,9 +220,13 @@ mfilter(FILE *f)
 		continue;
 
 	case '_':
-		if (obuf[col].c_char)
+		if (obuf[col].c_char != '\0') {
 			obuf[col].c_mode |= UNDERL | mode;
-		else
+			if (obuf[col].c_char & 0x80)
+				while (col < maxcol &&
+				    isu8cont(obuf[col+1].c_char))
+					obuf[++col].c_mode |= UNDERL | mode;
+		} else
 			obuf[col].c_char = '_';
 		/* FALLTHROUGH */
 	case ' ':
@@ -237,10 +250,12 @@ mfilter(FILE *f)
 		} else if (obuf[col].c_char == '_') {
 			obuf[col].c_char = c;
 			obuf[col].c_mode |= UNDERL|mode;
-		} else if (obuf[col].c_char == c)
+		} else if (obuf[col].c_char == (char)c)
 			obuf[col].c_mode |= BOLD|mode;
 		else
 			obuf[col].c_mode = mode;
+		if (col > 0 && isu8cont(c))
+			obuf[col].c_mode = obuf[col - 1].c_mode;
 		col++;
 		if (col > maxcol)
 			maxcol = col;
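
The intent of the new chunk 211 can be sketched outside of ul.c as follows. When an underscore arrives over a column that already holds a character, the first byte is marked as underlined and the mark is then extended over any continuation bytes that follow. This is a simplified illustration under stated assumptions: struct cell, the flag value, and the explicit `len` bound are made up for the example and differ from the bookkeeping in the diff.

```c
#include <assert.h>
#include <stddef.h>

#define UNDERL 0x01  /* assumed flag value, for illustration only */

struct cell {
    char c_mode;  /* attribute bits for this byte */
    char c_char;  /* the byte itself */
};

/* True for UTF-8 continuation bytes (10xxxxxx). */
static int
is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/*
 * Mark the character starting at `col` as underlined, including any
 * continuation bytes that follow it.  `len` is the number of cells in
 * use.  Returns the index of the last byte that was marked.
 */
size_t
underline_char(struct cell *buf, size_t col, size_t len)
{
    assert(buf != NULL && col < len);
    buf[col].c_mode |= UNDERL;
    while (col + 1 < len &&
        is_continuation((unsigned char)buf[col + 1].c_char))
        buf[++col].c_mode |= UNDERL;
    return col;
}
```

Marking only the first byte would leave the terminal switching attributes in the middle of a multibyte sequence, which is the breakage the change description refers to.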



does anoybody use ul?

2015-10-23 Thread Ted Unangst
ul appears somewhat useless for its intended purpose.

echo _xxx_ | ul does not result in underlined text in an xterm, so I doubt
many people are using this.

Unlike, say, mandoc, it can't output Greek letters. I also imagine most people
have moved on to some form of markdown for their other text markup needs.

Will anyone miss it?


Index: Makefile
===
RCS file: /cvs/src/usr.bin/Makefile,v
retrieving revision 1.153
diff -u -p -r1.153 Makefile
--- Makefile	16 Jul 2015 20:50:40 -  1.153
+++ Makefile	23 Oct 2015 07:43:25 -
@@ -25,7 +25,7 @@ SUBDIR= apply arch at aucat audioctl awk
 	sort spell split sqlite3 ssh stat su systat \
 	tail talk tcpbench tee telnet tftp tic time \
 	tmux top touch tput tr true tset tsort tty usbhidaction usbhidctl \
-	ul uname unexpand unifdef uniq units \
+	uname unexpand unifdef uniq units \
 	unvis users uudecode uuencode vacation vi vis vmstat w wall wc \
 	what which who whois write x99token xargs xinstall \
 	yacc yes