Hi Jason and Theo,

Jason McIntyre wrote on Tue, Jun 22, 2021 at 06:37:27AM +0100:
> On Tue, Jun 22, 2021 at 04:48:39AM +0200, Theo Buehler wrote:

>> You have two overlong lines as indicated below. I would have thought
>> that mandoc -Tlint complains about that, but apparently it doesn't have
>> such a warning... With those wrapped,
 
> yes, there is no feedback on long lines. although we try to keep the
> source less than 80 width, there are some places where it is not
> possible.
> 
> i'm not sure whether adding a warning would be helpful or disruptive.

Here is a patch implementing such a style warning, leaning very
heavily into the direction of never producing false positives, that
is, not warning about long lines

 - in no-fill mode (e.g., .Bd -literal, .EX, .nf and the like)
 - that start with a dot (normal macro and request lines)
   or with a non-standard control character
 - that start with a space character or with an escape sequence
 - in tbl(7) context
 - in eqn(7) context
 - that do not contain a blank character before column 80

So, this certainly does not find all long lines that can be improved,
but everything it finds ought to be trivial and worthwhile to fix.

There are less than twenty-five offenders below /usr/src/share/man/,
and none of those are false positives.

However, i see over twenty-three thousand offending lines
below /usr/share/man/, the vast majority in Perl manual pages
and considerable amounts in other third-party stuff like GCC
and binutils.
The worst offenders we maintain ourselves are tmux(1) with 15
offending lines, terminfo(5) with 11, mkhybrid(8) with 6, tic(1),
magic(5), sysctl(2), cdio(1), and the rest with three or less
offending lines each, maybe a few dozen offending files all told.

I don't think such a patch would be disruptive.  I expect most of the
third-party manuals it flags already have many other warnings.
Our own stuff does not have large amounts of issues due to Jason's
diligent manual work.

Does it help?  Maybe it could slightly reduce one aspect of Jason's
workload and help developers who want to find their own glitches in
this respect, in particular those usually working on terminals wider
than 80 columns.

There is no need to chase all of these down, but the style warning
might help when editing a page for other reasons.

What do you think?
  Ingo


P.S.
The following script makes it easy to count violations:

mandoc -T lint */*.[1-9]* */*/*.[1-9]* | \
  grep 'longer than' | \
  cut -d : -f 2 | \
  uniq -c | \
  sort -nr


Index: libmandoc.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/libmandoc.h,v
retrieving revision 1.64
diff -u -p -r1.64 libmandoc.h
--- libmandoc.h 3 Apr 2020 11:34:19 -0000       1.64
+++ libmandoc.h 26 Jun 2021 16:19:03 -0000
@@ -73,7 +73,7 @@ void           roff_reset(struct roff *);
 void            roff_man_free(struct roff_man *);
 struct roff_man        *roff_man_alloc(struct roff *, const char *, int);
 void            roff_man_reset(struct roff_man *);
-int             roff_parseln(struct roff *, int, struct buf *, int *);
+int             roff_parseln(struct roff *, int, struct buf *, int *, size_t);
 void            roff_userret(struct roff *);
 void            roff_endparse(struct roff *);
 void            roff_setreg(struct roff *, const char *, int, char);
Index: mandoc.1
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mandoc.1,v
retrieving revision 1.175
diff -u -p -r1.175 mandoc.1
--- mandoc.1    2 Jun 2021 18:27:36 -0000       1.175
+++ mandoc.1    26 Jun 2021 16:19:04 -0000
@@ -1066,6 +1066,9 @@ An
 request occurs even though the document already switched to no-fill mode
 and did not switch back to fill mode yet.
 It has no effect.
+.It Sy "input text line longer than 80 bytes"
+Consider breaking the input text line
+at one of the blank characters before column 80.
 .It Sy "verbatim \(dq--\(dq, maybe consider using \e(em"
 .Pq mdoc
 Even though the ASCII output device renders an em-dash as
Index: mandoc.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mandoc.h,v
retrieving revision 1.212
diff -u -p -r1.212 mandoc.h
--- mandoc.h    2 Jun 2021 18:27:37 -0000       1.212
+++ mandoc.h    26 Jun 2021 16:19:04 -0000
@@ -72,6 +72,7 @@ enum  mandocerr {
        MANDOCERR_DELIM_NB, /* no blank before trailing delimiter: macro ... */
        MANDOCERR_FI_SKIP, /* fill mode already enabled, skipping: fi */
        MANDOCERR_NF_SKIP, /* fill mode already disabled, skipping: nf */
+       MANDOCERR_TEXT_LONG, /* input text line longer than 80 bytes */
        MANDOCERR_DASHDASH, /* verbatim "--", maybe consider using \(em */
        MANDOCERR_FUNC, /* function name without markup: name() */
        MANDOCERR_SPACE_EOL, /* whitespace at end of input line */
Index: mandoc_msg.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mandoc_msg.c,v
retrieving revision 1.11
diff -u -p -r1.11 mandoc_msg.c
--- mandoc_msg.c        2 Jun 2021 18:27:37 -0000       1.11
+++ mandoc_msg.c        26 Jun 2021 16:19:04 -0000
@@ -71,6 +71,7 @@ static        const char *const type_message[MA
        "no blank before trailing delimiter",
        "fill mode already enabled, skipping",
        "fill mode already disabled, skipping",
+       "input text line longer than 80 bytes",
        "verbatim \"--\", maybe consider using \\(em",
        "function name without markup",
        "whitespace at end of input line",
Index: read.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/read.c,v
retrieving revision 1.190
diff -u -p -r1.190 read.c
--- read.c      24 Apr 2020 11:58:02 -0000      1.190
+++ read.c      26 Jun 2021 16:19:04 -0000
@@ -152,6 +152,7 @@ mparse_buf_r(struct mparse *curp, struct
        struct buf      *firstln, *lastln, *thisln, *loop;
        char            *cp;
        size_t           pos; /* byte number in the ln buffer */
+       size_t           spos; /* at the start of the current line parse */
        int              line_result, result;
        int              of;
        int              lnn; /* line number in the real file */
@@ -178,6 +179,7 @@ mparse_buf_r(struct mparse *curp, struct
                            curp->filenc & MPARSE_LATIN1)
                                curp->filenc = preconv_cue(&blk, i);
                }
+               spos = pos;
 
                while (i < blk.sz && (start || blk.buf[i] != '\0')) {
 
@@ -277,7 +279,8 @@ mparse_buf_r(struct mparse *curp, struct
 
                of = 0;
 rerun:
-               line_result = roff_parseln(curp->roff, curp->line, &ln, &of);
+               line_result = roff_parseln(curp->roff, curp->line,
+                   &ln, &of, start && spos == 0 ? pos : 0);
 
                /* Process options. */
 
Index: roff.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.c,v
retrieving revision 1.248
diff -u -p -r1.248 roff.c
--- roff.c      27 Aug 2020 12:58:00 -0000      1.248
+++ roff.c      26 Jun 2021 16:19:04 -0000
@@ -1821,7 +1821,7 @@ roff_parsetext(struct roff *r, struct bu
 }
 
 int
-roff_parseln(struct roff *r, int ln, struct buf *buf, int *offs)
+roff_parseln(struct roff *r, int ln, struct buf *buf, int *offs, size_t len)
 {
        enum roff_tok    t;
        int              e;
@@ -1831,6 +1831,14 @@ roff_parseln(struct roff *r, int ln, str
        int              ctl;   /* macro line (boolean) */
 
        ppos = pos = *offs;
+
+       if (len > 80 && r->tbl == NULL && r->eqn == NULL &&
+           (r->man->flags & ROFF_NOFILL) == 0 &&
+           strchr(" .\\", buf->buf[pos]) == NULL &&
+           buf->buf[pos] != r->control &&
+           strcspn(buf->buf, " ") < 80)
+               mandoc_msg(MANDOCERR_TEXT_LONG, ln, (int)len - 1,
+                   "%.20s...", buf->buf + pos);
 
        /* Handle in-line equation delimiters. */
 

Reply via email to