Hi,

Michal Mazurek wrote on Mon, Jan 04, 2016 at 08:27:50PM +0100:

>  Fold line after the last blank character within the first
>  .Ar width
>  column positions (or bytes).
> +If a blank character does not exist within the width, then
> +a longer line will still be split at the width.

> The first diff explains what happens to a word longer than 'width'.
> This diff comes from NetBSD and FreeBSD.  The explanation is quite
> complicated, and I couldn't understand what this flag does at first; the
> source code makes it much clearer (the variable is called split_words).
> But I didn't change the wording; maybe it's just me.

I hate the wording, too.  It is misleading because splitting at the
width also happens without the -s flag, but this addition makes it
sound as if that were specific to -s.
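
To make the point concrete, here is a hypothetical sketch in C that folds one line of plain ASCII text with and without the -s behaviour described in the new wording. fold_line(), SEGMAX, and the buffer handling are illustrative assumptions, not code taken from fold.c:

```c
#include <assert.h>
#include <string.h>

#define SEGMAX 256	/* hypothetical upper bound on width for this sketch */

/*
 * Hypothetical sketch, not code from fold.c: fold one line of plain
 * ASCII text (no tabs, backspaces or carriage returns, so one byte
 * is one column) into out, which needs room for 2 * strlen(in) + 1
 * bytes.  A break is inserted whenever the next character would
 * exceed width.  With split_words (like -s), the break moves to just
 * after the last blank in the pending segment, if there is one;
 * otherwise the line is still split at width.
 */
static void
fold_line(const char *in, int width, int split_words, char *out)
{
	char	 seg[SEGMAX];	/* pending text not yet copied to out */
	size_t	 len = 0, cut, i, o = 0;

	assert(width > 0 && width <= SEGMAX);
	for (; *in != '\0'; in++) {
		if (len == (size_t)width) {
			cut = len;	/* default: split at width */
			if (split_words) {
				for (i = len; i > 0; i--) {
					if (seg[i - 1] == ' ') {
						cut = i;	/* after last blank */
						break;
					}
				}
			}
			memcpy(out + o, seg, cut);
			o += cut;
			out[o++] = '\n';
			len -= cut;
			memmove(seg, seg + cut, len);
		}
		seg[len++] = *in;
	}
	memcpy(out + o, seg, len);
	out[o + len] = '\0';
}
```

In this sketch, "the quick brown fox" with width 8 comes out as "the quic", "k brown ", "fox" without split_words and as "the ", "quick ", "brown ", "fox" with it. A word longer than the width, such as "abcdefghij" with width 4, is split at the width in both modes, which is exactly why describing that split under -s alone is misleading.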

> -.It Fl w Ar width
> +.It Fl w Ar width | Fl Ns Ar width

> The second diff comes from me, and documents a different way to
> specify width.

That's an old, obsolete, non-POSIX syntax variant that is only
supported for backward compatibility and intentionally undocumented
to discourage its use.

> -breaking the lines to have a maximum of 80 characters.
> +breaking the lines to have a maximum of 80 columns.

> The third diff comes from FreeBSD and makes the wording a bit more precise.

Yes, that makes sense.  In other places, we have started using the
term "display columns", so maybe we should use that here, too,
at least when it first occurs.  For -w, it's better to just remove
the word "characters", because .Ar width is display columns or
bytes depending on whether or not -b is specified.

Besides, for now, we should note that multibyte character support
is missing.  fold(1) is on the list of programs to fix.
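
A rough illustration of what the missing multibyte support means, assuming fold(1) currently counts every byte as one position: the hypothetical helper below counts UTF-8 characters (treating each as one column wide, which is wrong for wide and combining characters) so the result can be compared with strlen(3).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>	/* strlen(3), for comparison */

/*
 * Hypothetical helper: count the characters in a NUL-terminated,
 * valid UTF-8 string by counting every byte that is not a
 * continuation byte (10xxxxxx).  Treating each character as one
 * column is itself a simplification: wide and combining characters
 * break it.
 */
static size_t
utf8_chars(const char *s)
{
	size_t	 n = 0;

	assert(s != NULL);
	for (; *s != '\0'; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

For "gr\xc3\xbc\xc3\x9f" ("grüß"), strlen() gives 6 while utf8_chars() gives 4, so a byte-counting fold would break such text earlier than its display width warrants, and could even break inside a multibyte sequence.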

So, here is what I would like to commit for now.

OK?
  Ingo


P.S.
The manual should also explain how tab, backspace, and carriage
return are handled, but that can be done when adding UTF-8 support.
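
For reference, a minimal sketch in C of the column accounting that the POSIX description of fold(1) asks for. newcol() and its count_bytes parameter are hypothetical names, and this is not taken from the OpenBSD fold.c sources:

```c
#include <assert.h>

/*
 * Sketch of column accounting following the POSIX fold description.
 * With count_bytes (like -b), every byte is one position.  Otherwise:
 *   backspace:       back up one column, but not below zero
 *   carriage return: return to column zero
 *   tab:             advance to the next tab stop (every 8 columns)
 *   anything else:   one column (no multibyte support)
 */
static int
newcol(int col, int ch, int count_bytes)
{
	assert(col >= 0);
	if (count_bytes)
		return col + 1;
	switch (ch) {
	case '\b':
		return col > 0 ? col - 1 : 0;
	case '\r':
		return 0;
	case '\t':
		return col + 8 - col % 8;
	default:
		return col + 1;
	}
}
```

The sketch assumes tab stops every 8 columns and that with count_bytes set, tab, backspace, and carriage return each count as one position like any other byte.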

P.P.S.
This requirement in POSIX is a mess:

  Although terminal input in canonical processing mode requires the
  erase character (frequently set to <backspace>) to erase the
  previous character (not byte or column position), terminal output
  is not buffered and is extremely difficult, if not impossible,
  to parse correctly; the interpretation depends entirely on the
  physical device that actually displays/prints/stores the output.
  In all known internationalized implementations, the utilities
  producing output for mixed column-width output assume that a
  <backspace> character backs up one column position and outputs
  enough <backspace> characters to return to the start of the
  character when <backspace> is used to provide local line motions
  to support underlining and emboldening operations. Since fold
  without the -b option is dealing with these same constraints,
  <backspace> is always treated as backing up one column position
  rather than backing up one character.

That's completely bogus; it is hard to imagine that the POSIX
crowd never heard of roff:

  $ echo 'x\\fB\\[uCFFF]\\fRx' | nroff -c | hexdump -C
  00000000  78 ec bf bf 08 ec bf bf  78 0a  |x.......x.|

Grrr.
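
To spell out the arithmetic in that example: nroff -c emits x, U+CFFF, one backspace, U+CFFF, x, where U+CFFF is a double-width Hangul syllable. The hypothetical C sketch below computes the final column both ways; cp_width() is a crude stand-in for wcwidth(3) that only knows the Hangul syllable block:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Crude, hypothetical stand-in for wcwidth(3): only the Hangul
 * syllable block U+AC00..U+D7A3 is treated as double-width.
 */
static int
cp_width(unsigned long cp)
{
	return (cp >= 0xAC00 && cp <= 0xD7A3) ? 2 : 1;
}

/*
 * Final column after the code points in s[0..n-1].  If posix_bs is
 * set, a backspace backs up one column, as POSIX requires of fold.
 * Otherwise it backs up over the whole previous character, which is
 * what nroff -c output assumes.  Only a single backspace per
 * overstrike is modelled, as in the example above.
 */
static int
end_column(const unsigned long *s, size_t n, int posix_bs)
{
	int	 col = 0, prev = 1;
	size_t	 i;

	for (i = 0; i < n; i++) {
		if (s[i] == '\b') {
			col -= posix_bs ? 1 : prev;
			if (col < 0)
				col = 0;
		} else {
			prev = cp_width(s[i]);
			col += prev;
		}
	}
	return col;
}
```

Under the POSIX rule the line ends in column 5, while the terminal shows 4 columns (x, one overstruck double-width character, x), so each such backspace makes fold overcount by one column.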


Index: fold.1
===================================================================
RCS file: /cvs/src/usr.bin/fold/fold.1,v
retrieving revision 1.16
diff -u -p -r1.16 fold.1
--- fold.1      28 Dec 2011 22:27:18 -0000      1.16
+++ fold.1      5 Jan 2016 00:14:44 -0000
@@ -45,7 +45,7 @@
 .Nm
 is a filter which folds the contents of the specified files,
 or the standard input if no files are specified,
-breaking the lines to have a maximum of 80 characters.
+breaking the lines to have a maximum of 80 display columns.
 .Pp
 The options are as follows:
 .Bl -tag -width Ds
@@ -54,11 +54,13 @@ Count
 .Ar width
 in bytes rather than column positions.
 .It Fl s
-Fold line after the last blank character within the first
-.Ar width
-column positions (or bytes).
+If an output line would be broken after a non-blank character but
+contains at least one blank character, break the line earlier,
+after the last blank character.
+This is useful to avoid line breaks in the middle of words, if
+possible.
 .It Fl w Ar width
-Specifies a line width to use instead of the default 80 characters.
+Specifies a line width to use instead of the default of 80.
 .El
 .Sh EXIT STATUS
 .Ex -std fold
@@ -108,3 +110,5 @@ be expanded using
 .Xr expand 1
 before using
 .Nm fold .
+.Pp
+Multibyte character support is missing.
