Re: man sed(1) diff

2021-09-05 Thread Andreas Kusalananda Kähäri
On Sun, Sep 05, 2021 at 02:01:37PM +0200, Ingo Schwarze wrote:
> Hello,
> 
> I see where Ian's confusion is coming from, even though arguably,
> the existing text is accurate.  But it is not a good idea to insert
> exceptions as parenthetic remarks in the middle of an enumeration
> of steps that is already somewhat long and complicated.
> 
> I think it is better to explain the special rules for D processing
> in the paragraph describing D (surprise, surprise) rather than in
> the middle of the general description of what a "cycle" is.
> 
> 
> Consider the following minimal example:
> 
>    $ printf 'axb' | sed -n 'y/x/\n/;s/^b/c/;P;D'
>    a
>    c
> 
> The D deletes "a\n" from the pattern space.
> After that, the next cycle is entered with "b" in the pattern space,
> without copying a new line into the pattern space.
> 
> 
> OK?
>   Ingo
> 
> 
> Index: sed.1
> ===
> RCS file: /cvs/src/usr.bin/sed/sed.1,v
> retrieving revision 1.48
> diff -u -r1.48 sed.1
> --- sed.1 17 Mar 2016 05:27:10 -  1.48
> +++ sed.1 5 Sep 2021 11:46:48 -
> @@ -140,9 +140,6 @@
>  cyclically copies a line of input, not including its terminating newline
>  character, into a
>  .Em pattern space ,
> -(unless there is something left after a
> -.Ic D
> -function),
>  applies all of the commands with addresses that select that pattern space,
>  copies the pattern space to the standard output, appending a newline, and
>  deletes the pattern space.
> @@ -331,7 +328,8 @@
>  Delete the pattern space and start the next cycle.
>  .It [2addr] Ns Ic D
>  Delete the initial segment of the pattern space through the first
> -newline character and start the next cycle.
> +newline character and start the next cycle without copying the next
> +line of input into the pattern space.
>  .It [2addr] Ns Ic g
>  Replace the contents of the pattern space with the contents of the
>  hold space.


This is an improvement.  This should be compared to the POSIX text:

If the pattern space contains no <newline>, delete the pattern
space and start a normal new cycle as if the d command was
issued.  Otherwise, delete the initial segment of the pattern
space through the first <newline>, and start the next cycle with
the resultant pattern space and without reading any new input.

The difference is that the POSIX text explicitly says that "D" should
act like "d" if there is no newline character in the pattern space.  The
OpenBSD text does not actually say what happens if there is no newline
character in the pattern space (I believe that OpenBSD sed behaves as
POSIX describes).
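
A quick way to check this on any given sed is sketched below.  The
two input lines are made up for illustration, and the expectation
stated in the comment assumes the implementation follows the POSIX
wording quoted above:

```shell
# Nothing here ever puts a newline into the pattern space, so under
# the POSIX wording D should behave exactly like d: no output at all.
out_D=$(printf 'one\ntwo\n' | sed 'D')
out_d=$(printf 'one\ntwo\n' | sed 'd')

if [ "$out_D" = "$out_d" ]; then
    echo "D acts like d when there is no newline"
else
    echo "D and d differ when there is no newline"
fi
```

Both commands run without -n on purpose, so an implementation that
printed the pattern space before starting the next cycle would show
up as a difference.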

-- 
Andreas (Kusalananda) Kähäri
SciLifeLab, NBIS, ICM
Uppsala University, Sweden




Re: man sed(1) diff

2021-09-05 Thread Andreas Kusalananda Kähäri
On Sun, Sep 05, 2021 at 09:51:44AM +0100, ropers wrote:
> I.
> 
> Not to engage in pointless bikeshedding, but I find this clearer and
> --if I understand things correctly-- also more technically accurate:
> 
> Index: sed.1
> ===
> RCS file: /cvs/src/usr.bin/sed/sed.1,v
> retrieving revision 1.60
> diff -C8 -u -r1.60 sed.1
> cvs server: conflicting specifications of output style
> --- sed.1 8 Mar 2021 02:47:28 -   1.60
> +++ sed.1 5 Sep 2021 08:23:25 -
> @@ -141,19 +141,19 @@
>  Normally,
>  .Nm
>  cyclically copies a line of input, not including its terminating newline
>  character, into a
>  .Em pattern space ,
>  (unless there is something left after a
>  .Ic D
>  function),
> -applies all of the commands with addresses that select that pattern space,
> -copies the pattern space to the standard output, appending a newline, and
> -deletes the pattern space.
> +applies all of the commands with addresses that select that pattern,

An address does not select a pattern, it selects the pattern space.

> +copies the pattern space contents to the standard output, appending a

The word "contents" is not needed.

> +newline, and deletes them from the pattern space.

It's unclear what "them" refers to here (possibly "the contents"?).

The proposed text seems to want to separate "the pattern space" from
"the contents of the pattern space".  I don't think this distinction
is helpful, just like it's not helpful to make a distinction between
"a string" and "the contents of a string".  The manual (and the POSIX
standard text) refers to the data that is read from the input and
currently subject to the commands of the editing script as "the pattern
space" throughout, not just in this introductory paragraph.


>  .Pp
>  Some of the functions use a
>  .Em hold space
>  to save all or part of the pattern space for subsequent retrieval.
>  .Sh SED ADDRESSES
>  An address is not required, but if specified must be a number (that counts
>  input lines
>  cumulatively across input files), a dollar character
> 
> (I used the diff -C 8 option to show a little more context.)
> 
> 
> II.
> 
> [Link for easier reading: ]
> 
> Also, does the "(unless there is something left after a D function)"
> part really relate to the preceding parenthetical clause of "not
> including its terminating newline character"?  Should it be moved to
> directly follow that instead of following the "into a pattern space"
> part?
> Alternatively, would it be better to move the "(...)" part to a
> separate subsequent sentence like this:
> > (A newline character may be present in the pattern space
> > if left behind after a
> > .Ic D
> > function.)
> Is it even important to include that information in the man page?  Is
> it ever relevant that there may technically be some string and a
> newline left in the pattern space?
> 
> 
> Thank you,
> Ian

-- 
Andreas (Kusalananda) Kähäri
SciLifeLab, NBIS, ICM
Uppsala University, Sweden




Re: man sed(1) diff

2021-09-05 Thread Ingo Schwarze
Hello,

I see where Ian's confusion is coming from, even though arguably,
the existing text is accurate.  But it is not a good idea to insert
exceptions as parenthetic remarks in the middle of an enumeration
of steps that is already somewhat long and complicated.

I think it is better to explain the special rules for D processing
in the paragraph describing D (surprise, surprise) rather than in
the middle of the general description of what a "cycle" is.


Consider the following minimal example:

   $ printf 'axb' | sed -n 'y/x/\n/;s/^b/c/;P;D'
   a
   c

The D deletes "a\n" from the pattern space.
After that, the next cycle is entered with "b" in the pattern space,
without copying a new line into the pattern space.
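
For contrast, here is a sketch running the same script once with D and
once with d.  The trailing newline on the input is an addition for
this illustration, so that the result does not depend on how an
implementation treats an unterminated last line:

```shell
# With D, only "a\n" is removed and the script runs again on "b",
# which the s command then turns into "c".
out_D=$(printf 'axb\n' | sed -n 'y/x/\n/;s/^b/c/;P;D')

# With d, the whole pattern space is deleted after the first P,
# so the "b" part is never processed.
out_d=$(printf 'axb\n' | sed -n 'y/x/\n/;s/^b/c/;P;d')

printf 'with D:\n%s\n' "$out_D"
printf 'with d:\n%s\n' "$out_d"
```

With D this should print "a" and "c"; with d it should print only "a".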


OK?
  Ingo


Index: sed.1
===
RCS file: /cvs/src/usr.bin/sed/sed.1,v
retrieving revision 1.48
diff -u -r1.48 sed.1
--- sed.1   17 Mar 2016 05:27:10 -  1.48
+++ sed.1   5 Sep 2021 11:46:48 -
@@ -140,9 +140,6 @@
 cyclically copies a line of input, not including its terminating newline
 character, into a
 .Em pattern space ,
-(unless there is something left after a
-.Ic D
-function),
 applies all of the commands with addresses that select that pattern space,
 copies the pattern space to the standard output, appending a newline, and
 deletes the pattern space.
@@ -331,7 +328,8 @@
 Delete the pattern space and start the next cycle.
 .It [2addr] Ns Ic D
 Delete the initial segment of the pattern space through the first
-newline character and start the next cycle.
+newline character and start the next cycle without copying the next
+line of input into the pattern space.
 .It [2addr] Ns Ic g
 Replace the contents of the pattern space with the contents of the
 hold space.



man sed(1) diff

2021-09-05 Thread ropers
I.

Not to engage in pointless bikeshedding, but I find this clearer and
--if I understand things correctly-- also more technically accurate:

Index: sed.1
===
RCS file: /cvs/src/usr.bin/sed/sed.1,v
retrieving revision 1.60
diff -C8 -u -r1.60 sed.1
cvs server: conflicting specifications of output style
--- sed.1   8 Mar 2021 02:47:28 -   1.60
+++ sed.1   5 Sep 2021 08:23:25 -
@@ -141,19 +141,19 @@
 Normally,
 .Nm
 cyclically copies a line of input, not including its terminating newline
 character, into a
 .Em pattern space ,
 (unless there is something left after a
 .Ic D
 function),
-applies all of the commands with addresses that select that pattern space,
-copies the pattern space to the standard output, appending a newline, and
-deletes the pattern space.
+applies all of the commands with addresses that select that pattern,
+copies the pattern space contents to the standard output, appending a
+newline, and deletes them from the pattern space.
 .Pp
 Some of the functions use a
 .Em hold space
 to save all or part of the pattern space for subsequent retrieval.
 .Sh SED ADDRESSES
 An address is not required, but if specified must be a number (that counts
 input lines
 cumulatively across input files), a dollar character

(I used the diff -C 8 option to show a little more context.)


II.

[Link for easier reading: ]

Also, does the "(unless there is something left after a D function)"
part really relate to the preceding parenthetical clause of "not
including its terminating newline character"?  Should it be moved to
directly follow that instead of following the "into a pattern space"
part?
Alternatively, would it be better to move the "(...)" part to a
separate subsequent sentence like this:
> (A newline character may be present in the pattern space
> if left behind after a
> .Ic D
> function.)
Is it even important to include that information in the man page?  Is
it ever relevant that there may technically be some string and a
newline left in the pattern space?


Thank you,
Ian