Re: sed(1): missing NUL in pattern space

Otto Moerbeek Wed, 05 Jul 2017 12:00:59 -0700

On Sat, Jul 01, 2017 at 01:20:19PM +0000, kshe wrote:

> On Tue, 27 Jun 2017 09:29:10 +0000, Otto Moerbeek wrote:
>  >
>  > at last a followup, for the original problem.
>  >
>  > This diff incorporates your later comment. It does not cause the newly
>  > added regress test to fail, though.
>  >
>  > So that poses the question if this is what you meant.
>  >
>  >     -Otto
>  >
>  > Index: process.c
>  > ===================================================================
>  > RCS file: /cvs/src/usr.bin/sed/process.c,v
>  > retrieving revision 1.32
>  > diff -u -p -r1.32 process.c
>  > --- process.c    22 Feb 2017 14:09:09 -0000    1.32
>  > +++ process.c    27 Jun 2017 09:16:33 -0000
>  > @@ -120,8 +120,10 @@ redirect:
>  >                  cp = cp->u.c;
>  >                  goto redirect;
>  >              case 'c':
>  > +                if (pd)
>  > +                    break;
>  >                  pd = 1;
>  > -                psl = 0;
>  > +                ps[psl = 0] = '\0';
>  >                  if (cp->a2 == NULL || lastaddr || lastline())
>  >                      (void)fprintf(outfile, "%s", cp->t);
>  >                  break;
>  > @@ -138,6 +140,7 @@ redirect:
>  >                  } else {
>  >                      psl -= (p + 1) - ps;
>  >                      memmove(ps, p + 1, psl);
>  > +                    ps[psl] = '\0';
>  >                      goto top;
>  >                  }
>  >              case 'g':
>  >
> 
> Indeed, there was a wording error in my second paragraph: I meant to
> align the behaviour of `l' (not `c') with that of `p', such that using
> `l' after `c' would not print anything (not even `$'), just like `p' in
> this case, which does not print anything either (not even a newline).


somebody take over this, please. I do not seem to be able to find the
time to validate this diff.

        -Otto

> 
> Index: process.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/sed/process.c,v
> retrieving revision 1.32
> diff -up -r1.32 process.c
> --- process.c 22 Feb 2017 14:09:09 -0000      1.32
> +++ process.c 1 Jul 2017 10:24:21 -0000
> @@ -121,7 +121,7 @@ redirect:
>                               goto redirect;
>                       case 'c':
>                               pd = 1;
> -                             psl = 0;
> +                             ps[psl = 0] = '\0';
>                               if (cp->a2 == NULL || lastaddr || lastline())
>                                       (void)fprintf(outfile, "%s", cp->t);
>                               break;
> @@ -138,6 +138,7 @@ redirect:
>                               } else {
>                                       psl -= (p + 1) - ps;
>                                       memmove(ps, p + 1, psl);
> +                                     ps[psl] = '\0';
>                                       goto top;
>                               }
>                       case 'g':
> @@ -158,6 +159,8 @@ redirect:
>                               (void)fprintf(outfile, "%s", cp->t);
>                               break;
>                       case 'l':
> +                             if (pd)
> +                                     break;
>                               lputs(ps);
>                               break;
>                       case 'n':
> 
> That being said, there is more to this issue than what such a simple
> patch could fix.  For example, in a way your diff also makes sense: if
> it is believed that one ought not to use `l' nor `p' to print an
> inexistant pattern space, then surely deleting it again with `c' should
> be disallowed too.  However, as partly demonstrated by my last message,
> there is absolutely no coherence anywhere about this, so in the end one
> might wonder which command is correctly behaved and which isn't.
> 
> For instance, it is a matter of course that the sequence `N;s/.*\n//;p'
> should be just the same as `n' when normal output has not been disabled;
> but, in OpenBSD's sed, because that `pd' flag is implemented in such an
> erratic fashion, this isn't always the case:
> 
>       $ jot 6 | sed 'c\
>       > text
>       > :loop
>       > N;s/.*\n//;p
>       > bloop'
>       text
>       2
>       4
>       6
> 
>       $ jot 6 | sed 'c\
>       > text
>       > :loop
>       > n
>       > bloop'
>       text
>       2
>       3
>       4
>       5
>       6
> 
> For the same reason, nothing is printed by the following script, even if
> all lines are read:
> 
>       $ jot 6 | sed 'c\
>       > text
>       > :loop
>       > $q;N
>       > bloop'
>       text
> 
> (As a side note, amusingly, replacing the `q' above with `s/$//p' would
> actually produce the expected output, when in contrast using something
> like `s/^//p' would not...  But this is an unrelated bug.)
> 
> Likewise, one would believe the command `y/a/b/' to be functionally
> equivalent to `s/a/b/g'; but, again, many examples of the opposite
> behaviour can be constructed; for instance:
> 
>       $ echo | sed 'c\
>       > text
>       > s/^/a/;s/^//
>       > y/a/b/
>       > s/^//p;d'
>       text
>       a
> 
>       $ echo | sed 'c\
>       > text
>       > s/^/a/;s/^//
>       > s/a/b/g
>       > s/^//p;d'
>       text
>       b
> 
> Because, of course, thanks to that `pd' magic, `y' is only effective
> after `c' when an odd number of substitutions have affected the pattern
> space.  Although it is true that the above two bugs are primarily due to
> the incorrect handling of the `deleted' flag by the `s' command, they
> nonetheless help to expose further the inconsistency of the code, making
> its overall fragility even more obvious than it would have otherwise
> been; in this case in particular, they allow one to notice that:
> 
> 1.  Even though `n' resets the `deleted' flag, `N' does not.
> 2.  Even though `s' ignores the `deleted' flag, `y' does not.
> 
> And so, similarily, even if `p', `P', and `w' check for it, `l' ignores
> `pd' completely; and, indeed, so does `c', in contrast with other
> commands such as `D'.
> 
> I hope that these examples make it clear that the problem is much worse
> than what the above diff actually solves.  Of course, I could have
> written a third detailed report appropriately addressing all of this,
> but I figured no reasonable amount of patching endeavour could ever
> truly and satisfyingly improve the overall condition of such inherently
> flawed code.  Therefore, I no longer believe any of my patches (nor any
> other patches, really) to be relevant to OpenBSD's current sed.
> 
> This might sound a little extreme, but, in fact, as my previous messages
> illustrate, that `pd' madness is far from being the only incongruity
> lurking within sed's source.  I should also add that, since then, I
> actually found more bugs, some of them even affecting the parsing
> algorithm...  To take only one example out of many, the implementation
> of the `y' command is incorrect:
> 
>       $ sed 'y/[/]/'
>       sed: 1: "y/[/]/": unterminated transform target string
> 
> (The parser thinks that an opening bracket in an argument to the `y'
> command is meant to be the start of a character class, as if it were
> part of a regular expression.)
> 
> Considering that sed is my favourite Unix utility (dc comes close; vi
> and sh don't count as they are more than mere utilities), I would be
> very sad if all my beautiful sed scripts were forever to be handled by
> such a mediocre implementation, since I don't really plan to ever use
> anything else than OpenBSD.  Also, in light of all these observations,
> instead of devising endless diffs for a piece of software that clearly
> doesn't deserve any, I am willing to rewrite it from scratch.  This
> might take a few weeks as I do not have enough time to work on it
> regularly, but there is apparently no hurry: after all, I seem to be the
> only one out there to actually need such features as empty match
> replacement to work reliably, no matter if the last command was `D' or
> if there happens to be an embed NUL or newline in the pattern space.
> 
> Besides, and perhaps even more importantly, a clean rewrite could yield
> many other improvements (like a more restrictive pledge(2) depending on
> the outcome of the scripts' compilation), without the hassle of having
> to untangle such old and messy code.  As a consequence, especially since
> no one at OpenBSD appears to have time to deal with all the deficiencies
> of the existing source, it might be simpler to wait until I am done with
> my own, as my code shall be much more readable than the current one, and
> therefore quite easy to review.
> 
> Kind regards,
> 
> kshe

Re: sed(1): missing NUL in pattern space

Reply via email to