On Sat, Jul 01, 2017 at 01:20:19PM +0000, kshe wrote: > On Tue, 27 Jun 2017 09:29:10 +0000, Otto Moerbeek wrote: > > > > at last a followup, for the original problem. > > > > This diff incorporates your later comment. It does not cause the newly > > added regress test to fail, though. > > > > So that poses the question if this is what you meant. > > > > -Otto > > > > Index: process.c > > =================================================================== > > RCS file: /cvs/src/usr.bin/sed/process.c,v > > retrieving revision 1.32 > > diff -u -p -r1.32 process.c > > --- process.c 22 Feb 2017 14:09:09 -0000 1.32 > > +++ process.c 27 Jun 2017 09:16:33 -0000 > > @@ -120,8 +120,10 @@ redirect: > > cp = cp->u.c; > > goto redirect; > > case 'c': > > + if (pd) > > + break; > > pd = 1; > > - psl = 0; > > + ps[psl = 0] = '\0'; > > if (cp->a2 == NULL || lastaddr || lastline()) > > (void)fprintf(outfile, "%s", cp->t); > > break; > > @@ -138,6 +140,7 @@ redirect: > > } else { > > psl -= (p + 1) - ps; > > memmove(ps, p + 1, psl); > > + ps[psl] = '\0'; > > goto top; > > } > > case 'g': > > > > Indeed, there was a wording error in my second paragraph: I meant to > align the behaviour of `l' (not `c') with that of `p', such that using > `l' after `c' would not print anything (not even `$'), just like `p' in > this case, which does not print anything either (not even a newline).
somebody take over this, please. I do not seem to be able to find the time to validate this diff. -Otto > > Index: process.c > =================================================================== > RCS file: /cvs/src/usr.bin/sed/process.c,v > retrieving revision 1.32 > diff -up -r1.32 process.c > --- process.c 22 Feb 2017 14:09:09 -0000 1.32 > +++ process.c 1 Jul 2017 10:24:21 -0000 > @@ -121,7 +121,7 @@ redirect: > goto redirect; > case 'c': > pd = 1; > - psl = 0; > + ps[psl = 0] = '\0'; > if (cp->a2 == NULL || lastaddr || lastline()) > (void)fprintf(outfile, "%s", cp->t); > break; > @@ -138,6 +138,7 @@ redirect: > } else { > psl -= (p + 1) - ps; > memmove(ps, p + 1, psl); > + ps[psl] = '\0'; > goto top; > } > case 'g': > @@ -158,6 +159,8 @@ redirect: > (void)fprintf(outfile, "%s", cp->t); > break; > case 'l': > + if (pd) > + break; > lputs(ps); > break; > case 'n': > > That being said, there is more to this issue than what such a simple > patch could fix. For example, in a way your diff also makes sense: if > it is believed that one ought not to use `l' nor `p' to print an > inexistant pattern space, then surely deleting it again with `c' should > be disallowed too. However, as partly demonstrated by my last message, > there is absolutely no coherence anywhere about this, so in the end one > might wonder which command is correctly behaved and which isn't. > > For instance, it is a matter of course that the sequence `N;s/.*\n//;p' > should be just the same as `n' when normal output has not been disabled; > but, in OpenBSD's sed, because that `pd' flag is implemented in such an > erratic fashion, this isn't always the case: > > $ jot 6 | sed 'c\ > > text > > :loop > > N;s/.*\n//;p > > bloop' > text > 2 > 4 > 6 > > $ jot 6 | sed 'c\ > > text > > :loop > > n > > bloop' > text > 2 > 3 > 4 > 5 > 6 > > For the same reason, nothing is printed by the following script, even if > all lines are read: > > $ jot 6 | sed 'c\ > > text > > :loop > > $q;N > > bloop' > text > > (As a side note, amusingly, replacing the `q' above with `s/$//p' would > actually produce the expected output, when in contrast using something > like `s/^//p' would not... But this is an unrelated bug.) > > Likewise, one would believe the command `y/a/b/' to be functionally > equivalent to `s/a/b/g'; but, again, many examples of the opposite > behaviour can be constructed; for instance: > > $ echo | sed 'c\ > > text > > s/^/a/;s/^// > > y/a/b/ > > s/^//p;d' > text > a > > $ echo | sed 'c\ > > text > > s/^/a/;s/^// > > s/a/b/g > > s/^//p;d' > text > b > > Because, of course, thanks to that `pd' magic, `y' is only effective > after `c' when an odd number of substitutions have affected the pattern > space. Although it is true that the above two bugs are primarily due to > the incorrect handling of the `deleted' flag by the `s' command, they > nonetheless help to expose further the inconsistency of the code, making > its overall fragility even more obvious than it would have otherwise > been; in this case in particular, they allow one to notice that: > > 1. Even though `n' resets the `deleted' flag, `N' does not. > 2. Even though `s' ignores the `deleted' flag, `y' does not. > > And so, similarily, even if `p', `P', and `w' check for it, `l' ignores > `pd' completely; and, indeed, so does `c', in contrast with other > commands such as `D'. > > I hope that these examples make it clear that the problem is much worse > than what the above diff actually solves. Of course, I could have > written a third detailed report appropriately addressing all of this, > but I figured no reasonable amount of patching endeavour could ever > truly and satisfyingly improve the overall condition of such inherently > flawed code. Therefore, I no longer believe any of my patches (nor any > other patches, really) to be relevant to OpenBSD's current sed. > > This might sound a little extreme, but, in fact, as my previous messages > illustrate, that `pd' madness is far from being the only incongruity > lurking within sed's source. I should also add that, since then, I > actually found more bugs, some of them even affecting the parsing > algorithm... To take only one example out of many, the implementation > of the `y' command is incorrect: > > $ sed 'y/[/]/' > sed: 1: "y/[/]/": unterminated transform target string > > (The parser thinks that an opening bracket in an argument to the `y' > command is meant to be the start of a character class, as if it were > part of a regular expression.) > > Considering that sed is my favourite Unix utility (dc comes close; vi > and sh don't count as they are more than mere utilities), I would be > very sad if all my beautiful sed scripts were forever to be handled by > such a mediocre implementation, since I don't really plan to ever use > anything else than OpenBSD. Also, in light of all these observations, > instead of devising endless diffs for a piece of software that clearly > doesn't deserve any, I am willing to rewrite it from scratch. This > might take a few weeks as I do not have enough time to work on it > regularly, but there is apparently no hurry: after all, I seem to be the > only one out there to actually need such features as empty match > replacement to work reliably, no matter if the last command was `D' or > if there happens to be an embed NUL or newline in the pattern space. > > Besides, and perhaps even more importantly, a clean rewrite could yield > many other improvements (like a more restrictive pledge(2) depending on > the outcome of the scripts' compilation), without the hassle of having > to untangle such old and messy code. As a consequence, especially since > no one at OpenBSD appears to have time to deal with all the deficiencies > of the existing source, it might be simpler to wait until I am done with > my own, as my code shall be much more readable than the current one, and > therefore quite easy to review. > > Kind regards, > > kshe