Re: sed(1): line addresses and next cycle
On Mon, 2021-08-16 at 08:25 -0600, Todd C. Miller wrote: > On Sun, 15 Aug 2021 23:56:54 +0200, Andreas Kusalananda > =?utf-8?B?S8OkaMOkcmk=? > = wrote: > > > But note that this comes out of a discussion on how to do '0,/re/' > > addressing with OpenBSD sed. Your changes appears to remove one way of > > actually handling a match of '/re/' on the first line without giving us > > another. It would be better to have a clean way of doing the equivalent > > of '0,/re/' than to remove a way to do this. Interestingly (?), the sed > > in plan9port works the same as our native sed. > > As does AT&T sed on Solaris. Since our sed currently matches the > behavior of the "canonical" sed, I don't see a reason to change it. > > - todd Ack. Like I said, just putting some feelers out. I'll drop the idea.
Re: sed(1): line addresses and next cycle
On Sun, 15 Aug 2021 23:56:54 +0200, Andreas Kusalananda =?utf-8?B?S8OkaMOkcmk=? = wrote: > But note that this comes out of a discussion on how to do '0,/re/' > addressing with OpenBSD sed. Your changes appears to remove one way of > actually handling a match of '/re/' on the first line without giving us > another. It would be better to have a clean way of doing the equivalent > of '0,/re/' than to remove a way to do this. Interestingly (?), the sed > in plan9port works the same as our native sed. As does AT&T sed on Solaris. Since our sed currently matches the behavior of the "canonical" sed, I don't see a reason to change it. - todd
Re: sed(1): line addresses and next cycle
On Sun, Aug 15, 2021 at 11:34:22PM +0200, Martijn van Duren wrote: > Andreas Kähäri gave a nice example on misc@ on how our sed addressing > implemenation differs from gsed[0][1]. While writing my reply I noticed > that POSIX doesn't state how "next cycle" should be interpreted when it > comes to address ranges. So I can't state that our implementation is > wrong per se. However, I do think that gsed's interpretation is more > intuitive, since a numeric address is not dependent on the context of > the pattern space and thus should register as "in range". But note that this comes out of a discussion on how to do '0,/re/' addressing with OpenBSD sed. Your changes appears to remove one way of actually handling a match of '/re/' on the first line without giving us another. It would be better to have a clean way of doing the equivalent of '0,/re/' than to remove a way to do this. Interestingly (?), the sed in plan9port works the same as our native sed. Andreas > > Diff below changes program parsing to more closely match gsed in this > regard: > $ printf 'test1\nbla1\ntest2\nbla2\n' | sed -e '1 { /^test/d; }' -e > '1,/^test/d' > bla1 > test2 > bla2 > $ printf 'test1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' -e > '1,/^test/d' > bla2 > $ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' > -e '1,/^test/d' > bla1 > test2 > bla2 > $ printf 'test1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e > '1,/^test/d' > bla2 > $ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e > '1,/^test/d' > bla1 > test2 > bla2 > > The diff passes regress, but hasn't had a lot of scrutiny. Just checking > for general interest in changing this functionality. As soon as I > know that it's something we might want I'll spend more braincycles on > it. > > martijn@ > > [0] https://marc.info/?l=openbsd-misc&m=162896537001890&w=2 > [1] https://marc.info/?l=openbsd-misc&m=162905748428954&w=2 > > Index: process.c > === > RCS file: /cvs/src/usr.bin/sed/process.c,v > retrieving revision 1.34 > diff -u -p -r1.34 process.c > --- process.c 14 Nov 2018 10:59:33 - 1.34 > +++ process.c 15 Aug 2021 21:30:22 - > @@ -89,14 +89,16 @@ process(void) > SPACE tspace; > size_t len, oldpsl; > char *p; > + int nextcycle; > > for (linenum = 0; mf_fgets(&PS, REPLACE);) { > pd = 0; > + nextcycle = 0; > top: > cp = prog; > redirect: > while (cp != NULL) { > - if (!applies(cp)) { > + if (!applies(cp) || nextcycle) { > cp = cp->next; > continue; > } > @@ -127,14 +129,16 @@ redirect: > break; > case 'd': > pd = 1; > - goto new; > + nextcycle = 1; > + break; > case 'D': > if (pd) > goto new; > if (psl == 0 || > (p = memchr(ps, '\n', psl)) == NULL) { > pd = 1; > - goto new; > + nextcycle = 1; > + break; > } else { > psl -= (p + 1) - ps; > memmove(ps, p + 1, psl); > @@ -267,8 +271,9 @@ new: if (!nflag && !pd) > * (lastline, linenumber, ps). > */ > #define MATCH(a)\ > - (a)->type == AT_RE ? regexec_e((a)->u.r, ps, 0, 1, 0, psl) :\ > - (a)->type == AT_LINE ? linenum == (a)->u.l : lastline() > + (a)->type == AT_LINE ? linenum == (a)->u.l :\ > + (a)->type == AT_LAST ? lastline() : \ > + pd ? 0 : regexec_e((a)->u.r, ps, 0, 1, 0, psl) > > /* > * Return TRUE if the command applies to the current line. Sets the inrange > -- Andreas (Kusalananda) Kähäri SciLifeLab, NBIS, ICM Uppsala University, Sweden .
sed(1): line addresses and next cycle
Andreas Kähäri gave a nice example on misc@ on how our sed addressing implemenation differs from gsed[0][1]. While writing my reply I noticed that POSIX doesn't state how "next cycle" should be interpreted when it comes to address ranges. So I can't state that our implementation is wrong per se. However, I do think that gsed's interpretation is more intuitive, since a numeric address is not dependent on the context of the pattern space and thus should register as "in range". Diff below changes program parsing to more closely match gsed in this regard: $ printf 'test1\nbla1\ntest2\nbla2\n' | sed -e '1 { /^test/d; }' -e '1,/^test/d' bla1 test2 bla2 $ printf 'test1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' -e '1,/^test/d' bla2 $ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' -e '1,/^test/d' bla1 test2 bla2 $ printf 'test1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e '1,/^test/d' bla2 $ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e '1,/^test/d' bla1 test2 bla2 The diff passes regress, but hasn't had a lot of scrutiny. Just checking for general interest in changing this functionality. As soon as I know that it's something we might want I'll spend more braincycles on it. martijn@ [0] https://marc.info/?l=openbsd-misc&m=162896537001890&w=2 [1] https://marc.info/?l=openbsd-misc&m=162905748428954&w=2 Index: process.c === RCS file: /cvs/src/usr.bin/sed/process.c,v retrieving revision 1.34 diff -u -p -r1.34 process.c --- process.c 14 Nov 2018 10:59:33 - 1.34 +++ process.c 15 Aug 2021 21:30:22 - @@ -89,14 +89,16 @@ process(void) SPACE tspace; size_t len, oldpsl; char *p; + int nextcycle; for (linenum = 0; mf_fgets(&PS, REPLACE);) { pd = 0; + nextcycle = 0; top: cp = prog; redirect: while (cp != NULL) { - if (!applies(cp)) { + if (!applies(cp) || nextcycle) { cp = cp->next; continue; } @@ -127,14 +129,16 @@ redirect: break; case 'd': pd = 1; - goto new; + nextcycle = 1; + break; case 'D': if (pd) goto new; if (psl == 0 || (p = memchr(ps, '\n', psl)) == NULL) { pd = 1; - goto new; + nextcycle = 1; + break; } else { psl -= (p + 1) - ps; memmove(ps, p + 1, psl); @@ -267,8 +271,9 @@ new:if (!nflag && !pd) * (lastline, linenumber, ps). */ #defineMATCH(a)\ - (a)->type == AT_RE ? regexec_e((a)->u.r, ps, 0, 1, 0, psl) :\ - (a)->type == AT_LINE ? linenum == (a)->u.l : lastline() + (a)->type == AT_LINE ? linenum == (a)->u.l :\ + (a)->type == AT_LAST ? lastline() : \ + pd ? 0 : regexec_e((a)->u.r, ps, 0, 1, 0, psl) /* * Return TRUE if the command applies to the current line. Sets the inrange