Andreas Kähäri gave a nice example on misc@ of how our sed addressing
implementation differs from gsed[0][1]. While writing my reply I noticed
that POSIX doesn't state how "next cycle" should be interpreted when it
comes to address ranges. So I can't state that our implementation is
wrong per se. However, I do think that gsed's interpretation is more
intuitive: a numeric address does not depend on the contents of the
pattern space, so it should still register as "in range" even when an
earlier command has already deleted the line.

Diff below changes program parsing to more closely match gsed in this
regard:
$ printf 'test1\nbla1\ntest2\nbla2\n' | sed -e '1 { /^test/d; }' -e '1,/^test/d'
bla1
test2
bla2
$ printf 'test1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' -e '1,/^test/d'
bla2
$ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' -e '1,/^test/d'
bla1
test2
bla2
$ printf 'test1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e '1,/^test/d'
bla2
$ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e '1,/^test/d'
bla1
test2
bla2
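
To make the difference concrete, here is a small stand-alone C model of
the two interpretations for this particular two-command script. It is
only a sketch and not the sed code: simulate() and keep_scanning are
made-up names for illustration, and the script is hard-coded rather than
parsed. With keep_scanning set to 0 a `d' ends the cycle immediately, so
later range addresses never see the line; with 1 the remaining addresses
are still evaluated, but a regex address cannot match a deleted pattern
space.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of the script
 *	1 { /^test/d; }
 *	1,/^test/d
 * Surviving lines are appended to out, each followed by a newline.
 */
void
simulate(const char **lines, int nlines, int keep_scanning,
    char *out, size_t outsz)
{
	int i, inrange = 0;

	out[0] = '\0';
	for (i = 0; i < nlines; i++) {
		int linenum = i + 1;
		int is_test = strncmp(lines[i], "test", 4) == 0;
		int deleted = 0;
		size_t used;

		/* Command 1: 1 { /^test/d; } */
		if (linenum == 1 && is_test)
			deleted = 1;

		/* Command 2: 1,/^test/d (skipped after `d' in the old model) */
		if (!deleted || keep_scanning) {
			if (inrange) {
				/* A regex end address needs a live pattern space. */
				if (!deleted && is_test)
					inrange = 0;
				deleted = 1;
			} else if (linenum == 1) {
				/* Numeric start address: no pattern space needed. */
				inrange = 1;
				deleted = 1;
			}
		}

		if (!deleted) {
			used = strlen(out);
			snprintf(out + used, outsz - used, "%s\n", lines[i]);
		}
	}
}
```

Calling simulate() on the four-line input with keep_scanning 0 and then
1 gives the two different results shown above; on the five-line input
both modes agree.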

The diff passes regress, but hasn't had a lot of scrutiny. Just checking
for general interest in changing this functionality. As soon as I
know that it's something we might want I'll spend more braincycles on
it.

martijn@

[0] https://marc.info/?l=openbsd-misc&m=162896537001890&w=2
[1] https://marc.info/?l=openbsd-misc&m=162905748428954&w=2

Index: process.c
===================================================================
RCS file: /cvs/src/usr.bin/sed/process.c,v
retrieving revision 1.34
diff -u -p -r1.34 process.c
--- process.c   14 Nov 2018 10:59:33 -0000      1.34
+++ process.c   15 Aug 2021 21:30:22 -0000
@@ -89,14 +89,16 @@ process(void)
        SPACE tspace;
        size_t len, oldpsl;
        char *p;
+       int nextcycle;
 
        for (linenum = 0; mf_fgets(&PS, REPLACE);) {
                pd = 0;
+               nextcycle = 0;
 top:
                cp = prog;
 redirect:
                while (cp != NULL) {
-                       if (!applies(cp)) {
+                       if (!applies(cp) || nextcycle) {
                                cp = cp->next;
                                continue;
                        }
@@ -127,14 +129,16 @@ redirect:
                                break;
                        case 'd':
                                pd = 1;
-                               goto new;
+                               nextcycle = 1;
+                               break;
                        case 'D':
                                if (pd)
                                        goto new;
                                if (psl == 0 ||
                                    (p = memchr(ps, '\n', psl)) == NULL) {
                                        pd = 1;
-                                       goto new;
+                                       nextcycle = 1;
+                                       break;
                                } else {
                                        psl -= (p + 1) - ps;
                                        memmove(ps, p + 1, psl);
@@ -267,8 +271,9 @@ new:                if (!nflag && !pd)
  * (lastline, linenumber, ps).
  */
 #define        MATCH(a)                                                \
-       (a)->type == AT_RE ? regexec_e((a)->u.r, ps, 0, 1, 0, psl) :    \
-           (a)->type == AT_LINE ? linenum == (a)->u.l : lastline()
+           (a)->type == AT_LINE ? linenum == (a)->u.l :        \
+           (a)->type == AT_LAST ? lastline() :                 \
+           pd ? 0 : regexec_e((a)->u.r, ps, 0, 1, 0, psl)
 
 /*
  * Return TRUE if the command applies to the current line.  Sets the inrange

