On Sun, Aug 15, 2021 at 11:34:22PM +0200, Martijn van Duren wrote:
> Andreas Kähäri gave a nice example on misc@ on how our sed addressing
> implementation differs from gsed[0][1]. While writing my reply I noticed
> that POSIX doesn't state how "next cycle" should be interpreted when it
> comes to address ranges. So I can't state that our implementation is
> wrong per se. However, I do think that gsed's interpretation is more
> intuitive, since a numeric address is not dependent on the context of
> the pattern space and thus should register as "in range".

But note that this comes out of a discussion on how to do '0,/re/'
addressing with OpenBSD sed.  Your change appears to remove one way of
actually handling a match of '/re/' on the first line without giving us
another.  It would be better to have a clean way of doing the equivalent
of '0,/re/' than to remove a way to do this.  Interestingly (?), the sed
in plan9port works the same as our native sed.
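
For comparison, the effect of '0,/re/d' can be had outside sed with awk,
independent of how either sed interprets the range.  This is only a
minimal sketch, assuming a POSIX awk; the function name
delete_through_first_match is made up for illustration and is not part
of any existing tool:

```shell
# Hypothetical helper (name invented for this example): drop every line
# up to and including the first one matching the ERE given in $1, even
# when that first match is on line 1.
delete_through_first_match() {
	awk -v re="$1" '
		found { print }                    # past the first match: pass through
		!found && $0 ~ re { found = 1 }    # first match: swallow it, then open up
	'
}

printf 'test1\nbla1\ntest2\nbla2\n' | delete_through_first_match '^test'
```

The matching line itself is never printed because the print rule is
evaluated before the flag is set.  It does not address the sed question
itself, of course.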

Andreas

> 
> Diff below changes program parsing to more closely match gsed in this
> regard:
> $ printf 'test1\nbla1\ntest2\nbla2\n' | sed -e '1 { /^test/d; }' -e '1,/^test/d'
> bla1
> test2
> bla2
> $ printf 'test1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' -e '1,/^test/d'
> bla2
> $ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' -e '1,/^test/d'
> bla1
> test2
> bla2
> $ printf 'test1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e '1,/^test/d'
> bla2
> $ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e '1,/^test/d'
> bla1
> test2
> bla2
> 
> The diff passes regress, but hasn't had a lot of scrutiny. Just checking
> for general interest in changing this functionality. As soon as I
> know that it's something we might want I'll spend more braincycles on
> it.
> 
> martijn@
> 
> [0] https://marc.info/?l=openbsd-misc&m=162896537001890&w=2
> [1] https://marc.info/?l=openbsd-misc&m=162905748428954&w=2
> 
> Index: process.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/sed/process.c,v
> retrieving revision 1.34
> diff -u -p -r1.34 process.c
> --- process.c 14 Nov 2018 10:59:33 -0000      1.34
> +++ process.c 15 Aug 2021 21:30:22 -0000
> @@ -89,14 +89,16 @@ process(void)
>       SPACE tspace;
>       size_t len, oldpsl;
>       char *p;
> +     int nextcycle;
>  
>       for (linenum = 0; mf_fgets(&PS, REPLACE);) {
>               pd = 0;
> +             nextcycle = 0;
>  top:
>               cp = prog;
>  redirect:
>               while (cp != NULL) {
> -                     if (!applies(cp)) {
> +                     if (!applies(cp) || nextcycle) {
>                               cp = cp->next;
>                               continue;
>                       }
> @@ -127,14 +129,16 @@ redirect:
>                               break;
>                       case 'd':
>                               pd = 1;
> -                             goto new;
> +                             nextcycle = 1;
> +                             break;
>                       case 'D':
>                               if (pd)
>                                       goto new;
>                               if (psl == 0 ||
>                                   (p = memchr(ps, '\n', psl)) == NULL) {
>                                       pd = 1;
> -                                     goto new;
> +                                     nextcycle = 1;
> +                                     break;
>                               } else {
>                                       psl -= (p + 1) - ps;
>                                       memmove(ps, p + 1, psl);
> @@ -267,8 +271,9 @@ new:              if (!nflag && !pd)
>   * (lastline, linenumber, ps).
>   */
>  #define      MATCH(a)                                                \
> -     (a)->type == AT_RE ? regexec_e((a)->u.r, ps, 0, 1, 0, psl) :    \
> -         (a)->type == AT_LINE ? linenum == (a)->u.l : lastline()
> +         (a)->type == AT_LINE ? linenum == (a)->u.l :        \
> +         (a)->type == AT_LAST ? lastline() :                 \
> +         pd ? 0 : regexec_e((a)->u.r, ps, 0, 1, 0, psl)
>  
>  /*
>   * Return TRUE if the command applies to the current line.  Sets the inrange
> 

-- 
Andreas (Kusalananda) Kähäri
SciLifeLab, NBIS, ICM
Uppsala University, Sweden
