Re: sed(1): line addresses and next cycle

2021-08-16 Thread Martijn van Duren
On Mon, 2021-08-16 at 08:25 -0600, Todd C. Miller wrote:
> On Sun, 15 Aug 2021 23:56:54 +0200, Andreas Kusalananda 
> =?utf-8?B?S8OkaMOkcmk=?
> = wrote:
> 
> > But note that this comes out of a discussion on how to do '0,/re/'
> > addressing with OpenBSD sed.  Your changes appears to remove one way of
> > actually handling a match of '/re/' on the first line without giving us
> > another.  It would be better to have a clean way of doing the equivalent
> > of '0,/re/' than to remove a way to do this.  Interestingly (?), the sed
> > in plan9port works the same as our native sed.
> 
> As does AT&T sed on Solaris.  Since our sed currently matches the
> behavior of the "canonical" sed, I don't see a reason to change it.
> 
>  - todd
Ack. Like I said, just putting some feelers out. I'll drop the idea.



Re: sed(1): line addresses and next cycle

2021-08-16 Thread Todd C . Miller
On Sun, 15 Aug 2021 23:56:54 +0200, Andreas Kusalananda =?utf-8?B?S8OkaMOkcmk=?
= wrote:

> But note that this comes out of a discussion on how to do '0,/re/'
> addressing with OpenBSD sed.  Your changes appears to remove one way of
> actually handling a match of '/re/' on the first line without giving us
> another.  It would be better to have a clean way of doing the equivalent
> of '0,/re/' than to remove a way to do this.  Interestingly (?), the sed
> in plan9port works the same as our native sed.

As does AT&T sed on Solaris.  Since our sed currently matches the
behavior of the "canonical" sed, I don't see a reason to change it.

 - todd



Re: sed(1): line addresses and next cycle

2021-08-15 Thread Andreas Kusalananda Kähäri
On Sun, Aug 15, 2021 at 11:34:22PM +0200, Martijn van Duren wrote:
> Andreas Kähäri gave a nice example on misc@ on how our sed addressing
> implemenation differs from gsed[0][1]. While writing my reply I noticed
> that POSIX doesn't state how "next cycle" should be interpreted when it
> comes to address ranges. So I can't state that our implementation is
> wrong per se. However, I do think that gsed's interpretation is more
> intuitive, since a numeric address is not dependent on the context of
> the pattern space and thus should register as "in range".

But note that this comes out of a discussion on how to do '0,/re/'
addressing with OpenBSD sed.  Your changes appears to remove one way of
actually handling a match of '/re/' on the first line without giving us
another.  It would be better to have a clean way of doing the equivalent
of '0,/re/' than to remove a way to do this.  Interestingly (?), the sed
in plan9port works the same as our native sed.

Andreas

> 
> Diff below changes program parsing to more closely match gsed in this
> regard:
> $ printf 'test1\nbla1\ntest2\nbla2\n' | sed -e '1 { /^test/d; }' -e 
> '1,/^test/d'  
> bla1
> test2
> bla2
> $ printf 'test1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' -e 
> '1,/^test/d'  
> bla2
> $ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' 
> -e '1,/^test/d'
> bla1
> test2
> bla2
> $ printf 'test1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e 
> '1,/^test/d' 
> bla2
> $ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e 
> '1,/^test/d'
> bla1
> test2
> bla2
> 
> The diff passes regress, but hasn't had a lot of scrutiny. Just checking
> for general interest in changing this functionality. As soon as I
> know that it's something we might want I'll spend more braincycles on
> it.
> 
> martijn@
> 
> [0] https://marc.info/?l=openbsd-misc&m=162896537001890&w=2
> [1] https://marc.info/?l=openbsd-misc&m=162905748428954&w=2
> 
> Index: process.c
> ===
> RCS file: /cvs/src/usr.bin/sed/process.c,v
> retrieving revision 1.34
> diff -u -p -r1.34 process.c
> --- process.c 14 Nov 2018 10:59:33 -  1.34
> +++ process.c 15 Aug 2021 21:30:22 -
> @@ -89,14 +89,16 @@ process(void)
>   SPACE tspace;
>   size_t len, oldpsl;
>   char *p;
> + int nextcycle;
>  
>   for (linenum = 0; mf_fgets(&PS, REPLACE);) {
>   pd = 0;
> + nextcycle = 0;
>  top:
>   cp = prog;
>  redirect:
>   while (cp != NULL) {
> - if (!applies(cp)) {
> + if (!applies(cp) || nextcycle) {
>   cp = cp->next;
>   continue;
>   }
> @@ -127,14 +129,16 @@ redirect:
>   break;
>   case 'd':
>   pd = 1;
> - goto new;
> + nextcycle = 1;
> + break;
>   case 'D':
>   if (pd)
>   goto new;
>   if (psl == 0 ||
>   (p = memchr(ps, '\n', psl)) == NULL) {
>   pd = 1;
> - goto new;
> + nextcycle = 1;
> + break;
>   } else {
>   psl -= (p + 1) - ps;
>   memmove(ps, p + 1, psl);
> @@ -267,8 +271,9 @@ new:  if (!nflag && !pd)
>   * (lastline, linenumber, ps).
>   */
>  #define  MATCH(a)\
> - (a)->type == AT_RE ? regexec_e((a)->u.r, ps, 0, 1, 0, psl) :\
> - (a)->type == AT_LINE ? linenum == (a)->u.l : lastline()
> + (a)->type == AT_LINE ? linenum == (a)->u.l :\
> + (a)->type == AT_LAST ? lastline() : \
> + pd ? 0 : regexec_e((a)->u.r, ps, 0, 1, 0, psl)
>  
>  /*
>   * Return TRUE if the command applies to the current line.  Sets the inrange
> 

-- 
Andreas (Kusalananda) Kähäri
SciLifeLab, NBIS, ICM
Uppsala University, Sweden

.



sed(1): line addresses and next cycle

2021-08-15 Thread Martijn van Duren
Andreas Kähäri gave a nice example on misc@ on how our sed addressing
implemenation differs from gsed[0][1]. While writing my reply I noticed
that POSIX doesn't state how "next cycle" should be interpreted when it
comes to address ranges. So I can't state that our implementation is
wrong per se. However, I do think that gsed's interpretation is more
intuitive, since a numeric address is not dependent on the context of
the pattern space and thus should register as "in range".

Diff below changes program parsing to more closely match gsed in this
regard:
$ printf 'test1\nbla1\ntest2\nbla2\n' | sed -e '1 { /^test/d; }' -e 
'1,/^test/d'  
bla1
test2
bla2
$ printf 'test1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' -e 
'1,/^test/d'  
bla2
$ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' -e 
'1,/^test/d'
bla1
test2
bla2
$ printf 'test1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e 
'1,/^test/d' 
bla2
$ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e 
'1,/^test/d'
bla1
test2
bla2

The diff passes regress, but hasn't had a lot of scrutiny. Just checking
for general interest in changing this functionality. As soon as I
know that it's something we might want I'll spend more braincycles on
it.

martijn@

[0] https://marc.info/?l=openbsd-misc&m=162896537001890&w=2
[1] https://marc.info/?l=openbsd-misc&m=162905748428954&w=2

Index: process.c
===
RCS file: /cvs/src/usr.bin/sed/process.c,v
retrieving revision 1.34
diff -u -p -r1.34 process.c
--- process.c   14 Nov 2018 10:59:33 -  1.34
+++ process.c   15 Aug 2021 21:30:22 -
@@ -89,14 +89,16 @@ process(void)
SPACE tspace;
size_t len, oldpsl;
char *p;
+   int nextcycle;
 
for (linenum = 0; mf_fgets(&PS, REPLACE);) {
pd = 0;
+   nextcycle = 0;
 top:
cp = prog;
 redirect:
while (cp != NULL) {
-   if (!applies(cp)) {
+   if (!applies(cp) || nextcycle) {
cp = cp->next;
continue;
}
@@ -127,14 +129,16 @@ redirect:
break;
case 'd':
pd = 1;
-   goto new;
+   nextcycle = 1;
+   break;
case 'D':
if (pd)
goto new;
if (psl == 0 ||
(p = memchr(ps, '\n', psl)) == NULL) {
pd = 1;
-   goto new;
+   nextcycle = 1;
+   break;
} else {
psl -= (p + 1) - ps;
memmove(ps, p + 1, psl);
@@ -267,8 +271,9 @@ new:if (!nflag && !pd)
  * (lastline, linenumber, ps).
  */
 #defineMATCH(a)\
-   (a)->type == AT_RE ? regexec_e((a)->u.r, ps, 0, 1, 0, psl) :\
-   (a)->type == AT_LINE ? linenum == (a)->u.l : lastline()
+   (a)->type == AT_LINE ? linenum == (a)->u.l :\
+   (a)->type == AT_LAST ? lastline() : \
+   pd ? 0 : regexec_e((a)->u.r, ps, 0, 1, 0, psl)
 
 /*
  * Return TRUE if the command applies to the current line.  Sets the inrange