Re: svn commit: r355590 - in head/usr.bin/sed: . tests tests/regress.multitest.out

2019-12-10 Thread Kyle Evans
On Tue, Dec 10, 2019 at 1:16 PM Kyle Evans wrote:
>
> Author: kevans
> Date: Tue Dec 10 19:16:00 2019
> New Revision: 355590
> URL: https://svnweb.freebsd.org/changeset/base/355590
>
> Log:
>   sed: process \r, \n, and \t
>
>   This is both reasonable and a common GNUism that a lot of ported software
>   expects.
>
>   Universally process \r, \n, and \t into carriage return, newline, and tab
>   respectively. Newline still doesn't function in contexts where it can't
>   (e.g. BRE), but we process it anyways rather than passing
>   UB \n (escaped ordinary) through to the underlying regex engine.
>

This part of the message is wrong: without this processing, sed would
pass just an ordinary 'n' rather than an escaped ordinary, which could
lead to false positives if you think you're matching on an embedded
newline but instead match on 'n'. Further, my reading of POSIX's
statement on this leads me to believe that we have to treat it as a
newline rather than embedding it as 'n' or escaped-'n', which regex(3)
will certainly not interpret as a newline.
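
To illustrate the false-positive concern, here is a minimal sketch using
the POSIX regex(3) interface directly. The helper name bre_matches is
hypothetical and is not part of sed; it only shows that a pattern holding
a real newline character and a pattern holding a plain 'n' match different
input.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of sed): returns 1 if the BRE `pattern`
 * matches anywhere in `text`, 0 if it does not, and -1 if regcomp() fails.
 */
static int
bre_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	/* No REG_EXTENDED, so the pattern is compiled as a BRE. */
	if (regcomp(&re, pattern, REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0);
}
```

With this helper, a pattern containing an actual newline byte matches only
text with an embedded newline, while the pattern "anb" matches the literal
text "anb", which is the accidental match described above.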

>   Adding a --posix flag to disable these was considered, but sed.1 already
>   declares this version of sed a super-set of POSIX specification and this
>   behavior is the most likely expected when one attempts to use one of these
>   escape sequences in pattern space.
>
>   This differs from pre-r197362 behavior in that we now honor the three
>   arguably most common escape sequences used with sed(1) and we do so outside
>   of character classes, too.
>
>   Other escape sequences, like \s and \S, will come later when GNU extensions
>   are added to libregex; sed will likely link against libregex by default,
>   since the GNU extensions tend to be fairly un-intrusive.
>
>   PR:   229925
>   Reviewed by:  bapt, emaste, pfg
>   Differential Revision: https://reviews.freebsd.org/D22750
>


svn commit: r355590 - in head/usr.bin/sed: . tests tests/regress.multitest.out

2019-12-10 Thread Kyle Evans
Author: kevans
Date: Tue Dec 10 19:16:00 2019
New Revision: 355590
URL: https://svnweb.freebsd.org/changeset/base/355590

Log:
  sed: process \r, \n, and \t
  
  This is both reasonable and a common GNUism that a lot of ported software
  expects.
  
  Universally process \r, \n, and \t into carriage return, newline, and tab
  respectively. Newline still doesn't function in contexts where it can't
  (e.g. BRE), but we process it anyways rather than passing
  UB \n (escaped ordinary) through to the underlying regex engine.
  
  Adding a --posix flag to disable these was considered, but sed.1 already
  declares this version of sed a super-set of POSIX specification and this
  behavior is the most likely expected when one attempts to use one of these
  escape sequences in pattern space.
  
  This differs from pre-r197362 behavior in that we now honor the three
  arguably most common escape sequences used with sed(1) and we do so outside
  of character classes, too.
  
  Other escape sequences, like \s and \S, will come later when GNU extensions
  are added to libregex; sed will likely link against libregex by default,
  since the GNU extensions tend to be fairly un-intrusive.
  
  PR:   229925
  Reviewed by:  bapt, emaste, pfg
  Differential Revision: https://reviews.freebsd.org/D22750
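
As a rough illustration of the translation described in the log, the
following standalone sketch converts the three escape sequences in a
string. It is not the code in compile.c: the function name translate_nrt
and its buffer contract are assumptions made for the example, and unlike
sed it does not know about delimiters or character classes.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the code in compile.c: copy `src` to `dst`,
 * turning the two-character sequences \n, \r and \t into newline,
 * carriage return and tab. Any other backslash sequence is copied
 * unchanged as a pair, so an escaped backslash followed by 'n' is not
 * misread as \n. `dst` must have room for strlen(src) + 1 bytes.
 * Returns the length of the translated string.
 */
static size_t
translate_nrt(const char *src, char *dst)
{
	char *d = dst;

	while (*src != '\0') {
		if (src[0] == '\\' && src[1] != '\0') {
			switch (src[1]) {
			case 'n':
				*d++ = '\n';
				break;
			case 'r':
				*d++ = '\r';
				break;
			case 't':
				*d++ = '\t';
				break;
			default:
				/* Some other escape: keep both characters. */
				*d++ = src[0];
				*d++ = src[1];
				break;
			}
			src += 2;
			continue;
		}
		*d++ = *src++;
	}
	*d = '\0';
	return ((size_t)(d - dst));
}
```

Copying unknown escapes as a pair is a design choice of this sketch; it
leaves sequences such as \s untouched for a later stage to interpret.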

Modified:
  head/usr.bin/sed/compile.c
  head/usr.bin/sed/tests/regress.multitest.out/8.22
  head/usr.bin/sed/tests/sed2_test.sh

Modified: head/usr.bin/sed/compile.c
==============================================================================
--- head/usr.bin/sed/compile.c  Tue Dec 10 18:57:39 2019        (r355589)
+++ head/usr.bin/sed/compile.c  Tue Dec 10 19:16:00 2019        (r355590)
@@ -395,10 +395,21 @@ compile_delimited(char *p, char *d, int is_tr)
continue;
} else if (*p == '\\' && p[1] == '[') {
*d++ = *p++;
-   } else if (*p == '\\' && p[1] == c)
+   } else if (*p == '\\' && p[1] == c) {
p++;
-   else if (*p == '\\' && p[1] == 'n') {
-   *d++ = '\n';
+   } else if (*p == '\\' &&
+   (p[1] == 'n' || p[1] == 'r' || p[1] == 't')) {
+   switch (p[1]) {
+   case 'n':
+   *d++ = '\n';
+   break;
+   case 'r':
+   *d++ = '\r';
+   break;
+   case 't':
+   *d++ = '\t';
+   break;
+   }
p += 2;
continue;
} else if (*p == '\\' && p[1] == '\\') {
@@ -428,13 +439,29 @@ compile_ccl(char **sp, char *t)
*t++ = *s++;
if (*s == ']')
*t++ = *s++;
-   for (; *s && (*t = *s) != ']'; s++, t++)
+   for (; *s && (*t = *s) != ']'; s++, t++) {
if (*s == '[' && ((d = *(s+1)) == '.' || d == ':' || d == '=')) {
*++t = *++s, t++, s++;
for (c = *s; (*t = *s) != ']' || c != d; s++, t++)
if ((c = *s) == '\0')
return NULL;
+   } else if (*s == '\\') {
+   switch (s[1]) {
+   case 'n':
+   *t = '\n';
+   s++;
+   break;
+   case 'r':
+   *t = '\r';
+   s++;
+   break;
+   case 't':
+   *t = '\t';
+   s++;
+   break;
+   }
}
+   }
return (*s == ']') ? *sp = ++s, ++t : NULL;
 }
 
@@ -521,8 +548,23 @@ compile_subst(char *p, struct s_subst *s)
linenum, fname, *p);
if (s->maxbref < ref)
s->maxbref = ref;
-   } else if (*p == '&' || *p == '\\')
-   *sp++ = '\\';
+   } else {
+   switch (*p) {
+   case '&':
+   case '\\':
+   *sp++ = '\\';
+   break;
+   case 'n':
+   *p = '\n';
+   break;
+   case 'r':
+   *p = '\r';
+