Re: add support for \< and \> word delimiters in regcomp

2014-09-07 Thread Jonathan Gray
On Mon, Sep 01, 2014 at 12:41:37AM -0400, Ted Unangst wrote:
> On Mon, Sep 01, 2014 at 14:03, Jonathan Gray wrote:
> > This adds support for using the SVR4/glibc word delimiters
> > in regcomp as an extension to what posix requires.
> >
> > We already have [[:<:]] and [[:>:]] as extensions, apparently
> > from 'Henry Spencer's Alpha 3.0 regex release' back in 1993.
> >
> > But now Solaris/Linux/FreeBSD all have the other syntax
> > and sadly lots of uses of grep and sed in what are supposed
> > to be portable projects use it.
> >
> > This diff is from Garrett D'Amore in Illumos via FreeBSD.
> > https://www.illumos.org/issues/516
>
> I have a slight preference for my diff (I think it's clearer than
> deeper nested switches), but no matter.
>
> http://marc.info/?l=openbsd-tech&m=131094975127745&w=2

I'd be fine with that one going in as well.

Are there any reasons not to add it?  I don't see a portable alternative
here as brought up by Mark in that thread, and the "only if it's
supported on the majority of UNIX-like operating systems" comment
seems to be true?



Re: add support for \< and \> word delimiters in regcomp

2014-09-07 Thread Todd C. Miller
On Mon, 08 Sep 2014 02:28:42 +1000, Jonathan Gray wrote:

> I'd be fine with that one going in as well.
>
> Are there any reasons not to add it?  I don't see a portable alternative
> here as brought up by Mark in that thread, and the "only if it's
> supported on the majority of UNIX-like operating systems" comment
> seems to be true?

My preference is for Ted's patch.  I was against this initially but
now that it is supported by most modern systems I don't have a
problem with it.

 - todd



add support for \< and \> word delimiters in regcomp

2014-08-31 Thread Jonathan Gray
This adds support for using the SVR4/glibc word delimiters
in regcomp as an extension to what posix requires.

We already have [[:<:]] and [[:>:]] as extensions, apparently
from 'Henry Spencer's Alpha 3.0 regex release' back in 1993.

But now Solaris/Linux/FreeBSD all have the other syntax
and sadly lots of uses of grep and sed in what are supposed
to be portable projects use it.

This diff is from Garrett D'Amore in Illumos via FreeBSD.
https://www.illumos.org/issues/516

Index: re_format.7
===================================================================
RCS file: /cvs/src/lib/libc/regex/re_format.7,v
retrieving revision 1.16
diff -u -p -r1.16 re_format.7
--- re_format.7	5 Jun 2013 22:05:29 -0000	1.16
+++ re_format.7	1 Sep 2014 03:51:27 -0000
@@ -304,6 +304,12 @@ This is an extension,
 compatible with but not specified by POSIX,
 and should be used with
 caution in software intended to be portable to other systems.
+The additional word delimiters
+.Ql \e<
+and
+.Ql \e>
+are provided to ease compatibility with traditional SVR4
+systems but are not portable and should be avoided.
 .Pp
 In the event that an RE could match more than one substring of a given
 string,
Index: regcomp.c
===================================================================
RCS file: /cvs/src/lib/libc/regex/regcomp.c,v
retrieving revision 1.24
diff -u -p -r1.24 regcomp.c
--- regcomp.c	6 May 2014 15:48:38 -0000	1.24
+++ regcomp.c	1 Sep 2014 03:25:44 -0000
@@ -349,7 +349,17 @@ p_ere_exp(struct parse *p)
 	case '\\':
 		REQUIRE(MORE(), REG_EESCAPE);
 		c = GETNEXT();
-		ordinary(p, c);
+		switch (c) {
+		case '<':
+			EMIT(OBOW, 0);
+			break;
+		case '>':
+			EMIT(OEOW, 0);
+			break;
+		default:
+			ordinary(p, c);
+			break;
+		}
 		break;
 	case '{':		/* okay as ordinary except if digit follows */
 		REQUIRE(!MORE() || !isdigit((uch)PEEK()), REG_BADRPT);
@@ -500,6 +510,12 @@ p_simp_re(struct parse *p,
 		break;
 	case '[':
 		p_bracket(p);
+		break;
+	case BACKSL|'<':
+		EMIT(OBOW, 0);
+		break;
+	case BACKSL|'>':
+		EMIT(OEOW, 0);
 		break;
 	case BACKSL|'{':
 		SETERROR(REG_BADRPT);
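
For anyone wanting to check the behaviour from userland, here is a minimal
sketch of how a caller might exercise the two delimiters through the
standard regcomp()/regexec() interface. The helper name re_matches is
hypothetical and not part of either diff, and the sketch assumes a libc
that accepts \< and \> in both basic and extended REs (glibc does; OpenBSD
would with one of the patches applied).

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: returns 1 if `pattern`
 * matches somewhere in `text`, 0 if regexec() reports no match (or
 * fails), and -1 if regcomp() rejects the pattern.
 * `cflags` is passed through to regcomp(), e.g. 0 for a basic RE or
 * REG_EXTENDED for an extended RE.
 */
static int
re_matches(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int rc;

	/* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

On a libc with the extension, re_matches("\\<cat\\>", "the cat sat", 0)
should report a match while the same pattern against "concatenate" should
not. Without the extension, the removed ordinary(p, c) line in the ERE hunk
suggests the escape is simply taken as a literal '<' or '>', so the pattern
would look for the string "<cat>" instead.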



Re: add support for \< and \> word delimiters in regcomp

2014-08-31 Thread Ted Unangst
On Mon, Sep 01, 2014 at 14:03, Jonathan Gray wrote:
> This adds support for using the SVR4/glibc word delimiters
> in regcomp as an extension to what posix requires.
>
> We already have [[:<:]] and [[:>:]] as extensions, apparently
> from 'Henry Spencer's Alpha 3.0 regex release' back in 1993.
>
> But now Solaris/Linux/FreeBSD all have the other syntax
> and sadly lots of uses of grep and sed in what are supposed
> to be portable projects use it.
>
> This diff is from Garrett D'Amore in Illumos via FreeBSD.
> https://www.illumos.org/issues/516

I have a slight preference for my diff (I think it's clearer than
deeper nested switches), but no matter.

http://marc.info/?l=openbsd-tech&m=131094975127745&w=2