add support for \< and \> word delimiters in regcomp

Jonathan Gray Sun, 31 Aug 2014 21:05:21 -0700

This adds support for the SVR4/glibc word delimiters \< and \>
in regcomp as an extension to what POSIX requires.

We already have [[:<:]] and [[:>:]] as extensions, apparently
from 'Henry Spencer's Alpha 3.0 regex release' back in 1993.

But Solaris, Linux and FreeBSD now all support the \< and \>
syntax, and sadly many grep and sed invocations in supposedly
portable projects rely on it.

This diff is from Garrett D'Amore in Illumos via FreeBSD.
https://www.illumos.org/issues/516
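
For illustration only, here is a minimal sketch of how a caller might
exercise the new delimiters through the standard regcomp()/regexec()
interface. The helper name re_matches() and the sample strings are
made up for this example and are not part of the diff. Passing 0 as
cflags compiles a basic RE and REG_EXTENDED an extended RE; the diff
touches both parsers.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the diff.
 * Returns 1 if pattern matches text, 0 if it does not,
 * and -1 if regcomp() rejects the pattern.
 * cflags is handed to regcomp(): 0 for a basic RE,
 * REG_EXTENDED for an extended RE.
 */
static int
re_matches(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int rc;

	/* REG_NOSUB: only a yes/no answer is needed, no offsets. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

On a libc with the extension, re_matches("\\<cat\\>", "a cat sat",
REG_EXTENDED) should return 1 and the same pattern against
"concatenate" should return 0. Without it, the ERE parser treats
\< as an ordinary '<', so neither string would match.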

Index: re_format.7
===================================================================
RCS file: /cvs/src/lib/libc/regex/re_format.7,v
retrieving revision 1.16
diff -u -p -r1.16 re_format.7
--- re_format.7 5 Jun 2013 22:05:29 -0000       1.16
+++ re_format.7 1 Sep 2014 03:51:27 -0000
@@ -304,6 +304,12 @@ This is an extension,
 compatible with but not specified by POSIX,
 and should be used with
 caution in software intended to be portable to other systems.
+The additional word delimiters  
+.Ql \e<
+and
+.Ql \e> 
+are provided to ease compatibility with traditional SVR4
+systems but are not portable and should be avoided.
 .Pp
 In the event that an RE could match more than one substring of a given
 string,
Index: regcomp.c
===================================================================
RCS file: /cvs/src/lib/libc/regex/regcomp.c,v
retrieving revision 1.24
diff -u -p -r1.24 regcomp.c
--- regcomp.c   6 May 2014 15:48:38 -0000       1.24
+++ regcomp.c   1 Sep 2014 03:25:44 -0000
@@ -349,7 +349,17 @@ p_ere_exp(struct parse *p)
        case '\\':
                REQUIRE(MORE(), REG_EESCAPE);
                c = GETNEXT();
-               ordinary(p, c);
+               switch (c) {
+               case '<':
+                       EMIT(OBOW, 0);
+                       break;
+               case '>':
+                       EMIT(OEOW, 0);
+                       break;
+               default:
+                       ordinary(p, c);
+                       break;
+               }
                break;
        case '{':               /* okay as ordinary except if digit follows */
                REQUIRE(!MORE() || !isdigit((uch)PEEK()), REG_BADRPT);
@@ -500,6 +510,12 @@ p_simp_re(struct parse *p,
                break;
        case '[':
                p_bracket(p);
+               break;
+       case BACKSL|'<':
+               EMIT(OBOW, 0);
+               break;
+       case BACKSL|'>':
+               EMIT(OEOW, 0);
                break;
        case BACKSL|'{':
                SETERROR(REG_BADRPT);
