so here are two FAILs and one accidental PASS (accidental because the
test doesn't actually check the return code)...

grep: bad REGEX '': empty (sub)expression
FAIL: grep -e blah -e ''
echo -ne "one one one\n" > input
echo -ne '' | grep -e blah -e '' input
--- expected 2019-07-24 14:21:52.872813591 -0230
+++ actual 2019-07-24 14:21:52.872813591 -0230
@@ -1 +0,0 @@
-one one one
grep: bad REGEX '': empty (sub)expression
PASS: grep -w ''
grep: bad REGEX '': empty (sub)expression
FAIL: grep -w '' 2
echo -ne "one  two\n" > input
echo -ne '' | grep -w '' input
--- expected 2019-07-24 14:21:52.982813591 -0230
+++ actual 2019-07-24 14:21:52.982813591 -0230
@@ -1 +0,0 @@
-one  two

POSIX says there's no such thing as an empty regular expression: its
grammar excludes the possibility. see
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

BSD agrees, and both Android's and macOS's regcomp() reject the empty
regular expression.

GNU apparently disagrees (as i learned from your tests).

not sure what to do here, in particular because -- given your tests --
i don't think we can represent the GNU interpretation as a POSIX
regular expression?

...except i think there's a bug in the BSD implementation that does
allow '()'. seems to have been there for at least 26 years judging by
https://github.com/freebsd/freebsd/blame/master/lib/libc/regex/regcomp.c#L383
so i think it's probably safe to rely on that for the time being.
glibc's happy with it too.
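
for anyone who wants to poke at this outside toybox, here's a minimal
standalone sketch of the same trick. the helper names
(compile_allow_empty, line_matches) are made up for illustration and are
not toybox functions; the only library calls are the standard
regcomp()/regexec()/regfree(). it assumes a regcomp() that accepts the
ERE "()" (as described above for BSD and glibc), and it only looks at
plain matching, not -w.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

// Hypothetical helper (not toybox's xregcomp): compile `pattern`, swapping
// an empty pattern for the ERE "()" so a BSD-derived regcomp() accepts it.
// Returns regcomp()'s result: 0 on success, a nonzero REG_* code on failure.
static int compile_allow_empty(regex_t *re, const char *pattern, int cflags)
{
  assert(re && pattern);
  if (!*pattern) {
    // "()" is only an empty group in ERE; in BRE the parentheses are
    // literal characters, so REG_EXTENDED has to be forced on here.
    pattern = "()";
    cflags |= REG_EXTENDED;
  }
  return regcomp(re, pattern, cflags);
}

// Returns 1 if `line` matches `pattern`, 0 if it doesn't, and -1 if the
// pattern failed to compile.
static int line_matches(const char *pattern, const char *line)
{
  regex_t re;
  int rc;

  if (compile_allow_empty(&re, pattern, REG_NOSUB)) return -1;
  rc = regexec(&re, line, 0, NULL, 0);
  regfree(&re);
  return rc == 0;
}
```

with that, line_matches("", "one one one") should report a match on any
libc that accepts "()", which is the GNU behavior the first failing test
expects.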

patch attached. (i've said "BSD" rather than "POSIX" in the code
comment because BSD makes it clearer that this is a practical rather
than just theoretical concern.)
From 817af0ca56ca569551799d74dff14317bc66e1d5 Mon Sep 17 00:00:00 2001
From: Elliott Hughes <[email protected]>
Date: Wed, 24 Jul 2019 16:12:34 -0700
Subject: [PATCH] grep: fake GNU behavior for non-POSIX empty regex.

POSIX says there's no such thing as an empty regular expression. The
grammar excludes the possibility:

  https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

BSD agrees with POSIX, and Android and macOS' BSD-based implementations
reject the empty regular expression.

GNU apparently disagrees.

Luckily, BSD does accept the empty *sub* expression `()`, despite their
error message for REG_EMPTY being "empty (sub)expression". This is
presumably a bug, except there's explicit code to support it that is at
least 26 years old:

  https://github.com/freebsd/freebsd/blame/master/lib/libc/regex/regcomp.c#L383

This workaround also works fine with glibc.

If we want GNU behavior, I'm struggling to come up with another way to
fake it. If we want POSIX behavior, we could easily just add a check to
reject "" on glibc.

Also switch to xregcomp().
---
 toys/posix/grep.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/toys/posix/grep.c b/toys/posix/grep.c
index 6fd6bd2f..f2ab2a0d 100644
--- a/toys/posix/grep.c
+++ b/toys/posix/grep.c
@@ -398,20 +398,15 @@ static void parse_regex(void)
   TT.e = list;
 
   if (!FLAG(F)) {
-    int i;
-
     // Convert regex list
     for (al = TT.e; al; al = al->next) {
       struct reg *shoe;
 
       if (FLAG(o) && !*al->arg) continue;
       dlist_add_nomalloc(&TT.reg, (void *)(shoe = xmalloc(sizeof(struct reg))));
-      i = regcomp(&shoe->r, al->arg,
-                  (REG_EXTENDED*!!FLAG(E)) | (REG_ICASE*!!FLAG(i)));
-      if (i) {
-        regerror(i, &shoe->r, toybuf, sizeof(toybuf));
-        error_exit("bad REGEX '%s': %s", al->arg, toybuf);
-      }
+      // BSD regcomp doesn't support empty regex, so we fake that.
+      xregcomp(&shoe->r, *al->arg ? al->arg : "()",
+               (REG_EXTENDED*(!!FLAG(E)|!*al->arg))|(REG_ICASE*!!FLAG(i)));
     }
     dlist_terminate(TT.reg);
   }
-- 
2.22.0.657.g960e92d24f-goog
