so here's two FAILs and one accidental PASS (because the test doesn't actually check the return code)...
grep: bad REGEX '': empty (sub)expression
FAIL: grep -e blah -e ''
echo -ne "one one one\n" > input
echo -ne '' | grep -e blah -e '' input
--- expected 2019-07-24 14:21:52.872813591 -0230
+++ actual 2019-07-24 14:21:52.872813591 -0230
@@ -1 +0,0 @@
-one one one
grep: bad REGEX '': empty (sub)expression
PASS: grep -w ''
grep: bad REGEX '': empty (sub)expression
FAIL: grep -w '' 2
echo -ne "one two\n" > input
echo -ne '' | grep -w '' input
--- expected 2019-07-24 14:21:52.982813591 -0230
+++ actual 2019-07-24 14:21:52.982813591 -0230
@@ -1 +0,0 @@
-one two

POSIX says there's no such thing as an empty regular expression (by having a grammar that excludes the possibility: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html). BSD agrees, and Android's and macOS's regcomp() both reject the empty regular expression. GNU apparently disagrees (as i learned from your tests).

not sure what to do here, in particular because -- given your tests -- i don't think we can represent the GNU interpretation as a POSIX regular expression?

...except i think there's a bug in the BSD implementation that does allow '()'. it seems to have been there for at least 26 years, judging by https://github.com/freebsd/freebsd/blame/master/lib/libc/regex/regcomp.c#L383, so i think it's probably safe to rely on that for the time being. glibc's happy with it too. patch attached.

(i've said "BSD" rather than "POSIX" in the code comment because BSD makes it clearer that this is a practical rather than just theoretical concern.)
From 817af0ca56ca569551799d74dff14317bc66e1d5 Mon Sep 17 00:00:00 2001
From: Elliott Hughes <[email protected]>
Date: Wed, 24 Jul 2019 16:12:34 -0700
Subject: [PATCH] grep: fake GNU behavior for non-POSIX empty regex.

POSIX says there's no such thing as an empty regular expression. The
grammar excludes the possibility:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

BSD agrees with POSIX, and Android and macOS' BSD-based implementations
reject the empty regular expression. GNU apparently disagrees.

Luckily, BSD does accept the empty *sub* expression `()`, despite their
error message for REG_EMPTY being "empty (sub)expression". This is
presumably a bug, except there's explicit code to support it that is at
least 26 years old:
https://github.com/freebsd/freebsd/blame/master/lib/libc/regex/regcomp.c#L383

This workaround also works fine with glibc.

If we want GNU behavior, I'm struggling to come up with another way to
fake it. If we want POSIX behavior, we could easily just add a check to
reject "" on glibc.

Also switch to xregcomp().
---
 toys/posix/grep.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/toys/posix/grep.c b/toys/posix/grep.c
index 6fd6bd2f..f2ab2a0d 100644
--- a/toys/posix/grep.c
+++ b/toys/posix/grep.c
@@ -398,20 +398,15 @@ static void parse_regex(void)
   TT.e = list;
 
   if (!FLAG(F)) {
-    int i;
-
     // Convert regex list
     for (al = TT.e; al; al = al->next) {
       struct reg *shoe;
 
       if (FLAG(o) && !*al->arg) continue;
       dlist_add_nomalloc(&TT.reg, (void *)(shoe = xmalloc(sizeof(struct reg))));
-      i = regcomp(&shoe->r, al->arg,
-        (REG_EXTENDED*!!FLAG(E)) | (REG_ICASE*!!FLAG(i)));
-      if (i) {
-        regerror(i, &shoe->r, toybuf, sizeof(toybuf));
-        error_exit("bad REGEX '%s': %s", al->arg, toybuf);
-      }
+      // BSD regcomp doesn't support empty regex, so we fake that.
+      xregcomp(&shoe->r, *al->arg ? al->arg : "()",
+        (REG_EXTENDED*(!!FLAG(E)|!*al->arg))|(REG_ICASE*!!FLAG(i)));
     }
     dlist_terminate(TT.reg);
   }
-- 
2.22.0.657.g960e92d24f-goog
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
