Re: faster fast grep

2013-05-01 Thread Ted Unangst
On Wed, May 01, 2013 at 04:28, J?r?mie Courr?ges-Anglas wrote:
 Ted Unangst t...@tedunangst.com writes:
 
 For simple patterns, grep has an optimization to avoid regex and run
 about 50% faster. The problem is its idea of simple patterns is too
 simple.
 
 IIUC the idea is to optimize for a lazy user that didn't want to
 type grep -F or fgrep:

More or less. Note that it does support . wildcards unlike -F. I also
have scripts that run grep where it would be annoying to get -F to
grep when necessary.

 The following diff also tests for ']', ')' and '}' so that unbalanced
 use of those can be catched by regcomp if the latter happens to become
 stricter.
 
 Regress passes; does it seem OK?

I think that's a good improvement.



Re: faster fast grep

2013-04-30 Thread Jérémie Courrèges-Anglas
Ted Unangst t...@tedunangst.com writes:

 For simple patterns, grep has an optimization to avoid regex and run
 about 50% faster. The problem is its idea of simple patterns is too
 simple.

IIUC the idea is to optimize for a lazy user that didn't want to
type grep -F or fgrep:

$ grep foo /bar
$ grep -R foo /baz/

Thus I think it would make sense to test whether ERE are on, since '|',
'+', etc aren't special under BRE.

The following diff also tests for ']', ')' and '}' so that unbalanced
use of those can be catched by regcomp if the latter happens to become
stricter.

Regress passes; does it seem OK?
-- 
Jérémie Courrèges-Anglas
PGP Key fingerprint: 61DB D9A0 00A4 67CF 2A90  8961 6191 8FBF 06A1 1494

Index: util.c
===
RCS file: /cvs/src/usr.bin/grep/util.c,v
retrieving revision 1.45
diff -u -p -r1.45 util.c
--- util.c  29 Dec 2012 01:32:44 -  1.45
+++ util.c  1 May 2013 02:17:41 -
@@ -348,15 +348,8 @@ fastcomp(fastgrep_t *fg, const char *pat
/* Look for ways to cheat...er...avoid the full regex engine. */
for (i = 0; i  fg-patternLen; i++)
{
-   /* Can still cheat? */
-   if ((isalnum(fg-pattern[i])) || isspace(fg-pattern[i]) ||
-   (fg-pattern[i] == '_') || (fg-pattern[i] == ',') ||
-   (fg-pattern[i] == '=') || (fg-pattern[i] == '-') ||
-   (fg-pattern[i] == ':') || (fg-pattern[i] == '/')) {
-   /* As long as it is good, upper case it for later. */
-   if (iflag)
-   fg-pattern[i] = toupper(fg-pattern[i]);
-   } else if (fg-pattern[i] == '.') {
+   switch (fg-pattern[i]) {
+   case '.':
hasDot = i;
if (i  fg-patternLen / 2) {
if (firstHalfDot  0)
@@ -368,11 +361,28 @@ fastcomp(fastgrep_t *fg, const char *pat
if (firstLastHalfDot  0)
firstLastHalfDot = i;
}
-   } else {
+   break;
+   case '(': case ')':
+   case '{': case '}':
+   /* Special in BRE if preceded by '\\' */
+   case '?':
+   case '+':
+   case '|':
+   /* Not special in BRE. */
+   if (!Eflag)
+   goto nonspecial;
+   case '\\':
+   case '*':
+   case '[': case ']':
/* Free memory and let others know this is empty. */
free(fg-pattern);
fg-pattern = NULL;
return (-1);
+   default:
+nonspecial:
+   if (iflag)
+   fg-pattern[i] = toupper(fg-pattern[i]);
+   break;
}
}