Re: faster fast grep
On Wed, May 01, 2013 at 04:28, J?r?mie Courr?ges-Anglas wrote: Ted Unangst t...@tedunangst.com writes: For simple patterns, grep has an optimization to avoid regex and run about 50% faster. The problem is its idea of simple patterns is too simple. IIUC the idea is to optimize for a lazy user that didn't want to type grep -F or fgrep: More or less. Note that it does support . wildcards unlike -F. I also have scripts that run grep where it would be annoying to get -F to grep when necessary. The following diff also tests for ']', ')' and '}' so that unbalanced use of those can be catched by regcomp if the latter happens to become stricter. Regress passes; does it seem OK? I think that's a good improvement.
faster fast grep
For simple patterns, grep has an optimization to avoid regex and run about 50% faster. The problem is its idea of simple patterns is too simple. This diff switches the logic around from a whitelist to a blacklist. We only need to abort the fast path if we see a magic regex character. Index: util.c === RCS file: /cvs/src/usr.bin/grep/util.c,v retrieving revision 1.45 diff -u -p -r1.45 util.c --- util.c 29 Dec 2012 01:32:44 - 1.45 +++ util.c 1 May 2013 00:00:30 - @@ -348,15 +348,8 @@ fastcomp(fastgrep_t *fg, const char *pat /* Look for ways to cheat...er...avoid the full regex engine. */ for (i = 0; i fg-patternLen; i++) { - /* Can still cheat? */ - if ((isalnum(fg-pattern[i])) || isspace(fg-pattern[i]) || - (fg-pattern[i] == '_') || (fg-pattern[i] == ',') || - (fg-pattern[i] == '=') || (fg-pattern[i] == '-') || - (fg-pattern[i] == ':') || (fg-pattern[i] == '/')) { - /* As long as it is good, upper case it for later. */ - if (iflag) - fg-pattern[i] = toupper(fg-pattern[i]); - } else if (fg-pattern[i] == '.') { + switch (fg-pattern[i]) { + case '.': hasDot = i; if (i fg-patternLen / 2) { if (firstHalfDot 0) @@ -368,11 +361,23 @@ fastcomp(fastgrep_t *fg, const char *pat if (firstLastHalfDot 0) firstLastHalfDot = i; } - } else { + break; + case '\\': + case '[': + case '(': + case '{': + case '?': + case '*': + case '+': + case '|': /* Free memory and let others know this is empty. */ free(fg-pattern); fg-pattern = NULL; return (-1); + default: + if (iflag) + fg-pattern[i] = toupper(fg-pattern[i]); + break; } }
Re: faster fast grep
Ted Unangst t...@tedunangst.com writes: For simple patterns, grep has an optimization to avoid regex and run about 50% faster. The problem is its idea of simple patterns is too simple. IIUC the idea is to optimize for a lazy user that didn't want to type grep -F or fgrep: $ grep foo /bar $ grep -R foo /baz/ Thus I think it would make sense to test whether ERE are on, since '|', '+', etc aren't special under BRE. The following diff also tests for ']', ')' and '}' so that unbalanced use of those can be catched by regcomp if the latter happens to become stricter. Regress passes; does it seem OK? -- Jérémie Courrèges-Anglas PGP Key fingerprint: 61DB D9A0 00A4 67CF 2A90 8961 6191 8FBF 06A1 1494 Index: util.c === RCS file: /cvs/src/usr.bin/grep/util.c,v retrieving revision 1.45 diff -u -p -r1.45 util.c --- util.c 29 Dec 2012 01:32:44 - 1.45 +++ util.c 1 May 2013 02:17:41 - @@ -348,15 +348,8 @@ fastcomp(fastgrep_t *fg, const char *pat /* Look for ways to cheat...er...avoid the full regex engine. */ for (i = 0; i fg-patternLen; i++) { - /* Can still cheat? */ - if ((isalnum(fg-pattern[i])) || isspace(fg-pattern[i]) || - (fg-pattern[i] == '_') || (fg-pattern[i] == ',') || - (fg-pattern[i] == '=') || (fg-pattern[i] == '-') || - (fg-pattern[i] == ':') || (fg-pattern[i] == '/')) { - /* As long as it is good, upper case it for later. */ - if (iflag) - fg-pattern[i] = toupper(fg-pattern[i]); - } else if (fg-pattern[i] == '.') { + switch (fg-pattern[i]) { + case '.': hasDot = i; if (i fg-patternLen / 2) { if (firstHalfDot 0) @@ -368,11 +361,28 @@ fastcomp(fastgrep_t *fg, const char *pat if (firstLastHalfDot 0) firstLastHalfDot = i; } - } else { + break; + case '(': case ')': + case '{': case '}': + /* Special in BRE if preceded by '\\' */ + case '?': + case '+': + case '|': + /* Not special in BRE. */ + if (!Eflag) + goto nonspecial; + case '\\': + case '*': + case '[': case ']': /* Free memory and let others know this is empty. */ free(fg-pattern); fg-pattern = NULL; return (-1); + default: +nonspecial: + if (iflag) + fg-pattern[i] = toupper(fg-pattern[i]); + break; } }