Martin Gregorie wrote:
> On Fri, 2010-03-12 at 16:27 +0200, Henrik K wrote:
>
>> If you have enough words to require multiple REs, then sorting doesn't hurt.
>> So the start boundaries for a single RE to catch on are minimized.
>>
> OK, so there are benefits if every alternate in a regex starts with the
> same letter?
>
> Almost everything I know about the innards of regexes comes from
> implementing them when I translated the code in Kernighan & Plauger's
> "Software Tools in Pascal" into PL/9 (FYI PL/9 is a derivative of PL/M
> for the 6809, so I did this a long time ago). I remember that was a
> quite workable regex engine, but it had no optimisations and wasn't
> startlingly fast.
>
> I now think I need to know more about how modern regex engines work and
> in particular about the optimisations used by PCRE. Can anybody
> recommend documentation on this topic?

Grab the O'Reilly book "Mastering Regular Expressions" by Jeffrey E.F. Friedl.

-- 
Bowie
