Martin Gregorie wrote:
> On Fri, 2010-03-12 at 16:27 +0200, Henrik K wrote:
>
>> If you have enough words to require multiple REs, then sorting them first
>> doesn't hurt: each RE then has fewer distinct starting points to match on.
>>
> OK, so there are benefits if every alternate in a regex starts with the
> same letter?
>
> Almost everything I know about the innards of regexes comes from
> implementing them when I translated the code in Kernighan & Plauger's
> "Software Tools in Pascal" into PL/9 (FYI PL/9 is a derivative of PL/M
> for the 6809, so I did this a long time ago). I remember that was a
> quite workable regex engine, but it had no optimisations and wasn't
> startlingly fast.  
>
> I now think I need to know more about how modern regex engines work and
> in particular about the optimisations used by PCRE. Can anybody
> recommend documentation on this topic?

Grab the O'Reilly book "Mastering Regular Expressions" by Jeffrey E.F.
Friedl.
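
For what it's worth, here is a rough sketch of the "sort the words, then
build one RE per starting letter" idea from earlier in the thread. It uses
Python's re module rather than PCRE, the function names and word list are
made up for illustration, and it makes no claim about whether any given
engine actually runs faster this way.

```python
import re
from collections import defaultdict


def build_grouped_patterns(words):
    """Sort the words, group them by first letter, compile one RE per group.

    Hypothetical helper: every alternative in a given RE shares the same
    leading character, which is factored out in front of the group.
    """
    groups = defaultdict(list)
    for word in sorted(set(w for w in words if w)):
        groups[word[0]].append(word)

    patterns = {}
    for letter, group in groups.items():
        # Alternatives hold only the remainder of each word.
        tails = "|".join(re.escape(w[1:]) for w in group)
        patterns[letter] = re.compile(
            r"\b" + re.escape(letter) + "(?:" + tails + r")\b"
        )
    return patterns


def find_words(patterns, text):
    """Run every per-letter RE over the text and return the sorted hits."""
    hits = []
    for pattern in patterns.values():
        hits.extend(pattern.findall(text))
    return sorted(hits)


if __name__ == "__main__":
    pats = build_grouped_patterns(["viagra", "valium", "cialis", "casino"])
    for letter in sorted(pats):
        print(letter, pats[letter].pattern)
    print(find_words(pats, "cheap viagra and casino chips"))
```

Whether this helps in practice depends on the engine, so time it against a
single flat alternation before relying on it.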

-- 
Bowie
