Noel Jones wrote:

> The pattern length limit is controlled by the pcre library 
> you're using.  I think most implementations limit single 
> expressions to 64k characters.

Obviously something that needs testing.

> It's unclear to me if a single huge complex expression will 
> evaluate faster than multiple less complex expressions.

I'm not exactly sure how the pcre regex engine works in Postfix.
My assumption below is that each pattern is matched individually,
which is why I am suggesting that patterns can be combined for
speed improvements.

If several complex expressions share the same prefix, then
combining the prefix test into a single expression should reject
non-matching strings faster than testing the same prefix once per
expression.

Consider the input string '123-234-32-12.whatever' and now compare
matching against three rules:

     /^([0-9]{1,3}\.){4}foo$/
     /^([0-9]{1,3}\.){4}bar$/
     /^([0-9]{1,3}\.){4}baz$/

In this case, there will be three attempts (one per pattern), each
of which fails on the fourth character ('-') of the input string. That
means that rejecting all three patterns takes roughly 12 character
comparisons (ignoring backtracking).

Now compare that against:

     /^([0-9]{1,3}\.){4}(foo|bar|baz)$/

which will again fail on the fourth character, but now there is only
one pattern to try, and it matches the same strings as the three
patterns above. That is roughly 4 character comparisons instead of 12.
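
As a quick sanity check of the claim that the combined rule accepts
the same strings as the three separate rules, here is a minimal sketch.
It uses Python's re module rather than the pcre library Postfix links
against, which I am assuming behaves the same for patterns this simple.
The sample strings and the helper name matches_any are mine, made up
for illustration.

```python
import re

# The three separate rules from the example above.
separate = [
    re.compile(r"^([0-9]{1,3}\.){4}foo$"),
    re.compile(r"^([0-9]{1,3}\.){4}bar$"),
    re.compile(r"^([0-9]{1,3}\.){4}baz$"),
]

# The single combined rule with the shared prefix factored out.
combined = re.compile(r"^([0-9]{1,3}\.){4}(foo|bar|baz)$")


def matches_any(patterns, text):
    """Return True if any pattern in the list matches text."""
    return any(p.search(text) for p in patterns)


# Hypothetical test inputs: some should match, some should not.
samples = [
    "123-234-32-12.whatever",
    "1.2.3.4.foo",
    "10.20.30.40.bar",
    "255.255.255.255.baz",
    "1.2.3.4.qux",
    "1.2.3.foo",
]

# Both rule sets must agree on every sample.
for s in samples:
    assert matches_any(separate, s) == bool(combined.search(s))
```

This only checks that the two rule sets agree. It says nothing about
speed, which would still need measuring against the real pcre tables.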

> (your sample expression looks a little wonky to me.  You sure 
> it works?)

No, this was a poorly checked paper example.

> Improving performance would be better accomplished by 
> enclosing the similar lines in an IF..ENDIF statement. 
> Performance should be improved for non-matching input, 
> readability and maintainability is dramatically improved.

Personally I find reading regexes a pita even though I've been 
doing it for about 2 decades.

My idea was to autogenerate the complex regexes using
something like this:

    178.183.237.0.dsl.dynamic.eranet.pl
    183.246.69.111.dynamic.snap.net.nz
    188.146.109.136.nat.umts.dynamic.eranet.pl

as input.
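
A rough sketch of what such a generator might look like, in Python.
Everything here is an assumption for illustration: the function name
build_rule, the choice to strip four leading numeric labels and keep
the rest as a literal suffix, and the output format. It emits only the
pattern; a real pcre table line would also need an action after it.

```python
import re

# Four leading numeric labels, followed by the rest of the hostname.
OCTETS = re.compile(r"^(?:[0-9]{1,3}\.){4}(.+)$")


def build_rule(hostnames):
    """Build one combined pattern from hostnames sharing the numeric prefix.

    Hostnames that do not start with four numeric labels are skipped.
    Returns None if no hostname qualifies.
    """
    suffixes = set()
    for name in hostnames:
        m = OCTETS.match(name.strip())
        if m:
            suffixes.add(m.group(1))
    if not suffixes:
        return None
    # Escape each suffix so that '.' is matched literally.
    alternation = "|".join(re.escape(s) for s in sorted(suffixes))
    return r"/^([0-9]{1,3}\.){4}(" + alternation + r")$/"


if __name__ == "__main__":
    print(build_rule([
        "178.183.237.0.dsl.dynamic.eranet.pl",
        "183.246.69.111.dynamic.snap.net.nz",
        "188.146.109.136.nat.umts.dynamic.eranet.pl",
    ]))
```

A smarter generator could also factor out shared suffix parts such as
'dynamic.eranet.pl', but this is enough to show the idea.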

> Skipping rules always beats evaluating rules.

Agreed.

> Unreadable rules should be avoided.

Unless those rules were never intended to be read or modified
by hand.

Erik
-- 
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
