Re: regular expressions was: Kernel Oops

Stan Hoeppner Tue, 08 Mar 2011 08:11:41 -0800

mouss put forth on 3/7/2011 5:45 PM:
> Le 07/03/2011 15:13, Stan Hoeppner a écrit :


>> Ok, so if I'm doing what I've heard called a "fully qualified regular
>> expression", WRT FQrDNS matching, should I use the anchors or not?
>> postmap -q says these all work (the actuals with action and text that is).

>> /^(\d{1,3}-){3}\d{1,3}\.dynamic\.chello\.sk$/
> .dynamic.chello.sk    REJECT blah blah
> 
>> /^(\d{1,3}\.){4}dsl\.dyn\.forthnet\.gr$/
> .dyn.forthnet.gr      REJECT blah blah
> 
>> /^(\d{1,3}-){4}adsl-dyn\.4u\.com\.gh$/
> /dyn\.4u\.com\.gh$/    REJECT blah
> assuming you get real mail from there. otherwise
> .4u.com.gh    REJECT blah

Yes, these can all be done with a hash/cdb.  But these are being added
to my fqrdns.pcre file.  As the name implies, the goal is to exactly
match fully qualified reverse DNS strings, at least, that's part of the
goal.  The other part is the exact opposite:  _not_ matching them.  I'll
explain that a little later.

>> /^[\d\w]{8}\.[\w]{2}-[\d]-[\d\w]{2}\.dynamic\.ziggo\.nl$/
> ahem? I fail to see what you're trying to match here. \d is a \w, so
> [\d\w] is the same as \w. do you mean \W (capital letter)? anyway:

I tried \d alone in those places and postmap -q wouldn't match it.  I
scoured my regex cheat sheet and it said \d is for digits, and \w is for
alphas.  I added \d\w and it worked.  I was trying to match this oddball
FQrDNS:

541ABE2E.cm-5-3c.dynamic.ziggo.nl
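A quick check of mouss's point against that hostname, sketched with Python's `re` module standing in for PCRE. For ASCII input like this, `\d` (digits) and `\w` (letters, digits, underscore) behave the same in both engines. Postfix pcre tables match case-insensitively by default, so the sketch does too. The `simplified` and `digits_only` patterns are hypothetical variants written only for this illustration:

```python
import re

HOST = "541ABE2E.cm-5-3c.dynamic.ziggo.nl"

# ASCII keeps \d and \w to ASCII classes; IGNORECASE mirrors the Postfix
# pcre table default of case-insensitive matching.
FLAGS = re.ASCII | re.IGNORECASE

def matches(pattern: str, host: str = HOST) -> bool:
    """Return True if pattern matches host (unanchored search semantics)."""
    return re.search(pattern, host, FLAGS) is not None

original = r"^[\d\w]{8}\.[\w]{2}-[\d]-[\d\w]{2}\.dynamic\.ziggo\.nl$"
simplified = r"^\w{8}\.\w{2}-\d-\w{2}\.dynamic\.ziggo\.nl$"   # [\d\w] -> \w
digits_only = r"^\d{8}\.\w{2}-\d-\d{2}\.dynamic\.ziggo\.nl$"  # \d alone

print(matches(original))     # True
print(matches(simplified))   # True: [\d\w] is the same set as \w
print(matches(digits_only))  # False: "541ABE2E" and "3c" contain letters

# \w already covers digits, letters and the underscore.
print(all(re.fullmatch(r"\w", c, FLAGS) for c in "0123456789abcXYZ_"))  # True
```

This is consistent with what postmap -q showed: `\d` alone fails on the hex-looking first label, and `\w` on its own is enough.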

> well, that's what regular expressions are about by default:
> /foo/ means contains foo
> /^foo/ means starts with foo
> /foo$/ means ends with foo
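A minimal sketch of those three cases, using Python's `re.search`, which (like a pcre table lookup) lets an unanchored pattern match anywhere in the string:

```python
import re

def contains(pattern: str, text: str) -> bool:
    """True if pattern matches anywhere in text (re.search semantics)."""
    return re.search(pattern, text) is not None

for text in ("foo", "foobar", "barfoo", "barfoobar"):
    # columns: /foo/  /^foo/  /foo$/
    print(text, contains(r"foo", text), contains(r"^foo", text), contains(r"foo$", text))

# foo True True True
# foobar True True False
# barfoo True False True
# barfoobar True False False
```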

Got it.  You (or Noel) already explained this, and it really helps my
understanding.

> so
> /^bart.*homer.*marge$/ means: starts with "bart", ends with "marge" and
> somewhere between these contains "homer".
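The same idea in a short Python sketch (standard `re` module; the sample strings are made up for illustration). Note that `.*` may also match nothing, so "barthomermarge" would match too:

```python
import re

# Starts with "bart", ends with "marge", contains "homer" somewhere between.
PATTERN = re.compile(r"^bart.*homer.*marge$")

def check(text: str) -> bool:
    return PATTERN.search(text) is not None

print(check("bart-homer-marge"))         # True
print(check("bartsimpsonhomermarge"))    # True
print(check("bart-marge"))               # False: no "homer"
print(check("lisa-bart-homer-marge"))    # False: does not start with "bart"
print(check("bart-homer-marge-maggie"))  # False: does not end with "marge"
```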

Also good to understand.


Ok, to explain the "not matching" goal.  The PCRE file contains almost
1700 expressions and is growing.  In a couple of years it could be double
that size.  Over a longer period of time it could hit 5000 expressions.
For users of this file, it is usually the first table checked against a
connecting SMTP client.  That client's rDNS will match one of the 1700
expressions, or none.  Thus, we want the fastest processing of the "does
not match" case, as this is the common case.
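Why the miss is the expensive case: per the pcre_table(5) man page, Postfix tries the patterns in file order and stops at the first match, so a client whose rDNS matches nothing is tested against every pattern in the file. A minimal Python model of that scan; the three patterns are from this thread, while the `TABLE` and `lookup` names and the bare "REJECT" actions are hypothetical placeholders:

```python
import re
from typing import Optional, Tuple

FLAGS = re.IGNORECASE | re.ASCII

# Hypothetical miniature of an fqrdns.pcre-style table: (pattern, action).
TABLE = [
    (re.compile(r"^(\d{1,3}-){3}\d{1,3}\.dynamic\.chello\.sk$", FLAGS), "REJECT"),
    (re.compile(r"^(\d{1,3}\.){4}dsl\.dyn\.forthnet\.gr$", FLAGS), "REJECT"),
    (re.compile(r"^(\d{1,3}-){4}adsl-dyn\.4u\.com\.gh$", FLAGS), "REJECT"),
]

def lookup(client_name: str) -> Tuple[Optional[str], int]:
    """First-match-wins scan; returns (action or None, patterns tried)."""
    tried = 0
    for pattern, action in TABLE:
        tried += 1
        if pattern.search(client_name):
            return action, tried
    return None, tried

print(lookup("1-2-3-4.dynamic.chello.sk"))  # ('REJECT', 1)
print(lookup("mail.example.com"))           # (None, 3): every pattern tried
```

With 1700 patterns, every non-matching client pays for all 1700 attempts, so shaving time off each failed attempt is what matters most.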

A match is "rare" from a mathematical and cycles-consumed standpoint.
Modern processors are extremely fast.  But if our expressions aren't
speed-optimized for the "does not match" case, we're slowing our system
down.  For most systems this is irrelevant.  But consider an extremely
high volume MX gateway receiving, say, 3000 connections/second, of which
2700 are spam bots and snowshoe servers and 300 are legitimate senders
whose mail is relayed to downstream mailbox servers.  A few extra
milliseconds of table processing time per connection add up quickly.
Assuming this host is running the full gamut of anti-spam checks, policy
daemons, content filters, etc., we need to keep each as lean as possible.
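To put numbers on "adds up quickly": at 3000 connections/second, each millisecond of lookup time per connection costs 3 CPU-seconds every wall-clock second, roughly three cores doing nothing but this one table. A small calculator, assuming one lookup per connection; the per-lookup costs are hypothetical, not measurements:

```python
def cpu_seconds_per_second(connections_per_sec: float, ms_per_lookup: float) -> float:
    """CPU time spent on one table lookup per connection, per wall-clock second."""
    return connections_per_sec * ms_per_lookup / 1000.0

# Hypothetical per-lookup costs; real numbers depend on hardware and patterns.
for rate in (3000, 5000):          # normal load and the botnet spike from the text
    for cost_ms in (0.1, 1.0, 3.0):
        busy = cpu_seconds_per_second(rate, cost_ms)
        print(f"{rate} conn/s at {cost_ms} ms/lookup -> {busy:.1f} CPU-s per second")
```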

If this example MX gateway sees spikes of 5000 connections/second due to
a large botnet targeting multiple users, any extra delay this PCRE table
imposes may help bog the system down and cause unwanted delays.

So, the question is, which form of expression processes the "does not
match" case faster?  The fully qualified expression, or the simple
expression?  Noel mentioned that the fully qualified expressions will
tend to process faster.  Is this true?  Is it true for both the
"matches" and "does not match" case?
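One way to check Noel's claim empirically is to time both forms against non-matching names. The sketch below uses Python's `re`, a backtracking engine like PCRE but not PCRE, so it only shows how to measure; it does not settle which form wins inside Postfix. One plausible reason a `^`-anchored pattern can fail fast is that the engine needs to try only the start of the string, while an unanchored one may be retried from later positions; engines optimize this differently, so measure rather than assume. The `SIMPLE` suffix pattern and all hostnames are hypothetical:

```python
import re
import timeit

FLAGS = re.IGNORECASE | re.ASCII

# Fully qualified form from this thread, and a hypothetical suffix-only form.
FULL = re.compile(r"^(\d{1,3}-){3}\d{1,3}\.dynamic\.chello\.sk$", FLAGS)
SIMPLE = re.compile(r"\.dynamic\.chello\.sk$", FLAGS)

# Hypothetical client names: one that should match, several that should not.
HIT = "10-20-30-40.dynamic.chello.sk"
MISSES = [
    "mail.example.com",
    "mx1.mailhost.example.net",
    "host-10-20-30-40.static.example.org",
]

def time_misses(pattern: re.Pattern, rounds: int = 100_000) -> float:
    """Seconds spent searching every miss `rounds` times with one pattern."""
    return timeit.timeit(lambda: [pattern.search(h) for h in MISSES], number=rounds)

if __name__ == "__main__":
    for name, pat in (("fully qualified", FULL), ("suffix only", SIMPLE)):
        print(f"{name:16} hit={bool(pat.search(HIT))} misses={time_misses(pat):.3f}s")
```

The same experiment can be repeated with the real table by feeding a list of non-matching client names through postmap -q and timing the run.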

Thanks again for continuing my regex education, guys. :)  This new
knowledge and understanding is already paying dividends, mostly in time
savings: I'm knocking expressions out more easily without having to
reference help docs. :)

-- 
Stan
