Re: regular expressions was: Kernel Oops

2011-03-10 Thread Steven Champeon
on Wed, Mar 09, 2011 at 11:00:34AM +1100, Erik de Castro Lopo wrote:
 My idea was to autogenerate the complex regexes using
 something like this:
 
 178.183.237.0.dsl.dynamic.eranet.pl
 183.246.69.111.dynamic.snap.net.nz
 188.146.109.136.nat.umts.dynamic.eranet.pl
 
 as input.

FWIW, of 74510 patterns in the most recent Enemieslist patterns release,
9779 of them match leading four digits separated by dots (2447) or
dashes (589) or a mix of dots or dashes (the rest). You will have your
hands full coming up with groups of same.

-- 
hesketh.com/inc. v: +1(919)834-2552 f: +1(919)834-2553 w: http://hesketh.com/
antispam news and intelligence to help you stop spam: http://enemieslist.com/


Re: regular expressions was: Kernel Oops

2011-03-09 Thread Stan Hoeppner
mouss put forth on 3/8/2011 5:03 PM:
 [WARNING: Steven CC'd]
 

 things. so I'd say, do not consider performances as a primary target. go
 for catching spammers first. only tune after you get the right rules,
 and only if needed (I personally don't tune anything here. I'm happy to
 focus on catching spammers).

Likewise.  In my particular case execution time of the table is
irrelevant.  However, the execution latency of very large tables on busy
systems piqued my curiosity, giving me a desire to learn more, so I can
avoid adopting potentially bad habits now that may come back to haunt
me, performance wise, in the future.

Also, it's very possible, maybe more likely than not, that I
misunderstood some of Steven's advice, or took it out of context.

(Steven, sorry for inadvertently dragging you into the mosh pit) :)

Some who have been working with regular expressions for a long time may
feel otherwise, but at this point I find them fascinating.  From a spam
fighting standpoint they can be extremely powerful.  Again, I just want
to make sure I develop good habits now.

WRT Viktor's earlier post, I have seen examples of the grouping with
if/then blocks.  In fact, the fqrdns.pcre file makes use of them.
Although I'm not sure it's well optimized in this case.  There seem to
be an enormous number of expressions within a single if/then block, and
IIRC, there are only three such groupings in the set of 1600+
expressions.  So there's probably room for more performance
optimization.  At the table's current size though, I'm guessing the
potential performance gain wouldn't be worth the tweaking labor.

-- 
Stan


Re: regular expressions was: Kernel Oops

2011-03-09 Thread Stan Hoeppner
Steve put forth on 3/8/2011 5:12 PM:

 Maybe using if/endif conditions like Stan Hoeppner has done on his pcre map 
 could speedup things even more? - http://www.hardwarefreak.com/fqrdns.pcre

You're giving me too much credit. ;)  Again, I'm not the original author
of that table.  That person created the if/then structure.  I was
ignorant of exactly how it works in a PCRE until the last 24 hours.

I've simply made some additions, and fixed some minor errors I found, as
have others.  My current role WRT the table is simply making it
freely available for others, adding an expression now and then,
incorporating contributions from others so all changes hit a master
copy, and spreading the word a little now and then as I think it's a
pretty useful A/S tool.

-- 
Stan



Re: regular expressions was: Kernel Oops

2011-03-08 Thread Stan Hoeppner
mouss put forth on 3/7/2011 5:45 PM:
 On 07/03/2011 15:13, Stan Hoeppner wrote:

 Ok, so if I'm doing what I've heard called a fully qualified regular
 expression, WRT FQrDNS matching, should I use the anchors or not?
 postmap -q says these all work (the actuals with action and text that is).

 /^(\d{1,3}-){3}\d{1,3}\.dynamic\.chello\.sk$/
 .dynamic.chello.sk  REJECT blah blah
 
 /^(\d{1,3}\.){4}dsl\.dyn\.forthnet\.gr$/
 .dyn.forthnet.gr  REJECT blah blah
 
 /^(\d{1,3}-){4}adsl-dyn\.4u\.com\.gh$/
 /dyn\.4u\.com\.gh$/  REJECT blah
 assuming you get real mail from there. otherwise
 .4u.com.gh  REJECT blah

Yes, these can all be done with a hash/cdb.  But these are being added
to my fqrdns.pcre file.  As the name implies the goal is to exactly
match fully qualified reverse DNS strings, at least, that's part of the
goal.  The other part is the exact opposite:  _not_ matching them.  I'll
explain that a little later.

 /^[\d\w]{8}\.[\w]{2}-[\d]-[\d\w]{2}\.dynamic\.ziggo\.nl$/
 ahem? I fail to see what you're trying to match here. \d is a \w, so
 [\d\w] is the same as \w. do you mean \W (capital letter)? anyway:

I tried \d alone in those places and postmap -q wouldn't match it.  I
scoured my regex cheat sheet and it said \d is for digits, and \w is for
alphas.  I added \d\w and it worked.  I was trying to match this oddball
FQrDNS:

541ABE2E.cm-5-3c.dynamic.ziggo.nl
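
A minimal sketch of why \d alone failed here, using Python's re module as a
stand-in for PCRE (the two agree on \d and \w for ASCII input). The first
label contains hex letters, so \d cannot cover it, while \w matches letters,
digits and underscore. The hex_label pattern is an assumption about the ISP's
naming scheme, not something confirmed in this thread.

```python
import re

host = "541ABE2E.cm-5-3c.dynamic.ziggo.nl"

# \d alone fails: the first label mixes digits with the letters A-F.
digits_only = re.compile(r"^\d{8}\.\w{2}-\d-\w{2}\.dynamic\.ziggo\.nl$")

# \w already covers digits, so [\d\w] is equivalent to plain \w.
word_chars = re.compile(r"^\w{8}\.\w{2}-\d-\w{2}\.dynamic\.ziggo\.nl$")

# Assumption: the first label is always 8 hex digits. Tighter than \w.
hex_label = re.compile(
    r"^[0-9A-Fa-f]{8}\.[a-z]{2}-\d-[0-9a-z]{2}\.dynamic\.ziggo\.nl$"
)

print(bool(digits_only.match(host)))
print(bool(word_chars.match(host)))
print(bool(hex_label.match(host)))
```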

 well, that's what regular expressions are about by default:
 /foo/ means contains foo
 /^foo/ means starts with foo
 /foo$/ means ends with foo

Got it.  You (or Noel) already explained this, and it really helps
understanding.

 so
 /^bart.*homer.*marge$/ means: starts with bart, ends with marge and
 somewhere between these contains homer.

Also good to understand.
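
The anchoring rules quoted above can be checked with a short sketch. It uses
Python's re.search, which, like a Postfix pcre table, looks for the pattern
anywhere in the string unless anchors say otherwise. The sample names are
made up for illustration.

```python
import re

def hits(pattern, names):
    """Return the names in which the pattern is found."""
    return [n for n in names if re.search(pattern, n)]

names = ["foo.example", "example.foo", "a.foo.b", "bar.example"]

print(hits(r"foo", names))    # contains foo
print(hits(r"^foo", names))   # starts with foo
print(hits(r"foo$", names))   # ends with foo

# Starts with bart, ends with marge, homer somewhere in between.
print(hits(r"^bart.*homer.*marge$", ["bart-x-homer-y-marge", "bart-marge"]))
```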


Ok, to explain the not matching goal.  The PCRE file is almost 1700
expressions, and growing.  In a couple of years it could be double that
size.  Over a longer period of time it could hit 5000 expressions.  For
users of this file, it is usually the first table checked against a
connecting smtp client.  That client rDNS will match 1 of 1700
expressions, or none.  Thus, we want the fastest processing of the does
not match case, as this is the common case.

A match is rare from a mathematical and cycles consumed standpoint.
Modern processors are extremely fast.  But if our expressions aren't
speed optimized for the does not match case, we're slowing our system
down.  For most systems this is irrelevant.  But for an extremely high
volume MX gateway system, receiving say, 3000 connects/second,
consisting of 2700 spam bots and snowshoe servers, with 300 legit mails
to be relayed to downstream mailbox servers, a few extra milliseconds of
table processing time per connection adds up quickly.  Assuming this
host is running the full gamut of anti spam checks, policy daemons,
content filters, etc, we need to keep each as lean as possible.

If this example MX gateway sees spikes of 5000 connections/second due to
a large botnet targeting multiple users, any extra delay this PCRE table
imposes may contribute to bogging the system down, and cause unwanted
delays.

So, the question is, which form of expression processes the does not
match case faster?  The fully qualified expression, or the simple
expression?  Noel mentioned that the fully qualified expressions will
tend to process faster.  Is this true?  Is it true for both the
matches and does not match case?

Thanks again for continuing my regex education guys. :)  This new
knowledge and understanding is already paying dividends, mostly in time
savings and I'm knocking expressions out more easily without having to
reference help docs. :)

-- 
Stan


Re: regular expressions was: Kernel Oops

2011-03-08 Thread Wietse Venema
Stan Hoeppner:
 So, the question is, which form of expression processes the does not
 match case faster?  The fully qualified expression, or the simple
 expression?  Noel mentioned that the fully qualified expressions will
 tend to process faster.  Is this true?  Is it true for both the
 matches and does not match case?

I would expect better performance when patterns only match the text
that needs to be matched.

If you must match a very large number of patterns, you need an
implementation that transforms N patterns into one deterministic
automaton. This can match 1 pattern in the same time as N patterns.
Once the automaton is built (which takes some time) it is blindingly
fast. An example of such an implementation is flex.

Similar optimizations are needed for large CIDR maps. Right now,
Postfix's linear search does 10^8 patterns/s. With this, postscreen
can search the largest ipdeny.com file in 1ms on a modern CPU,
which is sufficient for the moment. To make it fast, the CIDR
entries need to be arranged into a tree that can be traversed in
log(N) time.
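
One way to arrange CIDR entries into a tree is a binary prefix trie, sketched
below. This is an illustration only and not how Postfix implements cidr
tables. The class name CidrTrie is hypothetical, the sketch handles IPv4
only, and it returns the longest matching prefix, whereas a linear table
returns the first match in file order. Lookup cost is bounded by the 32 bits
of the address rather than by the number of entries.

```python
import ipaddress

class CidrTrie:
    """Binary prefix trie over IPv4 networks (illustrative sketch)."""

    def __init__(self):
        self.root = {}

    def add(self, cidr, value):
        net = ipaddress.ip_network(cidr)
        bits = int(net.network_address)
        node = self.root
        # Walk one bit per level, most significant bit first.
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            node = node.setdefault(bit, {})
        node["value"] = value

    def lookup(self, addr):
        bits = int(ipaddress.ip_address(addr))
        node = self.root
        best = node.get("value")
        for i in range(32):
            node = node.get((bits >> (31 - i)) & 1)
            if node is None:
                break
            if "value" in node:
                best = node["value"]  # remember the longest prefix so far
        return best

trie = CidrTrie()
trie.add("192.0.2.0/24", "REJECT a")
trie.add("198.51.100.0/25", "REJECT b")
print(trie.lookup("192.0.2.77"))
print(trie.lookup("198.51.100.200"))
```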

Wietse


Re: regular expressions was: Kernel Oops

2011-03-08 Thread Stan Hoeppner
Wietse Venema put forth on 3/8/2011 10:39 AM:
 Stan Hoeppner:
 So, the question is, which form of expression processes the does not
 match case faster?  The fully qualified expression, or the simple
 expression?  Noel mentioned that the fully qualified expressions will
 tend to process faster.  Is this true?  Is it true for both the
 matches and does not match case?
 
 I would expect better performance when patterns only match the text
 that needs to be matched.

So this would mean the simpler expressions would be faster?  That makes
me wonder why Enemies List[1] uses complex expressions, each one
precisely matching a specific rDNS pattern, given EL matches 65k+
patterns total.  Likewise, the original author of my fqrdns.pcre table
also used mostly expressions that exactly match a specific rDNS pattern,
although in this case we have only 1600+ expressions so speed isn't as
critical.

I've not made 1 to 1 equivalent simpler expressions and run timing
tests.  It would be rather time consuming to copy the current table and
simplify the expressions in the copy.  I'm wondering now if execution
times would show any meaningful difference.  I wonder if testing just a
small subset, say 100 expressions, would be sufficient to show
meaningful execution time differences.
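
A rough harness for such a subset test might look like the sketch below. It
uses Python's re and timeit rather than Postfix and PCRE, so absolute numbers
will not carry over; at best it hints at relative cost. The function name
time_table and the sample non-matching hostnames are assumptions made for
illustration.

```python
import re
import timeit

def time_table(patterns, names, repeat=200):
    """Seconds spent checking every name against the table, `repeat` times."""
    compiled = [re.compile(p) for p in patterns]

    def run():
        for name in names:
            for rx in compiled:
                if rx.search(name):
                    break  # first match wins, as in a lookup table

    return timeit.timeit(run, number=repeat)

# Fully qualified expressions and their simplified counterparts.
full = [r"^(\d{1,3}-){3}\d{1,3}\.dynamic\.chello\.sk$",
        r"^(\d{1,3}\.){4}dsl\.dyn\.forthnet\.gr$"]
simple = [r"\.dynamic\.chello\.sk$", r"\.dyn\.forthnet\.gr$"]

# Hypothetical clients that match nothing (the common case).
misses = ["mail.example.org", "mx1.example.com"]

print("full:  ", time_table(full, misses))
print("simple:", time_table(simple, misses))
```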

 If you must match a very large number of patterns, you need an
 implementation that transforms N patterns into one deterministic
 automaton. This can match 1 pattern in the same time as N patterns.
 Once the automaton is built (which takes some time) it is blindingly
 fast. An example of such an implementation is flex.

This sounds really interesting.  Do you have a link to info about this
flex software?  I'd like to read about it.

 Similar optimizations are needed for large CIDR maps. Right now,
 Postfix's linear search does 10^8 patterns/s. With this, postscreen
 can search the largest ipdeny.com file in 1ms on a modern CPU,
 which is sufficient for the moment. To make it fast, the CIDR
 entries need to be arranged into a tree that can be traversed in
 log(N) time.

I recall you and Viktor discussing this a while ago.  I don't really
understand how an OP (myself) would go about creating a tree of our CIDR
tables.  Or is this something that the Postfix CIDR code would handle?


[1] Enemies List is not available for Postfix, yet, and the intelligence
dataset is not free, although the source code is open.  EL is integrated
in some commercial AS appliances and commercial mail software.  I
mention it frequently here because it is the only antispam tool I'm
aware of that makes almost exclusive use of regexes to identify likely
spam sources, and it uses 10s of thousands of regexes.

-- 
Stan


Re: regular expressions was: Kernel Oops

2011-03-08 Thread Victor Duchovni
On Tue, Mar 08, 2011 at 02:29:23PM -0600, Stan Hoeppner wrote:

 So this would mean the simpler expressions would be faster?  That makes
 me wonder why Enemies List[1] uses complex expressions, each one
 precisely matching a specific rDNS pattern,

To avoid false positives by matching in the wrong context.
The performance can be improved by grouping:

/^\d+\.\d+\.\d+\.\d+$/  DUNNO only hostnames matched below

if /\.net$/
# patterns for .net hosts
...
/^/ DUNNO done with .net
endif

if /\.net\.au$/
# patterns for .net.au hosts
...
/^/ DUNNO done with .net.au
endif

if /\.com$/
# patterns for .com hosts
...
/^/ DUNNO done with .com
endif

if /\.edu$/
# patterns for .edu hosts
...
/^/ DUNNO done with .edu
endif
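
The effect of this grouping can be modelled with a small sketch that counts
how many patterns are evaluated per lookup. It is a Python model of the
control flow only, not Postfix code. The rules inside each group and the
example.net / example.com names are hypothetical.

```python
import re

# Each guard plays the role of an "if /.../" line; the inner list holds the
# patterns between "if" and "endif". All rules here are made up.
GROUPS = [
    (re.compile(r"\.net$"), [
        (re.compile(r"^(?:\d{1,3}\.){4}dynamic\.example\.net$"),
         "REJECT dynamic .net host"),
    ]),
    (re.compile(r"\.com$"), [
        (re.compile(r"^(?:\d{1,3}-){3}\d{1,3}\.dsl\.example\.com$"),
         "REJECT dynamic .com host"),
    ]),
]

BARE_IP = re.compile(r"^\d+\.\d+\.\d+\.\d+$")

def lookup(name):
    """Return (action, number_of_patterns_evaluated) for a client name."""
    tried = 1
    if BARE_IP.search(name):
        return "DUNNO", tried
    for guard, rules in GROUPS:
        tried += 1
        if not guard.search(name):
            continue  # the whole block is skipped, like a failed "if"
        for pattern, action in rules:
            tried += 1
            if pattern.search(name):
                return action, tried
        return "DUNNO", tried  # the catch-all "/^/ DUNNO" closing the block
    return "DUNNO", tried

print(lookup("1.2.3.4.dynamic.example.net"))
print(lookup("mail.example.org"))
```

With many rules per group, a name that fails a guard costs one evaluation
instead of one per rule in that group.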

-- 
Viktor.


Re: regular expressions was: Kernel Oops

2011-03-08 Thread Erik de Castro Lopo
Wietse Venema wrote:

 If you must match a very large number of patterns, you need an
 implementation that transforms N patterns into one deterministic
 automaton. This can match 1 pattern in the same time as N patterns.
 Once the automaton is built (which takes some time) it is blindingly
 fast. An example of such an implementation is flex.

Is there a limit on the pattern length in the pcre tables?

If not, it would be possible to convert this (3 only, but could be
hundreds or even thousands):

   /^([0-9]{1,3}\.){4}\.dsl\.dynamic\.eranet\.pl$/
   /^([0-9]{1,3}\.){4}\.dynamic\.snap\.net\.nz$/
   /^([0-9]{1,3}\.){4}\.nat\.umts\.dynamic\.eranet\.pl$/

to this:

   
/^([0-9]{1,3}\.){4}\.(dsl\.dynamic\.eranet\.pl|dynamic\.snap\.net\.nz|nat\.umts\.dynamic\.eranet\.pl)$/

and that should reject 1.1.1.1.not-found in 1/3 the time of the
three original regexes while also matching quicker than the original.

Obviously, a conversion from the first three to the optimised version
has to be done mechanistically to avoid errors.

Cheers,
Erik
-- 
--
Erik de Castro Lopo
http://www.mega-nerd.com/


Re: regular expressions was: Kernel Oops

2011-03-08 Thread mouss
[WARNING: Steven CC'd]

On 08/03/2011 21:29, Stan Hoeppner wrote:
 Wietse Venema put forth on 3/8/2011 10:39 AM:
 Stan Hoeppner:
 So, the question is, which form of expression processes the does not
 match case faster?  The fully qualified expression, or the simple
 expression?  Noel mentioned that the fully qualified expressions will
 tend to process faster.  Is this true?  Is it true for both the
 matches and does not match case?

 I would expect better performance when patterns only match the text
 that needs to be matched.
 

to get better performance, one would use patterns that fail to match as
soon as possible. I mean if you have /^a/, then the check would stop
as soon as the first char isn't an a. but the expressions we would
like to match and the expressions we see are completely different
things. so I'd say, do not consider performances as a primary target. go
for catching spammers first. only tune after you get the right rules,
and only if needed (I personally don't tune anything here. I'm happy to
focus on catching spammers).

 So this would mean the simpler expressions would be faster?

No. /^a(complex blah)/ is faster than /joe/ because the first will stop
if the first char isn't an a, whatever the rest of the expression is.

  That makes
 me wonder why Enemies List[1] uses complex expressions, each one
 precisely matching a specific rDNS pattern, given EL matches 65k+
 patterns total. 

as said above, the goal isn't performance (to improve performance, buy
better hardware or run multiple instances). The goal of Steven is to
maximize hit rate while minimizing false positives. many of us have
created rules to block generic/dynamic/silly senders. when doing so, you
can start by being precise at the risk of doing a lot of work because
your rules minimise FPs, or going the other side by using expressions
that block a lot of senders including legitimate ones, that is
increasing the FP rate. it takes time and effort to get a good balance,
and that's what Steven's work is about.

[snip]
 
 If you must match a very large number of patterns, you need an
 implementation that transforms N patterns into one deterministic
 automaton. This can match 1 pattern in the same time as N patterns.
 Once the automaton is built (which takes some time) it is blindingly
 fast. An example of such an implementation is flex.
 
 This sounds really interesting.  Do you have a link to info about this
 flex software?  I'd like to read about it.
 

[note: it wasn't me who said the text above. I however studied the
problem, in a completely different context. I can tell you one thing:
forget about optimizing your pcre rules. optimisation is useful in DNA
matching problems and the like. and even then...]

 Similar optimizations are needed for large CIDR maps. Right now,
 Postfix's linear search does 10^8 patterns/s. With this, postscreen
 can search the largest ipdeny.com file in 1ms on a modern CPU,
 which is sufficient for the moment. To make it fast, the CIDR
 entries need to be arranged into a tree that can be traversed in
 log(N) time.
 
 I recall you and Viktor discussing this a while ago.  I don't really
 understand how an OP (myself) would go about creating a tree of our CIDR
 tables.  Or is this something that the Postfix CIDR code would handle?
 

if cidr is to be enhanced, then it would be done inside cidr
implementation. the problem is the usual one: algorithms are often said
to be k*O(f(n)). so you generally prefer f(n)=log(n) over f(n)=n^2. but
this is only good for large n, and n is never large, so you need to
remember about the k constant. said otherwise: k1 * n^2 < k2 * log(n)
for small n under some conditions.
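
A worked example of the constant-factor point, with made-up constants: take
k1 = 1 for the quadratic algorithm and k2 = 50 for the logarithmic one (say,
because of tree setup and pointer chasing). These values are assumptions
chosen only to show that a crossover exists.

```python
import math

def cost_quadratic(n, k1=1.0):
    """Cost model k1 * n^2 (hypothetical constant)."""
    return k1 * n * n

def cost_log(n, k2=50.0):
    """Cost model k2 * log2(n) (hypothetical constant)."""
    return k2 * math.log2(n)

# Smallest n at which the quadratic model becomes the more expensive one.
crossover = next(
    n for n in range(2, 10_000) if cost_quadratic(n) > cost_log(n)
)
print(crossover)
```

Below the crossover the "worse" algorithm is cheaper, which is the case being
described for small tables.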

 
 [1] Enemies List is not available for Postfix, yet, and the intelligence
 dataset is not free, although the source code is open.  EL is integrated
 in some commercial AS appliances and commercial mail software.  I
 mention it frequently here because it is the only antispam tool I'm
 aware of that makes almost exclusive use of regexes to identify likely
 spam sources, and it uses 10s of thousands of regexes.
 

I don't use EL, but I think it is usable with postfix. Steven, can you
confirm this? (some of the features may be sendmail oriented, but it
would be easy to generalize them).



Re: regular expressions was: Kernel Oops

2011-03-08 Thread Steve

 Original Message 
 Date: Wed, 9 Mar 2011 09:49:21 +1100
 From: Erik de Castro Lopo mle+to...@mega-nerd.com
 To: postfix-users@postfix.org
 Subject: Re: regular expressions  was:  Kernel Oops

 Wietse Venema wrote:
 
  If you must match a very large number of patterns, you need an
  implementation that transforms N patterns into one deterministic
  automaton. This can match 1 pattern in the same time as N patterns.
  Once the automaton is built (which takes some time) it is blindingly
  fast. An example of such an implementation is flex.
 
 Is there a limit on the pattern length in the pcre tables?
 
I think there is one (if memory does not fool me then it is somewhere around 
1000 characters). But I am not 100% sure.


 If not, it would be possible to convert this (3 only, but could be
 hundreds or even thousands):
 
/^([0-9]{1,3}\.){4}\.dsl\.dynamic\.eranet\.pl$/
/^([0-9]{1,3}\.){4}\.dynamic\.snap\.net\.nz$/
/^([0-9]{1,3}\.){4}\.nat\.umts\.dynamic\.eranet\.pl$/
 
Are you sure the above is correct? You have there a double dot and I think that 
is not correct.


 to this:
 
   
 /^([0-9]{1,3}\.){4}\.(dsl\.dynamic\.eranet\.pl|dynamic\.snap\.net\.nz|nat\.umts\.dynamic\.eranet\.pl)$/
 
Or even shorter:
/^([0-9]{1,3}\.){4}((dsl|nat\.umts)\.dynamic\.eranet\.pl|dynamic\.snap\.net\.nz)$/

Maybe using if/endif conditions like Stan Hoeppner has done on his pcre map 
could speedup things even more? - http://www.hardwarefreak.com/fqrdns.pcre


 and that should reject 1.1.1.1.not-found in 1/3 the time of the
 three original regexes while also matching quicker than the original.
 
 Obviously, a conversion from the first three to the optimised version
 has to be done mechanistically to avoid errors.
 
Well... if the source is already buggy (double dot issue) then automating that 
transformation is not going to help you much.



 Cheers,
 Erik
 -- 
// Steve


 --
 Erik de Castro Lopo
 http://www.mega-nerd.com/



Re: regular expressions was: Kernel Oops

2011-03-08 Thread Wietse Venema
mouss:
 On 08/03/2011 23:49, Erik de Castro Lopo wrote:
  Wietse Venema wrote:
  
  If you must match a very large number of patterns, you need an
  implementation that transforms N patterns into one deterministic
  automaton. This can match 1 pattern in the same time as N patterns.
  Once the automaton is built (which takes some time) it is blindingly
  fast. An example of such an implementation is flex.
  
  Is there a limit on the pattern length in the pcre tables?
  
  If not, it would be possible to convert this (3 only, but could be
  hundreds or even thousands):
  
 /^([0-9]{1,3}\.){4}\.dsl\.dynamic\.eranet\.pl$/
 /^([0-9]{1,3}\.){4}\.dynamic\.snap\.net\.nz$/
 /^([0-9]{1,3}\.){4}\.nat\.umts\.dynamic\.eranet\.pl$/
  
  to this:
  
 
  /^([0-9]{1,3}\.){4}\.(dsl\.dynamic\.eranet\.pl|dynamic\.snap\.net\.nz|nat\.umts\.dynamic\.eranet\.pl)$/
  
  and that should reject 1.1.1.1.not-found in 1/3 the time of the
  three original regexes while also matching quicker than the original.
 
 
 your speculations are wrong. /(joe|foo|bar)/ isn't 3 times faster than
 individual tests. but before all, premature optimisation is the root of
 all evil. one should not convert readable stuff to unmaintainable
 hieroglyph without measuring the real benefits.

In the Postfix implementation, each regexp/pcre pattern is executed
separately, therefore (a|b|c) is faster than separate rules for a,
b and c. The savings are noticeable only in body_checks.

As for large numbers of CIDR patterns, I was referring to files
with 100,000 patterns. That is a non-trivial number, and I took
care to implement this such that postscreen could handle them.

I do agree with all the comments about skipping patterns with
IF/ENDIF or terminating matches early (which PCRE is very good at
if you use look-ahead and look-behind).

Wietse


Re: regular expressions was: Kernel Oops

2011-03-08 Thread Erik de Castro Lopo
Noel Jones wrote:

 The pattern length limit is controlled by the pcre library 
 you're using.  I think most implementations limit single 
 expressions to 64k characters.

Obviously something that needs testing.

 It's unclear to me if a single huge complex expression will 
 evaluate faster than multiple less complex expressions.

I'm not exactly sure how the pcre regex engine works in Postfix.
My assumption below is that each pattern is matched individually
which is why I am suggesting that patterns can be combined for
speed improvements.

If the multiple complex expressions have the same prefix, then
combining the prefix test into a single expression will definitely
be faster to fail some non matching strings than using multiple
less complex expressions.

Consider the input string '123-234-32-12.whatever' and now compare
matching against three rules:

 /^([0-9]{1,3}\.){4}foo$/
 /^([0-9]{1,3}\.){4}bar$/
 /^([0-9]{1,3}\.){4}baz$/

In this case, there will be three attempts (one on each pattern)
that fail on the fourth character ('-') of the input string. That
means that to fail all three patterns, there will be 12 character
comparisons.

Now compare that against:

 /^([0-9]{1,3}\.){4}(foo|bar|baz)$/

which will again fail on the fourth character, but there is only one
pattern which matches the same strings as the 3 patterns above.
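
A quick sketch to confirm that the combined pattern accepts and rejects the
same names as the three separate ones. Python's re is used in place of PCRE,
and the sample names are invented. It checks equivalence only and says
nothing about speed.

```python
import re

separate = [
    re.compile(r"^([0-9]{1,3}\.){4}%s$" % tail)
    for tail in ("foo", "bar", "baz")
]
combined = re.compile(r"^([0-9]{1,3}\.){4}(foo|bar|baz)$")

def match_separate(name):
    """True if any of the three individual rules matches."""
    return any(rx.search(name) for rx in separate)

def match_combined(name):
    """True if the single combined rule matches."""
    return combined.search(name) is not None

samples = ["1.2.3.4.foo", "10.20.30.40.baz",
           "123-234-32-12.whatever", "1.2.3.4.qux"]
for s in samples:
    print(s, match_separate(s), match_combined(s))
```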

 (your sample expression looks a little wonky to me.  You sure 
 it works?)

No, this was a poorly checked paper example.

 Improving performance would be better accomplished by 
 enclosing the similar lines in an IF..ENDIF statement. 
 Performance should be improved for non-matching input, 
 readability and maintainability is dramatically improved.

Personally I find reading regexes a pita even though I've been 
doing it for about 2 decades.

My idea was to autogenerate the complex regexes using
something like this:

178.183.237.0.dsl.dynamic.eranet.pl
183.246.69.111.dynamic.snap.net.nz
188.146.109.136.nat.umts.dynamic.eranet.pl

as input.
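
A minimal sketch of such a generator, assuming every input name starts with a
dotted quad and that the remaining suffix should be matched literally. The
function name build_pattern is hypothetical. It emits one combined expression
with the suffixes as alternatives, escaped mechanically so that no stray dots
creep in.

```python
import re

LEADING_QUAD = re.compile(r"^(?:[0-9]{1,3}\.){4}(.+)$")

def build_pattern(hostnames):
    """Combine names that start with a dotted quad into one regex string."""
    suffixes = set()
    for name in hostnames:
        m = LEADING_QUAD.match(name)
        if m:
            suffixes.add(m.group(1))
    if not suffixes:
        raise ValueError("no hostname started with a dotted quad")
    alternation = "|".join(re.escape(s) for s in sorted(suffixes))
    return r"^(?:[0-9]{1,3}\.){4}(?:%s)$" % alternation

hosts = [
    "178.183.237.0.dsl.dynamic.eranet.pl",
    "183.246.69.111.dynamic.snap.net.nz",
    "188.146.109.136.nat.umts.dynamic.eranet.pl",
]
pattern = build_pattern(hosts)
print("/%s/" % pattern)
```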

 Skipping rules always beats evaluating rules.

Agreed.

 Unreadable rules should be avoided.

Unless those rules were never intended to be read or modified
by hand.

Erik
-- 
--
Erik de Castro Lopo
http://www.mega-nerd.com/


Re: regular expressions was: Kernel Oops

2011-03-08 Thread Steven Champeon
on Wed, Mar 09, 2011 at 12:03:27AM +0100, mouss wrote:
 [WARNING: Steven CC'd]

:-)
 
 On 08/03/2011 21:29, Stan Hoeppner wrote:
  That makes me wonder why Enemies List[1] uses complex expressions,
  each one precisely matching a specific rDNS pattern, given EL
  matches 65k+ patterns total.

Eh, it varies quite a bit, some of them are complex groups like this:

[0-9]+\-[0-9]+\-[0-9]+\-[0-9]+\.dynamic\.(brasov|craiova|fagaras|resita|sfantugheorghe|victoria|zarnesti)\.rdsnet\.ro

because for whatever reason I can't just use a [0-9a-z\-]+ in place of
the group, or because they just grew over time as I saw more hosts. But
some are relatively simple:

[0-9a-z\-]+\-[0-9]+\.fiberlink\.[a-z]+\.rdsnet\.ro

wherever I can get away with it. You have to be careful with blanket
alphanumeric token host parts, because sometimes you're matching a
city or town or state or abbreviation and everything's fine, and then
the ISP starts putting 'mail' or 'static' in that token's position in
a similar hostname and suddenly you're blocking more than residential
dynamic cable modems. :-/

eg [0-9]+\-[0-9]+\-[0-9]+\-[0-9]+\.mail[0-9]+\.fft\.com\.au
   [0-9]+\-[0-9]+\-[0-9]+\-[0-9]+\.mail\.eletti\.com\.br
   [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\.mail\.sistemairis\.com\.br
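
The false-positive risk can be shown with a small sketch. The naming scheme,
the town names and the example.net domain are all hypothetical. The point is
only that a blanket [a-z]+ token also matches 'mail', while an explicit group
does not.

```python
import re

# Hypothetical ISP naming scheme; example.net is a placeholder domain.
QUAD = r"[0-9]+-[0-9]+-[0-9]+-[0-9]+"

blanket = re.compile(r"^" + QUAD + r"\.[a-z]+\.example\.net$")
explicit = re.compile(
    r"^" + QUAD + r"\.(?:springfield|shelbyville)\.example\.net$"
)

dynamic_host = "10-0-0-1.springfield.example.net"
mail_host = "10-0-0-2.mail.example.net"

for rx_name, rx in (("blanket", blanket), ("explicit", explicit)):
    print(rx_name,
          bool(rx.search(dynamic_host)),
          bool(rx.search(mail_host)))
```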

I haven't really tried to optimize the regular expressions, because of
the way our library processes them - by walking down a tree from '.'
(so, '.' - ro - rdsnet - all the patterns for rdsnet.ro) - so perf is
acceptable (several hundred thousand matches/sec on decent hardware;
~225K lookups/s on my old Macbook via C program).
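
The tree walk described here might be sketched as follows. This is a guess at
the general shape and not the Enemieslist library itself. The class name
DomainTree is hypothetical, and the ^ and $ anchors on the sample pattern are
added for the sketch. Labels are consumed right to left, and only the
patterns attached to nodes on that path are ever tested.

```python
import re

def _new_node():
    return {"children": {}, "patterns": []}

class DomainTree:
    """Patterns indexed by domain, walked from the root ('.') downwards."""

    def __init__(self):
        self.root = _new_node()

    def add(self, domain, pattern):
        node = self.root
        for label in reversed(domain.split(".")):
            node = node["children"].setdefault(label, _new_node())
        node["patterns"].append(re.compile(pattern))

    def lookup(self, hostname):
        hostname = hostname.lower()
        node = self.root
        for label in reversed(hostname.split(".")):
            node = node["children"].get(label)
            if node is None:
                return False  # no patterns exist under this domain
            for rx in node["patterns"]:
                if rx.search(hostname):
                    return True
        return False

tree = DomainTree()
tree.add("rdsnet.ro",
         r"^[0-9a-z\-]+\-[0-9]+\.fiberlink\.[a-z]+\.rdsnet\.ro$")
print(tree.lookup("host-12.fiberlink.brasov.rdsnet.ro"))
print(tree.lookup("mail.example.org"))
```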

Oh, and we're long past 65K - last build was 74494 patterns. I keep
forgetting to update the Web site. :-)

 as said above, the goal isn't performance (to improve performance, buy
 better hardware or run multiple instances).

Well, no, the goal is acceptable performance, but also manageable update
mechanisms that allow for rapid correction of FP classifications. 

 The goal of Steven is to maximize hit rate while minimizing false
 positives. many of us have created rules to block
 generic/dynamic/silly senders. when doing so, you can start by being
 precise at the risk of doing a lot of work because your rules minimise
 FPs, or going the other side by using expressions that block a lot of
 senders including legitimate ones, that is increasing the FP rate. it
 takes time and effort to get a good balance, and that's what Steven's
 work is about.

Yup. And it took me a few months to really understand that the useful
concept of a 'generic' hostname unfortunately also applied to large
mail farms that we wanted mail from. (Now we track 'outmx' patterns,
too, and they account for around an eighth of all the patterns we have.
Same goes for 'webhost' - we mostly just see phishing scams from most of
them, but when you're analyzing someone's mailflow it helps to be able
to tell them which of their mail is coming from legit or quasi-legit
mail sources.)

I used to have a few hundred compact expressions, like this, which were
left-anchored but not fully qualified:

%compact = (
   duN => 'du[0-9]+',
  dynN => 'dyn[0-9]+',
  pppN => 'ppp[0-9]+',
 N-N-N => '[0-9]+\-[0-9]+\-[0-9]+',
 dhcpH => 'dhcp[0-9a-f]+',
 dhcpN => 'dhcp[0-9]+',
 dialN => 'dial[0-9]+',
 duN-N => 'du[0-9]+\-[0-9]+',
 dyn-N => 'dyn\-[0-9]+',
 portN => 'port[0-9]+',
 ppp-N => 'ppp\-[0-9]+',
dhcp-N => 'dhcp\-[0-9]+',
dial-N => 'dial\-[0-9]+',
dialup => 'dialup',
du-N-N => 'du\-[0-9]+\-[0-9]+',
dynN-N => 'dyn[0-9]+\-[0-9]+',
port-N => 'port\-[0-9]+',

[...]

but frankly the FP rate was so awful I ditched them. And not just
because of silly people like whoever set up Marriott's reservations
transactional servers with names like host184.marriott.com, but they
were one very big reason why I ditched them.
 
 [snip]
  
  If you must match a very large number of patterns, you need an
  implementation that transforms N patterns into one deterministic
  automaton. This can match 1 pattern in the same time as N patterns.
  Once the automaton is built (which takes some time) it is blindingly
  fast. An example of such an implementation is flex.
  
  This sounds really interesting.  Do you have a link to info about this
  flex software?  I'd like to read about it.

Oh, that was what we tried first. Matt Sergeant wrote a perl wrapper
around a hunk of C object code that we generated using re2c. Worked
fine, you feed it regexes, it generates C code, you compile it into
an object and call it from a simple perl DNS server, voila. That was
how I provided the first instance of the Enemieslist via DNSBL, for
a year or so, on a Mac Mini. As far as the code went, it worked great.

Unfortunately, it took almost an hour to compile, and that was back
when I only had a few thousand patterns. Oh, and you had to recompile
every 

Re: regular expressions was: Kernel Oops

2011-03-08 Thread Noel Jones

On 3/8/2011 6:00 PM, Erik de Castro Lopo wrote:

Noel Jones wrote:


The pattern length limit is controlled by the pcre library
you're using.  I think most implementations limit single
expressions to 64k characters.


Obviously something that needs testing.


Many years ago I worked on a system with a 32k limit on pcre 
expressions.  Ever since then, everything I've checked has 
been 64k, and then I gave up checking.  I expect any 
non-ancient system will support 64k, and some maybe even more. 
 (To clarify for others following along, this is a characters 
per single expression limit, not a filesize or number of 
expressions per file limit)



Consider the input string '123-234-32-12.whatever' and now compare
matching against three rules:

  /^([0-9]{1,3}\.){4}foo$/
  /^([0-9]{1,3}\.){4}bar$/
  /^([0-9]{1,3}\.){4}baz$/

In this case, there will be three attempts (one on each pattern)
that fail on the fourth character ('-') of the input string. That
means that to fail all three patterns, there will be 12 character
comparisons.

Now compare that against:

  /^([0-9]{1,3}\.){4}(foo|bar|baz)$/

which will again fail on the fourth character, but there is only one
pattern which matches the same strings as the 3 patterns above.


This example is pretty easy to see that combining is better. 
It's not so clear if you create 32k of complex gibberish if it 
will actually operate faster as there may be significant 
startup times.  YMMV and all that.


BTW, with pcre you should use the non-capturing form (?:...) inside
parentheses if you're not doing $n substitutions.  This saves
another smidgen of time and memory.

/^(?:[0-9]{1,3}\.){4}(?:foo|bar|baz)$/
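
The difference between the two forms is visible in what the engine has to
remember. A short sketch with Python's re, which uses the same (?:...) syntax
as PCRE: the plain parentheses record two groups, the (?:...) version records
none, and both accept the same strings.

```python
import re

capturing = re.compile(r"^([0-9]{1,3}\.){4}(foo|bar|baz)$")
non_capturing = re.compile(r"^(?:[0-9]{1,3}\.){4}(?:foo|bar|baz)$")

name = "1.2.3.4.bar"
m1 = capturing.search(name)
m2 = non_capturing.search(name)

# .groups on the compiled pattern is the number of capture groups;
# .groups() on the match object is what was actually stored.
print(capturing.groups, m1.groups())
print(non_capturing.groups, m2.groups())
```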


 -- Noel Jones


Re: regular expressions was: Kernel Oops

2011-03-08 Thread Erik de Castro Lopo
Noel Jones wrote:

 Many years ago I worked on a system with a 32k limit on pcre 
 expressions.  Ever since then, everything I've checked has 
 been 64k, and then I gave up checking.  I expect any 
 non-ancient system will support 64k, and some maybe even more. 
   (To clarify for others following along, this is a characters 
 per single expression limit, not a filesize or number of 
 expressions per file limit)

Thanks for the info.

  Now compare that against:
 
/^([0-9]{1,3}\.){4}(foo|bar|baz)$/
 
  which will again fail on the fourth character, but there is only one
  pattern which matches the same strings as the 3 patterns above.
 
 This example is pretty easy to see that combining is better. 

Exactly. Fortunately this is the very common example that will
very easily lend itself to this optimisation.

 It's not so clear if you create 32k of complex gibberish if it 
 will actually operate faster as there may be significant 
 startup times.  YMMV and all that.

I agree completely.

 BTW, with pcre you should use the non-capturing form (?:...) inside
 parentheses if you're not doing $n substitutions.  This saves
 another smidgen of time and memory.
 /^(?:[0-9]{1,3}\.){4}(?:foo|bar|baz)$/

Good tip, thanks.

Erik
-- 
--
Erik de Castro Lopo
http://www.mega-nerd.com/


Re: Kernel Oops

2011-03-07 Thread Stan Hoeppner
mouss put forth on 3/6/2011 7:03 PM:

 /^.*foo/
 means it starts with something followed by foo. and this is the same
 thing as it contains foo, which is represented by
 /foo/

I was taught to always start my expressions with /^ and end them with
$/.  Why did Steven teach me to do this if it's not necessary?  Steven
being the author of the Enemies List:  http://enemieslist.com/ which
contains over 65,000 regexes matching FQrDNS patterns.

 well, you know I know these:) we all got spam from these...

As with most/all dynamic ranges.

 1) first use IP ranges.
 2) then domains (hash/cdb)
 for example:
 .alshamil.net.ae  REJECT blah blah
 because there is no point to try to match something like  
   auh-b113917.alshamil.net.ae
 
 3) then use regular expressions, but only when IPs and domains aren't
 the way to go.

Well, you know I know these mouss. :)  Have you ever been locked in a
certain train of thought and simply forgot to consider something
related, later putting hand to forehead and saying "Duh!"?  My mindset
was focused on showing how a single PCRE can block the same number of
hosts as using IP addresses in a CIDR or hash table.  I just didn't
consider the domain blocking aspect of hash tables at the time.  That's
the Duh!.  I've been blocking domains with my hash table for something
like 6 years now...  I think some folks call this a brain fart.  ;)

 no. IPs and domains are different things.

 cidr is about IPs. hash/cdb/pcre is about names. these are different
 things and you know that. use each as appropriate.

Of course.  But IPs are valid in a hash table.  You can even list them
by the equivalent of a /24, /16, and /8 if you like, simply by omitting
the last 1, 2, or 3 octets of the dotted quad.  Just as I brain farted
WRT using domains in a hash table, it appears you have done the same WRT
using IP addresses in a hash table. :)
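
The octet-dropping behaviour can be pictured as a list of lookup keys tried in turn. This is a simplified model of the truncation described in access(5), not Postfix code; the function name and the address are made up for illustration.

```python
def ipv4_lookup_keys(address):
    """Return the keys tried for an IPv4 address, most specific first."""
    octets = address.split(".")
    # Drop one trailing octet at a time: a.b.c.d, a.b.c, a.b, a
    return [".".join(octets[:n]) for n in range(len(octets), 0, -1)]

print(ipv4_lookup_keys("192.0.2.25"))
```

A hash entry of 192.0.2 therefore behaves like a /24, 192.0 like a /16, and 192 like a /8. Anything that does not fall on an octet boundary needs a CIDR table.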

I agree it makes more sense to block domains with hash/cdb and IPs with
CIDR.  I've been doing exactly that for 5 of the 6 years I've been
running Postfix.  The first year (maybe less) I blocked IPs with a hash
table, until I joined this list and learned about CIDR tables.  I'm
guessing most other new Postfix OPs go through the same
progression--most beginners docs returned via Google teach the hash
table and nothing else.

 if the ISP makes it too much, then you should reduce it:
 .embarqhsd.net    REJECT blah blah

Yeah, but then you end up potentially blocking large numbers of ham
servers in SOHO land, in this case *.sta.embarqhsd.net.  Even in 2011
there are still hundreds of thousands or more SOHO MTAs on static IP
aDSL and cable circuits with generic rDNS.  I should know as I'm one of
them.  (Please let's not allow this to turn into yet another flame war
WRT generic rDNS, real OPs rent a VPS/colo, yada yada--I'm not directing
this at you mouss but to those predisposed to flog this dead, stripped
to the bone, horse carcass).

 a better example would be
 /(\W\d+){4}\..*\.embarqhsd\.net$/   REJECT ...

 Better in what way? 
 
 in the sense that this can't be represented using hash or the like.

Ok.  So you're not showing this PCRE above because it better matches the
target rDNS string, or that the engine executes it faster or something,
etc.  You're simply saying don't use a PCRE for something you can match
using a simpler table, such as hash/cdb.  Correct?

-- 
Stan




Re: Kernel Oops

2011-03-07 Thread Ansgar Wiechers
On 2011-03-07 Stan Hoeppner wrote:
 mouss put forth on 3/6/2011 7:03 PM:
 /^.*foo/
 means it starts with something followed by foo. and this is the same
 thing as it contains foo, which is represented by
 /foo/
 
 I was taught to always start my expressions with /^ and end them
 with $/.  Why did Steven teach me to do this if it's not necessary?

I wouldn't know what his rationale was, but Noel and mouss are certainly
right. Anchoring something between wildcard matches is utterly
pointless.

As mouss explained above, /^.*foo/, /.*foo/ and /foo/ produce the same
results. That is, unless your regexp processor implicitly anchors an
expression at the beginning of the string, in which case you'd need the
leading .*, but still won't need to explicitly anchor it with a ^.
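
Both halves of that statement can be checked with a short sketch. It assumes Python's re module, where re.search behaves like an unanchored lookup and re.match is an example of a processor that implicitly anchors at the start. The input strings are made up and contain no newlines, as is the case for hostnames.

```python
import re

names = ["foo", "xfoo", "foobar", "a.foo.b", "bar"]  # made-up inputs

patterns = [r"^.*foo", r".*foo", r"foo"]

# re.search scans the whole string, like an unanchored PCRE lookup.
results = [[bool(re.search(p, n)) for n in names] for p in patterns]
print(results[0] == results[1] == results[2])

# re.match implicitly anchors at the start, so here the leading .* matters.
print(bool(re.match(r"foo", "xfoo")))
print(bool(re.match(r".*foo", "xfoo")))
```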

Regards
Ansgar Wiechers
-- 
Abstractions save us time working, but they don't save us time learning.
--Joel Spolsky


Re: Kernel Oops

2011-03-07 Thread Noel Jones

On 3/7/2011 4:47 AM, Stan Hoeppner wrote:


I was taught to always start my expressions with /^ and end them with
$/.  Why did Steven teach me to do this if it's not necessary?


That's good advice when you're actually matching something.

The special case of .* means, as you know, anything or 
nothing.  There's never a case where it's necessary to 
explicitly match a leading or trailing anything or nothing.


Consider:
/^.*foo$/
  match the string beginning with anything or nothing, ending 
with foo.


can always be simplified to:
/foo$/
  match the string ending with foo.

This works the same without the ending $ anchor (contains foo, 
rather than ends with foo), but helps the illustration.


(In the other special case where you're using $1, $2, etc. 
substitution in the result, you might need some form of 
/^(.*foo)$/ to fill the substitution buffer, but that's about 
substitution, not about matching.)
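
The substitution case can be sketched as follows, assuming Python's re module in place of PCRE and a made-up input; group(1) plays the role of $1.

```python
import re

name = "mail.foo"  # made-up input

# Matching only: the leading .* adds nothing.
print(bool(re.search(r"foo$", name)))

# Substitution: the group must span the text you want back in $1.
whole = re.search(r"^(.*foo)$", name)
print(whole.group(1))

# Without the leading .* inside the group, $1 holds only the literal.
tail = re.search(r"(foo)$", name)
print(tail.group(1))
```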




  -- Noel Jones


Re: Kernel Oops

2011-03-07 Thread Stan Hoeppner
Noel Jones put forth on 3/7/2011 7:00 AM:
 On 3/7/2011 4:47 AM, Stan Hoeppner wrote:

 I was taught to always start my expressions with /^ and end them with
 $/.  Why did Steven teach me to do this if it's not necessary?
 
 That's good advice when you're actually matching something.

Ok, so if I'm doing what I've heard called a fully qualified regular
expression, WRT FQrDNS matching, should I use the anchors or not?
postmap -q says these all work (the actuals with action and text that is).

/^(\d{1,3}-){3}\d{1,3}\.dynamic\.chello\.sk$/
/^(\d{1,3}\.){4}dsl\.dyn\.forthnet\.gr$/
/^(\d{1,3}-){4}adsl-dyn\.4u\.com\.gh$/
/^[\d\w]{8}\.[\w]{2}-[\d]-[\d\w]{2}\.dynamic\.ziggo\.nl$/
/^(\d{1,3}\.){4}dynamic\.snap\.net\.nz$/
/^pppoe-dyn(-\d{1,3}){4}\.kosnet\.ru$/

 The special case of .* means, as you know, anything or nothing. 
 There's never a case where it's necessary to explicitly match a leading
 or trailing anything or nothing.

What of the case where you want to match something in the middle of the
input string, with extra junk on both ends?

 Consider:
 /^.*foo$/
   match the string beginning with anything or nothing, ending with foo.
 
 can always be simplified to:
 /foo$/
   match the string ending with foo.
 
 This works the same without the ending $ anchor (contains foo, rather
 than ends with foo), but helps the illustration.

So, in my examples above, given we're matching rDNS patterns, are the
anchors necessary, or helpful?  If not using them means contains, then
they should still match.  What advantage is there to using the anchors
when matching rDNS patterns?  Any?

 (In the other special case where you're using $1, $2, etc. substitution
 in the result, you might need some form of /^(.*foo)$/ to fill the
 substitution buffer, but that's about substitution, not about matching.)

Thank you for the continuing PCRE education Noel, and Ansgar. :)

-- 
Stan


Re: Kernel Oops

2011-03-07 Thread Noel Jones

On 3/7/2011 8:13 AM, Stan Hoeppner wrote:

Noel Jones put forth on 3/7/2011 7:00 AM:

On 3/7/2011 4:47 AM, Stan Hoeppner wrote:


I was taught to always start my expressions with /^ and end them with
$/.  Why did Steven teach me to do this if it's not necessary?


That's good advice when you're actually matching something.


Ok, so if I'm doing what I've heard called a fully qualified regular
expression, WRT FQrDNS matching, should I use the anchors or not?
postmap -q says these all work (the actuals with action and text that is).

/^(\d{1,3}-){3}\d{1,3}\.dynamic\.chello\.sk$/
/^(\d{1,3}\.){4}dsl\.dyn\.forthnet\.gr$/
/^(\d{1,3}-){4}adsl-dyn\.4u\.com\.gh$/
/^[\d\w]{8}\.[\w]{2}-[\d]-[\d\w]{2}\.dynamic\.ziggo\.nl$/
/^(\d{1,3}\.){4}dynamic\.snap\.net\.nz$/
/^pppoe-dyn(-\d{1,3}){4}\.kosnet\.ru$/


In these examples, you're explicitly matching something at the 
start and/or end of the string.  Using the anchors is correct 
and recommended.






The special case of .* means, as you know, anything or nothing.
There's never a case where it's necessary to explicitly match a leading
or trailing anything or nothing.


What of the case where you want to match something in the middle of the
input string, with extra junk on both ends?


If you're looking for a string that contains foo anywhere, simply
/foo/
with no anchors.





Consider:
/^.*foo$/
   match the string beginning with anything or nothing, ending with foo.

can always be simplified to:
/foo$/
   match the string ending with foo.

This works the same without the ending $ anchor (contains foo, rather
than ends with foo), but helps the illustration.


So, in my examples above, given we're matching rDNS patterns, are the
anchors necessary, or helpful?  If not using them means contains, then
they should still match.  What advantage is there to using the anchors
when matching rDNS patterns?  Any?


You use anchors to reduce the chance of a false positive.  A 
side benefit is improved performance.


Any pattern that matches with the anchors will still match 
without the anchors, but may match additional input that you 
don't intend to match.  In the case of the rDNS patterns, a FP 
is unlikely (but possible, more so with the shorter patterns).


In other cases, such as matching a short bare domain name, a FP 
may be very likely without anchors.


Best practice is to use the anchors when you can, i.e. when what 
you're matching will always be at the beginning and/or end of 
the input string.   Never use ^.* or .*$.
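
A sketch of the false-positive risk, assuming Python's re module in place of PCRE. The first hostname is one quoted elsewhere in this thread; the lookalike names are hypothetical.

```python
import re

anchored = re.compile(r"^(\d{1,3}\.){4}dynamic\.snap\.net\.nz$")
unanchored = re.compile(r"(\d{1,3}\.){4}dynamic\.snap\.net\.nz")

real = "183.246.69.111.dynamic.snap.net.nz"  # hostname quoted in the thread
lookalike = real + ".mail.example.com"       # hypothetical name that merely embeds it

for name in (real, lookalike):
    print(name, bool(anchored.search(name)), bool(unanchored.search(name)))

# A short bare domain is far easier to hit by accident.
print(bool(re.search(r"snap\.net", "mail.snap.network.example")))
print(bool(re.search(r"^snap\.net$", "mail.snap.network.example")))
```

The unanchored long pattern only misfires on a contrived name, while the short one misfires on any name that happens to contain the same characters.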



  -- Noel Jones


Re: Kernel Oops

2011-03-07 Thread Stan Hoeppner
Noel Jones put forth on 3/7/2011 9:49 AM:
 On 3/7/2011 8:13 AM, Stan Hoeppner wrote:
 Noel Jones put forth on 3/7/2011 7:00 AM:
 On 3/7/2011 4:47 AM, Stan Hoeppner wrote:

 I was taught to always start my expressions with /^ and end them with
 $/.  Why did Steven teach me to do this if it's not necessary?

 That's good advice when you're actually matching something.

 Ok, so if I'm doing what I've heard called a fully qualified regular
 expression, WRT FQrDNS matching, should I use the anchors or not?
 postmap -q says these all work (the actuals with action and text that
 is).

 /^(\d{1,3}-){3}\d{1,3}\.dynamic\.chello\.sk$/
 /^(\d{1,3}\.){4}dsl\.dyn\.forthnet\.gr$/
 /^(\d{1,3}-){4}adsl-dyn\.4u\.com\.gh$/
 /^[\d\w]{8}\.[\w]{2}-[\d]-[\d\w]{2}\.dynamic\.ziggo\.nl$/
 /^(\d{1,3}\.){4}dynamic\.snap\.net\.nz$/
 /^pppoe-dyn(-\d{1,3}){4}\.kosnet\.ru$/
 
 In these examples, you're explicitly matching something at the start
 and/or end of the string.  Using the anchors is correct and recommended.
 
 

 The special case of .* means, as you know, anything or nothing.
 There's never a case where it's necessary to explicitly match a leading
 or trailing anything or nothing.

 What of the case where you want to match something in the middle of the
 input string, with extra junk on both ends?
 
 If you're looking for a string that contains foo anywhere, simply
 /foo/
 with no anchors.
 
 

 Consider:
 /^.*foo$/
match the string beginning with anything or nothing, ending with foo.

 can always be simplified to:
 /foo$/
match the string ending with foo.

 This works the same without the ending $ anchor (contains foo, rather
 than ends with foo), but helps the illustration.

 So, in my examples above, given we're matching rDNS patterns, are the
 anchors necessary, or helpful?  If not using them means contains, then
 they should still match.  What advantage is there to using the anchors
 when matching rDNS patterns?  Any?
 
 You use anchors to reduce the chance of a false positive.  A side
 benefit is improved performance.
 
 Any pattern that matches with the anchors will still match without the
 anchors, but may match additional input that you don't intend to match. 
 In the case of the rDNS patterns, a FP is unlikely (but possible, more
 so with the shorter patterns).
 
 In other cases, such as matching a short bare domain name, a FP may be
 very likely without anchors.
 
 best practice is to use the anchors when you can, ie. what you're
 matching will always be at the beginning and/or end of the input
 string.   Never use ^.* or .*$.

Excellent explanations.  Thank you Noel.

-- 
Stan


Re: Kernel Oops

2011-03-07 Thread mouss
Le 07/03/2011 11:47, Stan Hoeppner a écrit :
 mouss put forth on 3/6/2011 7:03 PM:
 
 /^.*foo/
 means it starts with something followed by foo. and this is the same
 thing as it contains foo, which is represented by
 /foo/
 
 I was taught to always start my expressions with /^ and end them with
 $/.  Why did Steven teach me to do this if it's not necessary?  Steven
 being the author of the Enemies List:  http://enemieslist.com/ which
 contains over 65,000 regexes matching FQrDNS patterns.
 
 well, you know I know these:) we all got spam from these...
 
 As with most/all dynamic ranges.
 
 1) first use IP ranges.
 2) then domains (hash/cdb)
 for example:
 .alshamil.net.ae REJECT blah blah
 because there is no point to try to match something like 
  auh-b113917.alshamil.net.ae

 3) then use regular expressions, but only when IPs and domains aren't
 the way to go.
 
 Well, you know I know these mouss. :)  

yes, but we're talking on a public list, so it's good to say it all.
coz' all this stuff is archived and used in ways we can't imagine.

 Have ever been locked in a
 certain train of thought and simply forgot to consider something
 related, later putting hand to forehead and saying Duh!.  My mindset
 was focused on showing how a single PCRE can block the same number of
 hosts as using IP addresses in a CIDR or hash table.  I just didn't
 consider the domain blocking aspect of hash tables at the time.  That's
 the Duh!.  I've been blocking domains with my hash table for something
 like 6 years now...  I think some folks call this a brain fart.  ;)
 
 no. IPs and domains are different things.

 cidr is about IPs. hash/cdb/pcre is about names. these are different
 things and you know that. use each as appropriate.
 
 Of course.  But IPs are valid in a hash table.  You can even list them
 by the equivalent of a /24, /16, and /8 if you like, simply by omitting
 the last 1, 2, or 3 octets of the dotted quad.  Just as I brain farted
 WRT using domains in a hash table, it appears you have done the same WRT
 to using IP addresses in a hash table. :)
 

not really. I never put IPs in hash tables. more precisely, I never mix
domains and IPs. be it just for the fact that postfix first looks up
domains/hostnames before looking up IPs, which is the opposite of what I
want. the /24, /16, /8 in postfix is a sendmail compat thing.
something I don't need.

 I agree it makes more sense to block domains with hash/cdb and IPs with
 CIDR.  I've been doing exactly that for 5 of the 6 years I've been
 running Postfix.  The first year (maybe less) I blocked IPs with a hash
 table, until I joined this list and learned about CIDR tables.  I'm
 guessing most other new Postfix OPs go through the same
 progression--most beginners docs returned via Google teach the hash
 table and nothing else.
 
 if the ISP makes it too much, then you should reduce it:
 .embarqhsd.net   REJECT blah blah
 
 Yeah, but then you end up potentially blocking large numbers of ham
 servers in SOHO land, in this case *.sta.embarqhsd.net.  Even in 2011
 there are still hundreds of thousands or more SOHO MTAs on static IP
 aDSL and cable circuits with generic rDNS.  I should know as I'm one of
 them.  (Please let's not allow this to turn into yet another flame war
 WRT generic rDNS, real OPs rent a VPS/colo, yada yada--I'm not directing
 this at you mouss but to those predisposed to flog this dead, stripped
 to the bone, horse carcass).

believe it or not, I have nothing against dynamic IPs. my approach is
as follows:
- whitelisted IPs get whitelisted. this includes public whitelists and
local whitelists
- I do not include an expression for generic rdns until I get spam
- after N spam, I add an expression. well, I do check if it's ok to add
a blocking rule
- I do not care if it's static, .sta or whatever. as I said above,
it's not about dynamic, it's about accountability. if I get spam from
joe.example, I know I can complain to (abuse|postmaster)@joe.example. if
I get junk from 1.2.3.4.largeisp.example, I know I have no right to
complain, because I'm not part of the money circuit.

 
 a better example would be
 /(\W\d+){4}\..*\.embarqhsd\.net$/  REJECT ...

 Better in what way? 

 in the sense that this can't be represented using hash or the like.
 
 Ok.  So you're not showing this PCRE above because it better matches the
 target rDNS string, or that the engine executes it faster or something,
 etc.  You're simply saying don't use a PCRE for something you can match
 using a simpler table, such as hash/cdb.  Correct?
 

yep. but that said, if you don't have performance problems, using a
single map is probably better than splitting it into a pcre and a
hash/cdb map. so what I said doesn't apply to _you_. it was about the
example (showing a better example).


regex anchoring (Was: Kernel Oops)

2011-03-07 Thread mouss
Le 07/03/2011 11:47, Stan Hoeppner a écrit :
 mouss put forth on 3/6/2011 7:03 PM:
 
 /^.*foo/
 means it starts with something followed by foo. and this is the same
 thing as it contains foo, which is represented by
 /foo/
 
 I was taught to always start my expressions with /^ and end them with
 $/.  Why did Steven teach me to do this if it's not necessary?  Steven
 being the author of the Enemies List:  http://enemieslist.com/ which
 contains over 65,000 regexes matching FQrDNS patterns.
 

You misunderstood what Steven meant. What Steven meant is to avoid
things like
/adsl/  REJECT blah

so he recommends anchoring expressions, right and left:
/^cpe\..*\.joe\.example$/   ...

contrast this with
/^cpe/  ...
and
/adsl/  ...

which could match a lot of places you wouldn't want to match.

/^.*foo/ means: starts with anything followed by foo. this is the same
as contains foo, which can be represented by /foo/

and

/foo.*$/ means contains foo followed by anything. this is the same as
contains foo, which can be represented by /foo/


of course, I appreciate Steven and I agree with what he says here, to
some extent (obviously, I'm paid by my employer so it's easy for me to
push for freely available stuff).


 [snip]


Re: Kernel Oops

2011-03-07 Thread mouss
Le 07/03/2011 15:13, Stan Hoeppner a écrit :
 Noel Jones put forth on 3/7/2011 7:00 AM:
 On 3/7/2011 4:47 AM, Stan Hoeppner wrote:

 I was taught to always start my expressions with /^ and end them with
 $/.  Why did Steven teach me to do this if it's not necessary?

 That's good advice when you're actually matching something.
 
 Ok, so if I'm doing what I've heard called a fully qualified regular
 expression, WRT FQrDNS matching, should I use the anchors or not?
 postmap -q says these all work (the actuals with action and text that is).
 
 /^(\d{1,3}-){3}\d{1,3}\.dynamic\.chello\.sk$/

.dynamic.chello.sk  REJECT blah blah


 /^(\d{1,3}\.){4}dsl\.dyn\.forthnet\.gr$/

.dyn.forthnet.gr    REJECT blah blah

 /^(\d{1,3}-){4}adsl-dyn\.4u\.com\.gh$/
/dyn\.4u\.com\.gh$/  REJECT blah

assuming you get real mail from there. otherwise
.4u.com.gh  REJECT blah

 /^[\d\w]{8}\.[\w]{2}-[\d]-[\d\w]{2}\.dynamic\.ziggo\.nl$/

ahem? I fail to see what you're trying to match here. \d is a \w, so
[\d\w] is the same as \w. do you mean \W (capital letter)? anyway:

.dynamic.ziggo.nl    REJECT blah blah
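
The claim about the character classes is easy to verify. A small sketch, assuming Python's re module in place of PCRE and restricting the comparison to ASCII:

```python
import re

# Compare the two classes over every ASCII character.
ascii_chars = [chr(i) for i in range(128)]
same = all(
    bool(re.fullmatch(r"[\d\w]", c)) == bool(re.fullmatch(r"\w", c))
    for c in ascii_chars
)
print(same)

# \W is the complement: it matches separators such as "-", but not digits.
print(bool(re.fullmatch(r"\W", "-")), bool(re.fullmatch(r"\W", "7")))
```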

 /^(\d{1,3}\.){4}dynamic\.snap\.net\.nz$/
.dynamic.snap.net.nz    REJECT blah

 /^pppoe-dyn(-\d{1,3}){4}\.kosnet\.ru$/
/\Wdyn\W.*\.kosnet\.ru$/    REJECT blah

 
 The special case of .* means, as you know, anything or nothing. 
 There's never a case where it's necessary to explicitly match a leading
 or trailing anything or nothing.
 
 What of the case where you want to match something in the middle of the
 input string, with extra junk on both ends?

well, that's what regular expressions are about by default:
/foo/ means contains foo
/^foo/ means starts with foo
/foo$/ means ends with foo

so
/^bart.*homer.*marge$/ means: starts with bart, ends with marge and
somewhere between these contains homer.
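
The same reading, checked against a few made-up strings with Python's re module standing in for PCRE:

```python
import re

simpsons = re.compile(r"^bart.*homer.*marge$")

# Made-up inputs, one per case.
starts_contains_ends = bool(simpsons.search("bart-homer-marge"))
nothing_between = bool(simpsons.search("barthomermarge"))        # .* may match nothing
wrong_start = bool(simpsons.search("lisa-bart-homer-marge"))     # does not start with bart
wrong_end = bool(simpsons.search("bart-marge-homer"))            # does not end with marge

print(starts_contains_ends, nothing_between, wrong_start, wrong_end)
```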


 
 Consider:
 /^.*foo$/
   match the string beginning with anything or nothing, ending with foo.

 can always be simplified to:
 /foo$/
   match the string ending with foo.

 This works the same without the ending $ anchor (contains foo, rather
 than ends with foo), but helps the illustration.
 
 So, in my examples above, given we're matching rDNS patterns, are the
 anchors necessary, or helpful?  If not using them means contains, then
 they should still match.  What advantage is there to using the anchors
 when matching rDNS patterns?  Any?
 
 (In the other special case where you're using $1, $2, etc. substitution
 in the result, you might need some form of /^(.*foo)$/ to fill the
 substitution buffer, but that's about substitution, not about matching.)
 
 Thank you for the continuing PCRE education Noel, and Ansgar. :)
 



Re: Kernel Oops

2011-03-07 Thread fakessh @
it is necessary to consider the option

parent_domain_matches_subdomains =
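
The setting decides whether leading-dot entries such as .dynamic.chello.sk are consulted at all. The sketch below is a simplified model of the behaviour described in access(5), not Postfix code; the function name and the hostname are made up for illustration.

```python
def domain_lookup_keys(hostname, parent_matches_subdomains):
    """Return access-table keys for a hostname, most specific first (simplified)."""
    labels = hostname.split(".")
    keys = [hostname]
    for i in range(1, len(labels)):
        parent = ".".join(labels[i:])
        # With the parent style, "domain.tld" covers subdomains;
        # otherwise the table needs an explicit ".domain.tld" entry.
        keys.append(parent if parent_matches_subdomains else "." + parent)
    return keys

host = "1-2-3-4.dynamic.chello.sk"  # hypothetical hostname
print(domain_lookup_keys(host, True))
print(domain_lookup_keys(host, False))
```

Under this model a ".dynamic.chello.sk" entry is only found when the access maps are not listed in parent_domain_matches_subdomains; when they are listed, the entry would be written without the leading dot.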

Le mardi 08 mars 2011 à 00:45 +0100, mouss a écrit :
 Le 07/03/2011 15:13, Stan Hoeppner a écrit :
  Noel Jones put forth on 3/7/2011 7:00 AM:
  On 3/7/2011 4:47 AM, Stan Hoeppner wrote:
 
  I was taught to always start my expressions with /^ and end them with
  $/.  Why did Steven teach me to do this if it's not necessary?
 
  That's good advice when you're actually matching something.
  
  Ok, so if I'm doing what I've heard called a fully qualified regular
  expression, WRT FQrDNS matching, should I use the anchors or not?
  postmap -q says these all work (the actuals with action and text that is).
  
  /^(\d{1,3}-){3}\d{1,3}\.dynamic\.chello\.sk$/
 
 .dynamic.chello.sk    REJECT blah blah
 
 
  /^(\d{1,3}\.){4}dsl\.dyn\.forthnet\.gr$/
 
 .dyn.forthnet.gr  REJECT blah blah
 
  /^(\d{1,3}-){4}adsl-dyn\.4u\.com\.gh$/
 /dyn\.4u\.com\.gh$/    REJECT blah
 
 assuming you get real mail from there. otherwise
 .4u.com.gh    REJECT blah
 
  /^[\d\w]{8}\.[\w]{2}-[\d]-[\d\w]{2}\.dynamic\.ziggo\.nl$/
 
 ahem? I fail to see what you're trying to match here. \d is a \w, so
 [\d\w] is the same as \w. do you mean \W (capital letter)? anyway:
 
 .dynamic.ziggo.nl  REJECT blah blah
 
  /^(\d{1,3}\.){4}dynamic\.snap\.net\.nz$/
 .dynamic.snap.net.nz  REJECT blah
 
  /^pppoe-dyn(-\d{1,3}){4}\.kosnet\.ru$/
 /\Wdyn\W.*\.kosnet\.ru$/  REJECT blah
 
  
  The special case of .* means, as you know, anything or nothing. 
  There's never a case where it's necessary to explicitly match a leading
  or trailing anything or nothing.
  
  What of the case where you want to match something in the middle of the
  input string, with extra junk on both ends?
 
 well, that's what regular expressions are about by default:
 /foo/ means contains foo
 /^foo/ means starts with foo
 /foo$/ means ends with foo
 
 so
 /^bart.*homer.*marge$/ means: starts with bart, ends with marge and
 somewhere between these contains homer.
 
 
  
  Consider:
  /^.*foo$/
match the string beginning with anything or nothing, ending with foo.
 
  can always be simplified to:
  /foo$/
match the string ending with foo.
 
  This works the same without the ending $ anchor (contains foo, rather
  than ends with foo), but helps the illustration.
  
  So, in my examples above, given we're matching rDNS patterns, are the
  anchors necessary, or helpful?  If not using them means contains, then
  they should still match.  What advantage is there to using the anchors
  when matching rDNS patterns?  Any?
  
  (In the other special case where you're using $1, $2, etc. substitution
  in the result, you might need some form of /^(.*foo)$/ to fill the
  substitution buffer, but that's about substitution, not about matching.)
  
  Thank you for the continuing PCRE education Noel, and Ansgar. :)
  
 
-- 
gpg --keyserver pgp.mit.edu --recv-key 092164A7
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x092164A7


Re: Kernel Oops

2011-03-06 Thread Bastian Blank
On Fri, Mar 04, 2011 at 03:43:11PM +0300, Denis Shulyaka wrote:
 Mar  4 14:46:29 shulyaka kern.alert kernel: CPU 0 Unable to handle
 kernel paging request at virtual address 0050, epc == 800fbdb4, ra
 == 800fbdf8

This kernel is broken beyond repair. Get a fixed one.

 Mar  4 14:46:29 shulyaka kern.warn kernel: Tainted: G  D

This is _not_ the first oops in the log.

Bastian

-- 
Emotions are alien to me.  I'm a scientist.
-- Spock, This Side of Paradise, stardate 3417.3


Re: Kernel Oops

2011-03-06 Thread Victor Duchovni
On Sat, Mar 05, 2011 at 06:24:57PM +0300, Denis Shulyaka wrote:

 If I change `fsspace(".", &fsbuf);' to `fsspace("/", &fsbuf);' it
 works, no oopses, and the messages are received without problems. I
 will make some stress tests later.
 
 So the remaining question is what does "." mean in the smtpd context? Is it the
 dir postfix has been started from?

Services spawned from master.cf run with cwd == $queue_directory
(typically /var/spool/postfix).

-- 
Viktor.


Re: Kernel Oops

2011-03-06 Thread Denis Shulyaka
Hi Viktor,

You are right, for some reason my system has some trouble with
fsspace("/var/spool/postfix", &fsbuf). Possibly, Bastian is right
about my kernel. But I just don't know how to fix it.

Anyway, the Postfix code is OK, and the workaround with
`fsspace("/overlay", &fsbuf)` satisfies me so far.


Best regards,
Denis Shulyaka

2011/3/6 Victor Duchovni victor.ducho...@morganstanley.com:
 On Sat, Mar 05, 2011 at 06:24:57PM +0300, Denis Shulyaka wrote:

 If I change `fsspace(".", &fsbuf);' to `fsspace("/", &fsbuf);' it
 works, no oopses, and the messages are received without problems. I
 will make some stress tests later.

 So the remaining question is what does "." mean in the smtpd context? Is it the
 dir postfix has been started from?

 Services spawned from master.cf run with cwd == $queue_directory
 (typically /var/spool/postfix).

 --
        Viktor.



Re: Kernel Oops

2011-03-06 Thread Denis Shulyaka
Hi Viktor,

I have tried both statfs() and statvfs() and they show similar behaviour.

2011/3/6 Victor Duchovni victor.ducho...@morganstanley.com:
 The fsspace function is a Postfix utility function, the underlying
 system interface is either statfs() or statvfs(). You should find
 out which is used on your system and test that...

 --
        Viktor.



Re: Kernel Oops

2011-03-06 Thread Wietse Venema
Victor Duchovni:
  The fsspace function is a Postfix utility function, the underlying
  system interface is either statfs() or statvfs(). You should find
  out which is used on your system and test that...

Denis Shulyaka:
 I have tried both statfs() and statvfs() and they show similar behaviour.

Postfix uses statfs/statvfs as part of a safety net. If you delete
the call, then Postfix would waste more bandwidth receiving mail
that it can't store.
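
The kind of check involved can be sketched with the same system call. This is not Postfix's fsspace() code: it assumes a Unix system where Python's os.statvfs is available, and the has_room helper and its threshold logic are hypothetical.

```python
import os

def free_bytes(path):
    """Free space available to unprivileged processes on the file system holding path."""
    st = os.statvfs(path)
    # f_bavail counts free blocks for non-root users; f_frsize is the block size.
    return st.f_bavail * st.f_frsize

def has_room(path, needed_bytes):
    """Hypothetical check: is there room for a message of needed_bytes?"""
    return free_bytes(path) >= needed_bytes

# "." is the current directory, which for Postfix services is the queue directory.
print(free_bytes("."))
print(has_room(".", 0))
```

Running the equivalent call against the queue directory outside of Postfix is one way to confirm whether the oops follows the file system rather than the mail software.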

However, if statfs/statvfs are broken, then there are likely to be
more problems. I would recommend against using the file system for
the email queue.

Wietse


Re: Kernel Oops

2011-03-06 Thread Stan Hoeppner
Wietse Venema put forth on 3/6/2011 3:29 PM:

 Postfix uses statfs/statvfs as part of a safety net. If you delete
 the call, then Postfix would waste more bandwidth receiving mail
 that it can't store.
 
 However, if statfs/statvfs are broken, then there are likely to be
 more problems. 


 I would recommend against using the file system for
 the email queue.
^

What?!?!?  What?!  Seeing you state this Wietse prompts me to run for
the bomb shelter, for the world as we know it will soon end. :)

Would that not make his only other option, assuming he sticks with his
current kernel, a ramdisk?  In a scenario where the target machine has
only 64MB RAM?  And considering you've expended countless keystrokes
over the years telling OPs to _never_ _ever_ put the queue on a ramdisk?

Or, are you suggesting, in a creative Wietse'esque dead pan humorous
way, that he fix the problem with his current kernel, as I did far back
in this thread, and others have since?

-- 
Stan


Re: Kernel Oops

2011-03-06 Thread Wietse Venema
Wietse:
 However, if statfs/statvfs are broken, then there are likely to be
 more problems. 
 
 I would recommend against using the file system for
 the email queue.

Instead, use a better file system.

Wietse


Re: Kernel Oops

2011-03-05 Thread mouss
Le 05/03/2011 00:18, Stan Hoeppner a écrit :
 lst_ho...@kwsoft.de put forth on 3/4/2011 3:33 PM:
 
 BTW, is there any how-to for getting the least possible memory
 footprint for Postfix.
 
 - don't use regex/pcre maps
 
 This isn't necessarily true, is it?  In some cases I would think it's
 dramatically reversed in favor of PCRE tables (unless the Postfix PCRE
 processing code overhead eats up a massive amount of memory).  For
 example, with the following single PCRE I can block a few million,
 literally, residential hosts in the Centurylink (formerly Embarq)
 consumer broadband aDSL network:
 
 /^.*\.(dyn|dhcp)\.embarqhsd\.net$/  REJECT Please use ISP relay
 

you can simplify that:
/\.(dyn|dhcp)\.embarqhsd\.net$/  REJECT Please use ISP relay

more generally /^.* is never needed.

anyway, this example is too simple and can be replaced with 2 cdb entries:
.dyn.embarqhsd.net  REJECT ...
.dhcp.embarqhsd.net REJECT ...

a better example would be
/(\W\d+){4}\..*\.embarqhsd\.net$/   REJECT ...
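
A quick check of what this expression accepts, assuming Python's re module in place of PCRE. The first two hostnames follow the forms quoted elsewhere in this thread; the third is hypothetical.

```python
import re

pattern = re.compile(r"(\W\d+){4}\..*\.embarqhsd\.net$")

dyn_host = "fl-65-40-2-201.dyn.embarqhsd.net"
dhcp_host = "tx-67-232-101-101.dhcp.embarqhsd.net"
named_host = "mail.embarqhsd.net"  # hypothetical non-generic name

for name in (dyn_host, dhcp_host, named_host):
    print(name, bool(pattern.search(name)))
```

The match depends on the four separator-plus-digits groups in the host label, which is the part a hash or cdb key cannot express.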


 To do this with a CIDR would take at least 100 entries to cover all the
 subnets, probably many many more, due to the way they assign blocks by
 state, and rDNS by customer type, with (dyn|dhcp|sta) all existing
 within each of the top level parents.
 
 To do this with a hash table would require multiple hundreds of entries
 as you'd be limited to using /24s.
 



Re: Kernel Oops

2011-03-05 Thread Denis Shulyaka
Hi all,

I have investigated the problem a little, and here are some results:

First of all, it has nothing to do with memory consumption. The smtpd
crashes on statfs() in the fsspace() function, which is called from
smtpd_check_queue() to check available free space on current
filesystem for a queue.

In the suggested System.map file the closest entry is 'alloc_page_buffers'.

The default_process_limit, qmgr_message_active_limit and
qmgr_message_recipient_limit tweaks have no effect at all.

Any thoughts why statfs() may trigger a kernel oops?


Best regards,
Denis Shulyaka

2011/3/4 Wietse Venema wie...@porcupine.org:
 Wietse:
  Postfix asks the kernel for memory. If the kernel oopses and crashes
  Postfix, then that can't be fixed by changing Postfix.

 Denis Shulyaka:
 How much memory does smtpd need to receive a message, approximately?
 Can I tweak this value somehow?

 First, you can't run Postfix on a kernel that oopses and sends
 signal 11 when Postfix asks for memory. It should report the
 memory shortage to Postfix instead.

 The amount of memory depends on libc, and on what else you linked
 into Postfix: OpenSSL, PCRE, LDAP, and so on quickly add up to the
 memory footprint.

 The biggest tweak is reducing default_process_limit by a factor 10
 or more. Other tweaks are reducing qmgr_message_active_limit and
 qmgr_message_recipient_limit by a factor 10 or more.

        Wietse



Re: Kernel Oops

2011-03-05 Thread Denis Shulyaka
Well, I found it!

If I change `fsspace(".", &fsbuf);' to `fsspace("/", &fsbuf);' it
works, no oopses, and the messages are received without problems. I
will make some stress tests later.

So the remaining question is what does "." mean in the smtpd context? Is it the
dir postfix has been started from?


2011/3/5 Denis Shulyaka shuly...@gmail.com:
 Hi all,

 I have investigated the problem a little, and here are some results:

 First of all, it has nothing to do with memory consumption. The smtpd
 crashes on statfs() in the fsspace() function, which is called from
 smtpd_check_queue() to check available free space on current
 filesystem for a queue.

 In the suggested System.map file the closest entry is 'alloc_page_buffers'.

 The default_process_limit, qmgr_message_active_limit and
 qmgr_message_recipient_limit tweaks have no effect at all.

 Any thoughts why statfs() may trigger a kernel oops?


 Best regards,
 Denis Shulyaka


Re: Kernel Oops

2011-03-05 Thread Stan Hoeppner
mouss put forth on 3/5/2011 7:20 AM:
 Le 05/03/2011 00:18, Stan Hoeppner a écrit :

 /^.*\.(dyn|dhcp)\.embarqhsd\.net$/  REJECT Please use ISP relay


 you can simplify that:
 /\.(dyn|dhcp)\.embarqhsd\.net$/  REJECT Please use ISP relay
 
 more generally /^.* is never needed.

Does this expression correctly match a longer string when used as:
check_reverse_client_hostname_access pcre:/etc/postfix/foo.pcre

The actual FQrDNS strings in my example network will be of the form:
fl-65-40-2-201.dyn.embarqhsd.net
tx-67-232-101-101.dhcp.embarqhsd.net

I was of the impression that a preceding wild card is required if not
using fully qualified expressions, but simply trying to match only a
substring at the back end of the line.

 anyway, this example is too simple and can be replaced with 2 cdb entries:
 .dyn.embarqhsd.net    REJECT ...
 .dhcp.embarqhsd.net   REJECT ...

I just realized I erred in my original thought process leading to my
example.  I started out thinking of banning blocks of IPs, and how using
a PCRE matching rDNS patterns can shrink an equivalent IP subnet hash
table or CIDR table dramatically.  I was strictly thinking of a hash
table full of IP subnets.  For some reason using host names in a hash
table slipped my mind (hand to forehead).  One could just as easily do
this with hash table.  So yes, this wasn't the greatest example.  A
better example would have been an ISP that uses goofy multiple rDNS
conventions, possibly due to mergers, etc, such as:

10-1-2-3.dhcp.[state-abbr].isp.net
10-2-3-4.dyn.[city-name].isp.net
10-3-4-5.res.[state-abbr].isp.net
10-4-5-6-dynamic.[city-name].isp.net
etc

A PCRE table would definitely have a smaller memory footprint (the
current thread focus) in this example than an equivalent hash or cdb
table.  And doing this with a CIDR would likely be smaller than hash or
cdb as well, given the number of cities and states that such an ISP
would be operating in, which would kick the total number of rDNS
patterns into the hundreds.
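
As a rough illustration of that point, one expression can cover all four naming conventions listed above. This is a sketch in Python's `re` module rather than a Postfix PCRE table; the pattern itself, the placeholder domain isp.net, and the sample host names are assumptions made up for the example:

```python
import re

# Four dash-separated numbers, then either ".dhcp" / ".dyn" / ".res" or
# "-dynamic", then one state or city label, then the ISP domain.
MULTI_CONVENTION = re.compile(
    r"^\d+(-\d+){3}(\.(dhcp|dyn|res)|-dynamic)\.[a-z-]+\.isp\.net$"
)

def is_dynamic_host(hostname):
    """True if the host name follows one of the four dynamic naming conventions."""
    return MULTI_CONVENTION.search(hostname) is not None

for host in (
    "10-1-2-3.dhcp.tx.isp.net",
    "10-2-3-4.dyn.dallas.isp.net",
    "10-3-4-5.res.fl.isp.net",
    "10-4-5-6-dynamic.orlando.isp.net",
    "mail.tx.isp.net",
):
    print(host, is_dynamic_host(host))
```

A hash or cdb table would need one entry per state or city label for each convention, because those tables match fixed domain suffixes rather than patterns.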

 a better example would be
 /(\W\d+){4}\..*\.embarqhsd\.net$/ REJECT ...

Better in what way?  Does this get processed using significantly fewer
cycles or with significant memory footprint savings?  Your example is
incomprehensible to non regex experts (myself included).  I had to hit
my regex docs to understand this syntax choice.  Non experts at least
have a fighting chance at deciphering my original example, mouss. :)

Thanks in advance for the anticipated forthcoming regex education.

-- 
Stan
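
For anyone else who had to reach for the regex docs, mouss's expression can be taken apart and tried against the two sample host names from this message. The sketch uses Python's `re` module, which to my knowledge treats these particular constructs the same way PCRE does; the third and fourth host names are hypothetical:

```python
import re

# (\W\d+){4}          four repetitions of one non-word character (such as "-"
#                     or ".") followed by digits, e.g. "-65-40-2-201"
# \.                  a literal dot
# .*                  any run of characters (here the "dyn" or "dhcp" label)
# \.embarqhsd\.net$   the domain, anchored to the end of the name
MOUSS_PATTERN = re.compile(r"(\W\d+){4}\..*\.embarqhsd\.net$")

def is_rejected(hostname):
    """True if the pattern matches somewhere in the host name (unanchored search)."""
    return MOUSS_PATTERN.search(hostname) is not None

for host in (
    "fl-65-40-2-201.dyn.embarqhsd.net",
    "tx-67-232-101-101.dhcp.embarqhsd.net",
    "fl-65-40-2-201.sta.embarqhsd.net",   # hypothetical name with another label
    "mail.embarqhsd.net",                 # hypothetical name without the number groups
):
    print(host, is_rejected(host))
```

The pattern keys on the four embedded numbers rather than on the "dyn" or "dhcp" label, so it also catches labels nobody has listed yet while leaving names without the number groups alone.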


Re: Kernel Oops

2011-03-05 Thread Noel Jones

On 3/5/2011 9:32 AM, Stan Hoeppner wrote:

mouss put forth on 3/5/2011 7:20 AM:

Le 05/03/2011 00:18, Stan Hoeppner a écrit :



/^.*\.(dyn|dhcp)\.embarqhsd\.net$/  REJECT Please use ISP relay




you can simplify that:
/\.(dyn|dhcp)\.embarqhsd\.net$/  REJECT Please use ISP relay

more generally /^.* is never needed.


Does this expression correctly match a longer string when used as:
check_reverse_client_hostname_access pcre:/etc/postfix/foo.pcre


(Why would the string be longer?)
Regardless, it's not required to anchor the beginning because 
it's anchored at the end.




The actual FQrDNS strings in my example network will be of the form:
fl-65-40-2-201.dyn.embarqhsd.net
tx-67-232-101-101.dhcp.embarqhsd.net

I was of the impression that a preceding wild card is required if not
using fully qualified expressions, but simply trying to match only a
substring at the back end of the line.


A wildcard anchored to the beginning (or the end) is always 
useless -- think about it a minute and you'll see why.





a better example would be
/(\W\d+){4}\..*\.embarqhsd\.net$/   REJECT ...


Better in what way?


This example shows something that would be impossible to 
reproduce in a hash/cdb table.



  -- Noel Jones
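
The anchoring point is easy to check by experiment. The sketch below compares the two expressions from this exchange with an unanchored search, which is how I understand Postfix regexp and pcre tables to apply patterns; it uses Python's `re` module as a stand-in, and the last two host names are hypothetical:

```python
import re

WITH_WILDCARD = re.compile(r"^.*\.(dyn|dhcp)\.embarqhsd\.net$")
WITHOUT_WILDCARD = re.compile(r"\.(dyn|dhcp)\.embarqhsd\.net$")

HOSTS = [
    "fl-65-40-2-201.dyn.embarqhsd.net",
    "tx-67-232-101-101.dhcp.embarqhsd.net",
    "mail.embarqhsd.net",                       # hypothetical, no dyn/dhcp label
    "fl-65-40-2-201.dyn.embarqhsd.net.example", # hypothetical, wrong ending
]

def verdicts(pattern, hosts):
    """Return one True/False per host: does the pattern match anywhere in it?"""
    return [pattern.search(host) is not None for host in hosts]

print(verdicts(WITH_WILDCARD, HOSTS))
print(verdicts(WITHOUT_WILDCARD, HOSTS))
```

Both lists come out identical: a search may start at any position, so "^.*" only restates "anything may come before", which an unanchored search already allows.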



Kernel Oops

2011-03-04 Thread Denis Shulyaka
Hi list!

I'm trying to run postfix on my OpenWrt system. I have successfully
compiled it and now I can send mails, but when I try to receive a
mail, smtpd crashes and I can see this in the system log:

Mar  4 14:46:29 shulyaka mail.info postfix/smtpd[18020]: connect from
mail-bw0-f52.google.com[209.85.214.52]
Mar  4 14:46:29 shulyaka kern.alert kernel: CPU 0 Unable to handle
kernel paging request at virtual address 0050, epc == 800fbdb4, ra
== 800fbdf8
Mar  4 14:46:29 shulyaka mail.warn postfix/master[16781]: warning:
process /usr/libexec/postfix/smtpd pid 18020 killed by signal 11
Mar  4 14:46:29 shulyaka mail.warn postfix/master[16781]: warning:
/usr/libexec/postfix/smtpd: bad command startup -- throttling
Mar  4 14:46:29 shulyaka kern.warn kernel: Oops[#23]:
Mar  4 14:46:29 shulyaka kern.warn kernel: Cpu 0
Mar  4 14:46:29 shulyaka kern.warn kernel: $ 0   :  0001
820b3280 8012c43c
Mar  4 14:46:29 shulyaka kern.warn kernel: $ 4   :  810c7e60
 
Mar  4 14:46:29 shulyaka kern.warn kernel: $ 8   : 0018 800643f8
802f fff4
Mar  4 14:46:29 shulyaka kern.warn kernel: $12   : f000 0001
0400 0043c994
Mar  4 14:46:29 shulyaka kern.warn kernel: $16   : 810c7e60 83577580
0003 7fcf9ec8
Mar  4 14:46:29 shulyaka kern.warn kernel: $20   : 0003 00409740
0046eaf0 004560a0
Mar  4 14:46:29 shulyaka kern.warn kernel: $24   : 0070 
Mar  4 14:46:29 shulyaka kern.warn kernel: $28   : 810c6000 810c7df0
0047 800fbdf8
Mar  4 14:46:29 shulyaka kern.warn kernel: Hi: 03b8
Mar  4 14:46:29 shulyaka kern.warn kernel: Lo: 0001e74d
Mar  4 14:46:29 shulyaka kern.warn kernel: epc   : 800fbdb4 0x800fbdb4
Mar  4 14:46:29 shulyaka kern.warn kernel: Tainted: G  D
Mar  4 14:46:29 shulyaka kern.warn kernel: ra: 800fbdf8 0x800fbdf8
Mar  4 14:46:29 shulyaka kern.warn kernel: Status: 1000fc03KERNEL EXL IE
Mar  4 14:46:29 shulyaka kern.warn kernel: Cause : 0088
Mar  4 14:46:29 shulyaka kern.warn kernel: BadVA : 0050
Mar  4 14:46:29 shulyaka kern.warn kernel: PrId  : 00019374 (MIPS 24Kc)
Mar  4 14:46:29 shulyaka kern.warn kernel: [truncated] Modules linked
in: ums_usbat ums_sddr55 ums_sddr09 ums_karma ums_jumpshot ums_isd200
ums_freecom sch_red sch_sfq ums_datafab sch_hfsc ums_cypress cls_fw
ums_alauda sch_ingress act_mirred act_connmark em_u32 ledtrig_u
Mar  4 14:46:29 shulyaka kern.warn kernel: Process smtpd (pid: 18020,
threadinfo=810c6000, task=82c49dc0, tls=2b7cb2f0)
Mar  4 14:46:29 shulyaka kern.warn kernel: Stack : 82016dc0 0001
83b35480 83577580  810c7e60 83577580 800fbdf8
Mar  4 14:46:29 shulyaka kern.warn kernel: 80a5e000 800e4544
7fcf9ec8 800e36f8  810c7ed0 810c7e60 800fbe48
Mar  4 14:46:29 shulyaka kern.warn kernel:  80a5e000
80a5e000 800e4928 80c48300 810c7ed8 00453af8 800fbfb4
Mar  4 14:46:29 shulyaka kern.warn kernel: 83b35480 83577580
0001 82016dc0    
Mar  4 14:46:29 shulyaka kern.warn kernel:  
     
Mar  4 14:46:29 shulyaka kern.warn kernel: ...
Mar  4 14:46:29 shulyaka kern.warn kernel: Call Trace:[800fbdf8] 0x800fbdf8
Mar  4 14:46:29 shulyaka kern.warn kernel: [800e4544] 0x800e4544
Mar  4 14:46:29 shulyaka kern.warn kernel: [800e36f8] 0x800e36f8
Mar  4 14:46:29 shulyaka kern.warn kernel: [800fbe48] 0x800fbe48
Mar  4 14:46:29 shulyaka kern.warn kernel: [800e4928] 0x800e4928
Mar  4 14:46:29 shulyaka kern.warn kernel: [800fbfb4] 0x800fbfb4
Mar  4 14:46:29 shulyaka kern.warn kernel: [800fc0dc] 0x800fc0dc
Mar  4 14:46:29 shulyaka kern.warn kernel: [8009d0c8] 0x8009d0c8
Mar  4 14:46:29 shulyaka kern.warn kernel: [800d9c84] 0x800d9c84
Mar  4 14:46:29 shulyaka kern.warn kernel: [80081744] 0x80081744
Mar  4 14:46:29 shulyaka kern.warn kernel: [80062544] 0x80062544
Mar  4 14:46:29 shulyaka kern.warn kernel: Code: afb10018  afb00014
afbf001c 8c820050 00808821  00a08021  8c420024  8c43002c  10600012

This happens every time I receive a mail.
I also tried to telnet to the smtp port and found out that postfix
correctly responds to HELO and crashes right after I send MAIL
command.

Besides that, the whole system is very stable, so I don't believe it
is a hardware fault.

Postfix version 2.8.0
# uname -r
2.6.37.1
# uname -m
mips
# free
               total         used         free       shared      buffers
  Mem:        62040        48348        13692            0         5916
 Swap:       524284            0       524284
Total:       586324        48348       537976

Best regards,
Denis Shulyaka


Re: Kernel Oops

2011-03-04 Thread Ralf Hildebrandt
* Denis Shulyaka shuly...@gmail.com:
 Hi list!
 
 I'm trying to run postfix on my OpenWrt system. I have successfully
 compiled it and now I can send mails, but when I try to receive a
 mail, smtpd crashes and I can see this in the system log:
 
 Mar  4 14:46:29 shulyaka mail.info postfix/smtpd[18020]: connect from 
 mail-bw0-f52.google.com[209.85.214.52]
 Mar  4 14:46:29 shulyaka kern.alert kernel: CPU 0 Unable to handle kernel 
 paging request at virtual address 0050, epc == 800fbdb4, ra == 800fbdf8
 Mar  4 14:46:29 shulyaka mail.warn postfix/master[16781]: warning: process 
 /usr/libexec/postfix/smtpd pid 18020 killed by signal 11
 Mar  4 14:46:29 shulyaka mail.warn postfix/master[16781]: warning: 
 /usr/libexec/postfix/smtpd: bad command startup -- throttling

Sounds like you run out of memory.
But let's see what the others say...

 # free
                total         used         free       shared      buffers
   Mem:        62040        48348        13692            0         5916
  Swap:       524284            0       524284
 Total:       586324        48348       537976
 
 Best regards,
 Denis Shulyaka

-- 
Ralf Hildebrandt
  Geschäftsbereich IT | Abteilung Netzwerk
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
  ralf.hildebra...@charite.de | http://www.charite.de



Re: Kernel Oops

2011-03-04 Thread john

What hardware are you running openwrt on?


Re: Kernel Oops

2011-03-04 Thread Ralf Hildebrandt
* john j...@klam.ca:
 What hardware are you running openwrt on?
Sounds like a MIPS based OpenWRT system, e.g. a WRT54g (am I correct?)

-- 
Ralf Hildebrandt
  Geschäftsbereich IT | Abteilung Netzwerk
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
  ralf.hildebra...@charite.de | http://www.charite.de



Re: Kernel Oops

2011-03-04 Thread john

On 04/03/2011 8:58 AM, Denis Shulyaka wrote:

Hi John,

It's D-Link DIR-825 router, CPU Atheros AR7161@680MHz (mips)

2011/3/4 johnj...@klam.ca:

What hardware are you running openwrt on?

I think that you are being a little ambitious, that box has 8M flash and 
64M RAM.


All that is necessary for the triumph of evil is that good men do nothing. 
(Edmund Burke)



Re: Kernel Oops

2011-03-04 Thread john
I think you should listen to the advice you were given on the OpenWRT 
developers forum by Philip.



All that is necessary for the triumph of evil is that good men do 
nothing. (Edmund Burke)


Re: Kernel Oops

2011-03-04 Thread Denis Shulyaka
Hi Ralf,

Thanks for the response.
I think 13 MB should be well enough for receiving a message, and I
also expect some different error message if it is a memory allocation
problem.

2011/3/4 Ralf Hildebrandt ralf.hildebra...@charite.de:
 Sounds like you run out of memory.
 But let's see what the others say...

 # free
               total         used         free       shared      buffers
   Mem:        62040        48348        13692            0         5916
  Swap:       524284            0       524284
 Total:       586324        48348       537976


Re: Kernel Oops

2011-03-04 Thread Wietse Venema
Denis Shulyaka:
 Hi Ralf,
 
 Thanks for the response.
 I think 13 MB should be well enough for receiving a message, and I
 also expect some different error message if it is a memory allocation
 problem.

Postfix asks the kernel for memory. If the kernel oopses and crashes
Postfix, then that can't be fixed by changing Postfix.

Wietse



Re: Kernel Oops

2011-03-04 Thread Denis Shulyaka
Hi Wietse,

How much memory does smtpd need to receive a message, approximately?
Can I tweak this value somehow?


2011/3/4 Wietse Venema wie...@porcupine.org:
 Denis Shulyaka:
 Hi Ralf,

 Thanks for the response.
 I think 13 MB should be well enough for receiving a message, and I
 also expect some different error message if it is a memory allocation
 problem.

 Postfix asks the kernel for memory. If the kernel oopses and crashes
 Postfix, then that can't be fixed by changing Postfix.

        Wietse




Re: Kernel Oops

2011-03-04 Thread Denis Shulyaka
Hi John,

I don't agree with Philip, but the only way to prove my point is to
get it running.
I will need to see it myself to believe that 64M RAM + swap is not enough.

2011/3/4 john j...@klam.ca:
 I think you should listen to the advice you were given on the OpenWRT
 developers forum by Philip.


Re: Kernel Oops

2011-03-04 Thread Noel Jones

On 3/4/2011 9:13 AM, Denis Shulyaka wrote:

Hi John,

I don't agree with Philip, but the only way to prove my point is to
get it running.
I will need to see it myself to believe that 64M RAM + swap is not enough.


Things to try:

Don't use any lookup tables.

comment out all unused entries in master.cf.

set in main.cf:
default_process_limit = 1


Even still, I doubt it will work.


  -- Noel Jones


Re: Kernel Oops

2011-03-04 Thread Wietse Venema
Wietse:
  Postfix asks the kernel for memory. If the kernel oopses and crashes
  Postfix, then that can't be fixed by changing Postfix.

Denis Shulyaka:
 How much memory does smtpd need to receive a message, approximately?
 Can I tweak this value somehow?

First, you can't run Postfix on a kernel that oopses and sends
signal 11 when Postfix asks for memory. It should report the 
memory shortage to Postfix instead.

The amount of memory depends on libc, and on what else you linked
into Postfix: OpenSSL, PCRE, LDAP, and so on quickly add up to the
memory footprint.

The biggest tweak is reducing default_process_limit by a factor 10
or more. Other tweaks are reducing qmgr_message_active_limit and
qmgr_message_recipient_limit by a factor 10 or more.

Wietse
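
To put numbers on "a factor 10 or more", the sketch below scales the three parameters and prints lines that could go into main.cf. The starting values are what I understand the stock defaults to be; they are an assumption here and should be checked with `postconf -d` on the actual build:

```python
# Assumed stock defaults; verify with "postconf -d" before relying on them.
ASSUMED_DEFAULTS = {
    "default_process_limit": 100,
    "qmgr_message_active_limit": 20000,
    "qmgr_message_recipient_limit": 20000,
}

def scaled_settings(defaults, factor):
    """Divide every limit by factor, never going below 1."""
    return {name: max(1, value // factor) for name, value in defaults.items()}

def render_main_cf(settings):
    """Format settings as 'name = value' lines in main.cf style."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())

print(render_main_cf(scaled_settings(ASSUMED_DEFAULTS, 10)))
```

On a 64 MB box a larger factor may be appropriate; the floor of 1 keeps a very large factor from producing a zero limit.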


Re: Kernel Oops

2011-03-04 Thread Denis Shulyaka
Hi Noel, Wietse,

Thanks! I will try to do this and will update you with the result.

Best regards,
Denis Shulyaka


Re: Kernel Oops

2011-03-04 Thread Wietse Venema
Wietse Venema:
 The biggest tweak is reducing default_process_limit by a factor 10
 or more. Other tweaks are reducing qmgr_message_active_limit and
 qmgr_message_recipient_limit by a factor 10 or more.

And don't use Berkeley DB. Use CDB instead.

Wietse


Re: Kernel Oops

2011-03-04 Thread Steve Jenkins
On Fri, Mar 4, 2011 at 8:01 AM, Denis Shulyaka shuly...@gmail.com wrote:
 Thanks! I will try to do this and will update you with the result.

When I read Denis' first post I thought "WHAT? Postfix on a WRT54G? He's crazy!"

But now I'm rooting for you, Denis! I hope you get it working! :)

SteveJ


Re: Kernel Oops

2011-03-04 Thread Stan Hoeppner
Ralf Hildebrandt put forth on 3/4/2011 6:53 AM:
 * Denis Shulyaka shuly...@gmail.com:
 Hi list!

 I'm trying to run postfix on my OpenWrt system. I have successfully
 compiled it and now I can send mails, but when I try to receive a
 mail, smtpd crashes and I can see this in the system log:

 Mar  4 14:46:29 shulyaka mail.info postfix/smtpd[18020]: connect from 
 mail-bw0-f52.google.com[209.85.214.52]
 Mar  4 14:46:29 shulyaka kern.alert kernel: CPU 0 Unable to handle kernel 
 paging request at virtual address 0050, epc == 800fbdb4, ra == 800fbdf8
 Mar  4 14:46:29 shulyaka mail.warn postfix/master[16781]: warning: process 
 /usr/libexec/postfix/smtpd pid 18020 killed by signal 11
 Mar  4 14:46:29 shulyaka mail.warn postfix/master[16781]: warning: 
 /usr/libexec/postfix/smtpd: bad command startup -- throttling
 
 Sounds like you run out of memory.
 But let's see what the others say...

AFAIK OOM will throw a different error.  More than likely his problem is
a MIPS kernel compile issue or a problem with his RAM.  Googling "Unable
to handle kernel paging request" turns up some interesting results, this
one on the first page likely being the most relevant, though 6 years old.

http://www.linux-mips.org/archives/linux-mips/2004-10/msg00314.html

The OP needs to follow the troubleshooting procedure in the above
thread, and if he can't solve it alone, take it up on lkml.

-- 
Stan


Re: Kernel Oops

2011-03-04 Thread Wietse Venema
Steve Jenkins:
 On Fri, Mar 4, 2011 at 8:01 AM, Denis Shulyaka shuly...@gmail.com wrote:
  Thanks! I will try to do this and will update you with the result.
 
 When I read Denis' first post I thought WHAT? Postfix on a WRT54G? He's 
 crazy!
 
 But now I'm rooting for you, Denis! I hope you get it working! :)

+1. It's fun to find out how small Postfix can get.

Postfix has been running since late 1998 on a 64MB box, 24/7.  I
replaced the few parts that break, and blow out the dust once a
year or so.  Good hardware does not die.

Wietse


Re: Kernel Oops

2011-03-04 Thread Daniel Bromberg

On 3/4/2011 2:01 PM, Wietse Venema wrote:

Steve Jenkins:

On Fri, Mar 4, 2011 at 8:01 AM, Denis Shulyakashuly...@gmail.com  wrote:

Thanks! I will try to do this and will update you with the result.

When I read Denis' first post I thought WHAT? Postfix on a WRT54G? He's crazy!

But now I'm rooting for you, Denis! I hope you get it working! :)

+1. It's fun to find out how small Postfix can get.

Postfix has been running since late 1998 on a 64MB box, 24/7.  I
replaced the few parts that break, and blow out the dust once a
year or so.  Good hardware does not die.

Wietse

A cheers from this corner as well. A light just went on. Did not even 
realize until now the referent was an old fashioned, jailbroken blue-box 
Linksys router. Talk about consolidation! "Oh, that's your home router?" 
-- "No, corporate mailhub." Please, post a detailed blog and link to it 
when you're done!


-Daniel



Re: Kernel Oops

2011-03-04 Thread Denis Shulyaka
Hi Daniel,

Actually it's D-Link DIR 825 with attached USB hard drive, and it's
white and stylish!

2011/3/4 Daniel Bromberg dan...@basezen.com:
 On 3/4/2011 2:01 PM, Wietse Venema wrote:

 Steve Jenkins:

 On Fri, Mar 4, 2011 at 8:01 AM, Denis Shulyakashuly...@gmail.com
  wrote:

 Thanks! I will try to do this and will update you with the result.

 When I read Denis' first post I thought WHAT? Postfix on a WRT54G? He's
 crazy!

 But now I'm rooting for you, Denis! I hope you get it working! :)

 +1. It's fun to find out how small Postfix can get.

 Postfix has been running since late 1998 on a 64MB box, 24/7.  I
 replaced the few parts that break, and blow out the dust once a
 year or so.  Good hardware does not die.

        Wietse

 A cheers from this corner as well. A light  just went on. Did not even
 realize until now the referent was an old fashioned, jailbroken blue-box
 Linksys router. Talk about consolidation! Oh, that's your home router? --
 No, corporate mailhub. Please, post a detailed blog and link to it when
 you're done!

 -Daniel




Re: Kernel Oops

2011-03-04 Thread lst_hoe02

Zitat von Wietse Venema wie...@porcupine.org:


Steve Jenkins:

On Fri, Mar 4, 2011 at 8:01 AM, Denis Shulyaka shuly...@gmail.com wrote:
 Thanks! I will try to do this and will update you with the result.

When I read Denis' first post I thought WHAT? Postfix on a WRT54G?  
He's crazy!


But now I'm rooting for you, Denis! I hope you get it working! :)


+1. It's fun to find out how small Postfix can get.

Postfix has been running since late 1998 on a 64MB box, 24/7.  I
replaced the few parts that break, and blow out the dust once a
year or so.  Good hardware does not die.

Wietse


You must have solid caps, don't you?

BTW, is there any how-to for getting the smallest possible memory
footprint for Postfix? As I have learned, some points are:
- reduce either the global default process limit or the relevant
  process limits in master.cf
- use a small-footprint lookup table type like cdb, and as few tables
  as possible
- don't use regex/pcre maps
- reduce the active limit for qmgr

Any other knobs/screws to adjust?

Many Thanks

Andreas








Re: Kernel Oops

2011-03-04 Thread Victor Duchovni
On Fri, Mar 04, 2011 at 10:33:30PM +0100, lst_ho...@kwsoft.de wrote:

 BTW, is there any how-to for getting the least possible memory footprint 
 for Postfix. As learned some points are

 - reduce either the global default process limit or the relevant process 
 limits in master.cf
 - use a small footprint lookup table like cdb and the least possible count 
 of tables
 - don't use regex/pcre maps

Nothing wrong with small regexp/pcre maps.

 - reduce active limit for qmgr

 any other knobs/screws to adjust?

Use postscreen, to reduce demand for connections to the real SMTP service.
Potentially compile-in fewer features (TLS, SASL, LDAP, ...), but Berkeley
DB is still needed for dynamic databases (e.g. postscreen dynamic whitelist),
just don't use read-only Berkeley DB tables, use CDB for that.

-- 
Viktor.


Re: Kernel Oops

2011-03-04 Thread Stan Hoeppner
lst_ho...@kwsoft.de put forth on 3/4/2011 3:33 PM:
 Zitat von Wietse Venema wie...@porcupine.org:

 Postfix has been running since late 1998 on a 64MB box, 24/7.  I
 replaced the few parts that break, and blow out the dust once a
 year or so.  Good hardware does not die.

 Wietse
 
 You must have solid caps, don't you?

While electrolytic capacitors do have lifespan issues compared to solid
capacitors, they can last 10-20 years if operating at a relatively low
temperature, i.e. sufficient case cooling w/ system in a temp controlled
environment.  One of my personal servers contains an 11 year old Abit
BP6 dual Celery mobo:
http://www.hardwarefreak.com/web/server_pics/gallery/

A couple of caps are mildly bulging but the system is rock solid, even
under burnp6 load on each CPU for 10+ minutes.

-- 
Stan


Re: Kernel Oops

2011-03-04 Thread Stan Hoeppner
lst_ho...@kwsoft.de put forth on 3/4/2011 3:33 PM:

 BTW, is there any how-to for getting the least possible memory
 footprint for Postfix.

 - don't use regex/pcre maps

This isn't necessarily true, is it?  In some cases I would think it's
dramatically reversed in favor of PCRE tables (unless the Postfix PCRE
processing code overhead eats up a massive amount of memory).  For
example, with the following single PCRE I can block a few million,
literally, residential hosts in the Centurylink (formerly Embarq)
consumer broadband aDSL network:

/^.*\.(dyn|dhcp)\.embarqhsd\.net$/  REJECT Please use ISP relay

To do this with a CIDR would take at least 100 entries to cover all the
subnets, probably many many more, due to the way they assign blocks by
state, and rDNS by customer type, with (dyn|dhcp|sta) all existing
within each of the top level parents.

To do this with a hash table would require multiple hundreds of entries
as you'd be limited to using /24s.

-- 
Stan
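
The entry-count argument can be made concrete with a little arithmetic. The sketch below takes the /24 granularity stated above at face value and counts how many /24 entries a set of CIDR blocks expands to; the three blocks are private-range placeholders made up for illustration, not Embarq's real allocations:

```python
import ipaddress

def count_slash24(cidr):
    """Number of /24 networks needed to cover one IPv4 CIDR block."""
    net = ipaddress.ip_network(cidr)
    if net.prefixlen >= 24:
        return 1  # a /24 or smaller block still costs one entry
    return 2 ** (24 - net.prefixlen)

# Hypothetical allocations standing in for an ISP's dynamic pools.
BLOCKS = ["10.0.0.0/18", "10.64.0.0/20", "10.128.0.0/16"]

total = sum(count_slash24(block) for block in BLOCKS)
print(len(BLOCKS), "CIDR entries expand to", total, "/24 entries")
```

Three CIDR lines turn into several hundred per-/24 entries under that assumption, while the rDNS pattern stays at a single line no matter how the address space is carved up.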