Re: Understanding postscreen timeouts

2014-05-02 Thread Tom Hendrikx
On 05/02/2014 03:15 AM, Alex wrote:
 Hi,
 
 On Thu, May 1, 2014 at 5:38 PM, Wietse Venema wie...@porcupine.org wrote:
 
 Alex:
  I'm using postfix-2.10.3 with fedora20 and have configured postscreen with
  spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally
  receiving the following timeout message:
 
  May  1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply
  timeout 10s for swl.spamhaus.org
 
 This time limit has unfortunately escaped my attention.  It is not
 yet configurable.
 
 The warning message means that postscreen gives up waiting for the
 DNS lookup result. This is a safety mechanism.
 
  I'm also using a half-dozen RBLs, but they don't all always timeout.
 
 I see occasional timeouts on residential and co-located servers.
 By default the resolver *system library* routines wait 5s before
 retrying; this may be configurable in resolv.conf, but the
 postscreen time limit is still hard-coded.
 
 
 These are both corporate 10mbs dedicated links and I don't think latency
 and/or bandwidth is a problem.
 
 It actually appears swl.spamhaus.org is the
 main problem. It doesn't even resolve when I try to do it manually. This
 was a recommendation I used from this list some time ago. Has something
 changed?

As a feed user of spamhaus, it's easy to see the amount of data that is
actually in the zones. Both DWL and SWL zones are empty, so the
whitelist experiments of spamhaus seem to be either 'on hold' or dead.
Feel free to drop the zones from your setup.

This won't fix dns lookup problems in general though.

Tom





Re: Understanding postscreen timeouts

2014-05-02 Thread Wietse Venema
Stan Hoeppner:
  swl.spamhaus.org*-4
  list.dnswl.org=127.[0..255].[0..255].0*-2
  list.dnswl.org=127.[0..255].[0..255].1*-3
  list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
 
 Consolidate these last 3 to something like:
   list.dnswl.org=127.0.[2..14].[2..3]*-4

These three will result in one list.dnswl.org query, just like the
consolidated one. There is no performance difference.

However, there is a correctness difference. The consolidated form
has the same weight 4 for all results, while the original form
has different weights.

Wietse


postscreen_dnsbl_timeout parameter (was: Understanding postscreen timeouts)

2014-05-02 Thread Wietse Venema
Wietse Venema:
 Alex:
  I'm using postfix-2.10.3 with fedora20 and have configured postscreen with
  spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally
  receiving the following timeout message:
  
  May  1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply
  timeout 10s for swl.spamhaus.org
 
 This time limit has unfortunately escaped my attention.  It is not
 yet configurable.

Fixed in Postfix 2.12.

Wietse

20140501

Cleanup: postscreen_dnsbl_timeout parameter. Files:
mantools/postlink, proto/postconf.proto, global/mail_params.h,
postscreen/postscreen.c, postscreen/postscreen_dnsbl.c.


Re: Understanding postscreen timeouts

2014-05-02 Thread Stan Hoeppner
On 5/2/2014 6:07 AM, Wietse Venema wrote:
 Stan Hoeppner:
 swl.spamhaus.org*-4
 list.dnswl.org=127.[0..255].[0..255].0*-2
 list.dnswl.org=127.[0..255].[0..255].1*-3
 list.dnswl.org=127.[0..255].[0..255].[2..255]*-4

 Consolidate these last 3 to something like:
  list.dnswl.org=127.0.[2..14].[2..3]*-4
 
 These three will result in one list.dnswl.org query, just like the
 consolidated one. There is no performance difference.

Correct.  The reason for consolidating these is not to reduce queries.

 However, there is a correctness difference. The consolidated form
 has the same weight 4 for all results, while the original form
 has different weights.

The consolidated form gives no score to a 4th octet value of [0..1], but
gives -4 to [2..3].  This is the key difference.

Alex' form and weights are not correct.  And that is why I posted the
link to the return codes.  The second 'octet' is always zero, not a
range.  The 3rd octet has a range of 2-15, and the 4th octet a range of
0-3.  Specifying a range of 0-255 or 2-255 to cover the future may
have the opposite effect, resulting in potential disaster, depending on
how/if/when dnswl changes things.  Such wildcards should not be used.

A value of 15 in the 3rd octet means the sender is an Email Marketing
Provider.  Most people would never whitelist such senders.  Alex
currently does.  Most people would give no preference to a 4th octet
score of 0 which means no trust.  Alex is giving -2.  And he is giving
-3 to a 4th octet score of 1, low trust.  The recommended scale is
-0.1, -1.0, -10, -100, and this is how SpamAssassin handles dnswl
scoring.  Using a 4 point scale instead of 100, a 4th octet value of 0
or 1 should be given NO whitelisting preference at all, which is what my
consolidated example does.
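
To make the difference between the two forms concrete, here is a minimal Python sketch of how the filters could be evaluated against a list.dnswl.org reply. It is a simplified model, not Postfix's implementation: it assumes the weights of all matching entries for a domain are summed, it handles only plain numbers and [lo..hi] ranges, and the helper names (parse_entry, reply_matches, score) are made up for illustration.

```python
import re

def parse_entry(entry):
    """Split 'domain=pattern*weight' into (domain, pattern, weight)."""
    weight = 1  # an entry without '*weight' counts as 1
    if "*" in entry:
        entry, raw_weight = entry.rsplit("*", 1)
        weight = int(raw_weight)
    domain, _, pattern = entry.partition("=")
    return domain, pattern or None, weight

def octet_matches(field, value):
    """Match one pattern field: a plain number or a [lo..hi] range."""
    m = re.fullmatch(r"\[(\d+)\.\.(\d+)\]", field)
    if m:
        return int(m.group(1)) <= value <= int(m.group(2))
    return int(field) == value

def reply_matches(pattern, reply):
    """True if the reply address satisfies the four-field pattern."""
    if pattern is None:
        return True  # no filter: any reply counts
    fields = re.findall(r"\[[^\]]+\]|\d+", pattern)
    octets = [int(o) for o in reply.split(".")]
    return len(fields) == 4 and all(
        octet_matches(f, o) for f, o in zip(fields, octets))

def score(entries, domain, reply):
    """Sum the weights of all entries for 'domain' that match 'reply'."""
    total = 0
    for entry in entries:
        d, pattern, weight = parse_entry(entry)
        if d == domain and reply_matches(pattern, reply):
            total += weight
    return total

ORIGINAL = [
    "list.dnswl.org=127.[0..255].[0..255].0*-2",
    "list.dnswl.org=127.[0..255].[0..255].1*-3",
    "list.dnswl.org=127.[0..255].[0..255].[2..255]*-4",
]
CONSOLIDATED = ["list.dnswl.org=127.0.[2..14].[2..3]*-4"]

# Category 15 with trust 0, category 5 with trust 1, category 5 with trust 3
for reply in ("127.0.15.0", "127.0.5.1", "127.0.5.3"):
    print(reply,
          score(ORIGINAL, "list.dnswl.org", reply),
          score(CONSOLIDATED, "list.dnswl.org", reply))
```

Under these assumptions the original entries give -2 to 127.0.15.0 and -3 to 127.0.5.1, while the consolidated entry gives both 0; only a reply such as 127.0.5.3 gets -4 from either form.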

Cheers,

Stan


Re: Understanding postscreen timeouts

2014-05-02 Thread Alex
Hi,

On Fri, May 2, 2014 at 6:45 PM, Stan Hoeppner s...@hardwarefreak.com wrote:

 On 5/2/2014 6:07 AM, Wietse Venema wrote:
  Stan Hoeppner:
  swl.spamhaus.org*-4
  list.dnswl.org=127.[0..255].[0..255].0*-2
  list.dnswl.org=127.[0..255].[0..255].1*-3
  list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
 
  Consolidate these last 3 to something like:
   list.dnswl.org=127.0.[2..14].[2..3]*-4
 
  These three will result in one list.dnswl.org query, just like the
  consolidated one. There is no performance difference.

 Correct.  The reason for consolidating these is not to reduce queries.

  However, there is a correctness difference. The consolidated form
  has the same weight 4 for all results, while the original form
  has different weights.

 The consolidated form gives no score to a 4th octet value of [0..1], but
 gives -4 to [2..3].  This is the key difference.

 Alex' form and weights are not correct.  And that is why I posted the
 link to the return codes.  The second 'octet' is always zero, not a
 range.  The 3rd octet has a range of 2-15, and the 4th octet a range of
 0-3.  Specifying a range of 0-255 or 2-255 to cover the future may
 have the opposite effect, resulting in potential disaster, depending on
 how/if/when dnswl changes things.  Such wildcards should not be used.

 A value of 15 in the 3rd octet means the sender is an Email Marketing
 Provider.  Most people would never whitelist such senders.  Alex
 currently does.  Most people would give no preference to a 4th octet
 score of 0 which means no trust.  Alex is giving -2.  And he is giving
 -3 to a 4th octet score of 1, low trust.  The recommended scale is
 -0.1, -1.0, -10, -100, and this is how SpamAssassin handles dnswl
 scoring.  Using a 4 point scale instead of 100, a 4th octet value of 0
 or 1 should be given NO whitelisting preference at all, which is what my
 consolidated example does.


Somehow your first message to the list on this topic didn't make it to me.
Had to read it in the archives. Anyway, thanks so much. My postscreen
config was generated through a discussion on this list with rob0 some time
ago, as well as his postscreen config (
http://rob0.nodns4.us/howto/postfix/main.cf). Perhaps if he's reading, he
can correct this.

I can't believe I've been whitelisting mass mailers. That's far from what I
would want to be doing. In fact, I'm considering figuring out some
spamassassin rules to better identify them based on the dnswl queries.

Regarding your DNS caching comments, thanks for this too. I hadn't realized
there would be bandwidth savings by having one or two DNS servers that are
queried on the network versus having a local cache on each mail server.
I've always been a bind loyalist, but will consider the powerDNS program if
it doesn't improve.

I've already made the postscreen changes on the systems, and I'm already
noticing fewer DNS queries.

I've also removed swl.spamhaus.org entirely, thanks to a conversation with
spamhaus and comments from Tom Hendrikx about it being discontinued.

Thanks everyone!
Alex


Re: Understanding postscreen timeouts

2014-05-02 Thread /dev/rob0
On Fri, May 02, 2014 at 08:10:18PM -0400, Alex wrote:
 On Fri, May 2, 2014 at 6:45 PM, Stan Hoeppner 
 s...@hardwarefreak.com wrote:
  On 5/2/2014 6:07 AM, Wietse Venema wrote:
   Stan Hoeppner:
   swl.spamhaus.org*-4
   list.dnswl.org=127.[0..255].[0..255].0*-2
   list.dnswl.org=127.[0..255].[0..255].1*-3
   list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
  
   Consolidate these last 3 to something like:
    list.dnswl.org=127.0.[2..14].[2..3]*-4
  
   These three will result in one list.dnswl.org query, just like 
   the consolidated one. There is no performance difference.
 
  Correct.  The reason for consolidating these is not to reduce 
  queries.
 
   However, there is a correctness difference. The consolidated 
   form has the same weight 4 for all results, while the original 
   form has different weights.
 
  The consolidated form gives no score to a 4th octet value of 
  [0..1], but gives -4 to [2..3].  This is the key difference.
 
  Alex' form and weights are not correct.  And that is why I posted 
  the link to the return codes.  The second 'octet' is always zero, 
  not a range.  The 3rd octet has a range of 2-15, and the 4th 
  octet a range of 0-3.  Specifying a range of 0-255 or 2-255 to 
  cover the future may have the opposite effect, resulting in 
  potential disaster, depending on how/if/when dnswl changes 
  things.  Such wildcards should not be used.

Good point. I thought of this, but did not bother to implement it 
that way. Eventually I will change it.

  A value of 15 in the 3rd octet means the sender is an Email 
  Marketing Provider.  Most people would never whitelist such 
  senders.  Alex currently does.  Most people would give no 
  preference to a 4th octet score of 0 which means no trust.

Well, I whitelist mildly. Do note that this is a whitelist, under 
management by people who, I suppose, don't like spam any more than 
you or I.

A DNSWL.org return of 127.0.15.0 means an email marketer who is 
nominally trying to limit spam (thus deserving a whitelist entry), 
but who might not be doing that well.

A -1 score makes sense. It's not enough to override Zen nor a 
grouping of other DNSBLs, but if that's the only result from 
postscreen_dnsbl_sites, it's enough to bypass the after-220 checks.

  Alex is giving -2.  And he is giving -3 to a 4th octet score of 
  1, low trust.  The recommended scale is -0.1, -1.0, -10, -100, 
  and this is how SpamAssassin handles dnswl scoring.

Yes, I think -1, -2 and -4 make sense. I lump 4th octet 2 and 3 
together because I'm a 2. :) Also, a -4 is going to override any 
borderline DNSBL score. If it doesn't, I expect something to give 
somewhere. In my studies, I found very little overlap between the 
DNSBLs and the DNSWLs.

  Using a 4 point scale instead of 100, a 4th octet value of
  0 or 1 should be given NO whitelisting preference at all,
  which is what my consolidated example does.

But I don't agree with that. Scoring at the content scanning stage 
differs from scoring in postscreen. DNSWL.org assumes that their 
trust level none sites are not actually making money from spam. I 
can't speak for Mathias, but I am pretty sure that he would delist 
ANY known spammer.

 Somehow your first message to the list on this topic didn't make it 
 to me. Had to read it in the archives. Anyway, thanks so much. My 
 postscreen config was generated through a discussion on this list 
 with rob0 some time ago, as well as his postscreen config ( 
 http://rob0.nodns4.us/howto/postfix/main.cf). Perhaps if he's 
 reading, he can correct this.

Hiya! Yes, I remember. BTW, the better link to share is the HTML 
page, http://rob0.nodns4.us/postscreen.html , which has all the 
explanations and warnings.

 I can't believe I've been whitelisting mass mailers. That's far 
 from what I would want to be doing. In fact, I'm considering 
 figuring out some spamassassin rules to better identify them based 
 on the dnswl queries.

If you want to be adventurous (and to violate the DNSWL.org spirit) 
nothing stops you from using 127.0.15.0 with a positive score in 
postscreen ... or even as a reject_rbl_client in smtpd!

I figure these are at worst the gray hats. And why bother giving 
delays with the after-220 tests they will pass anyway? So yes, my 
policy here was considered and deliberate. But looking back, I'll 
agree that a -1 would make more sense than -2.

Stan probably tends to be more aggressive than I am. There's no 
right/wrong to that, it's a choice.

 Regarding your DNS caching comments, thanks for this too. I hadn't 
 realized there would be bandwidth savings by having one or two DNS 
 servers that are queried on the network versus having a local cache 
 on each mail server. I've always been a bind loyalist, but will 
 consider the powerDNS program if it doesn't improve.

I've always been a BIND loyalist too. Now I'm paid to be a BIND 
loyalist. I have nothing against the competition, certainly I can't 
say anything bad 

Understanding postscreen timeouts

2014-05-01 Thread Alex
Hi,

I'm using postfix-2.10.3 with fedora20 and have configured postscreen with
spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally
receiving the following timeout message:

May  1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply
timeout 10s for swl.spamhaus.org

This appears to happen during periods of load, but also when the server is
idle. I understand it's possible to increase the timeout, but I would think
10s would be long enough, so didn't want to start doing that. This is also
on multiple hosts on multiple different, unrelated networks.

I'm also using a half-dozen RBLs, but they don't all always timeout.

I'm using a local bind caching server on the hosts that are involved.
Should I consider setting up rbldnsd for this instead? Or is that only for
caching local RBLs only?

What is the result of this timeout? Does postscreen/dnsblog retry, or is
the attempt failed and the mail just passed on?

Here is the relevant postscreen info from my config. Please let me know if
the full config is necessary.

postscreen_access_list = permit_mynetworks,
    cidr:/etc/postfix/postscreen_access.cidr
postscreen_blacklist_action = drop
postscreen_dnsbl_action = enforce
postscreen_dnsbl_reply_map =
    pcre:$config_directory/postscreen_dnsbl_reply_map.pcre
postscreen_dnsbl_sites =
    mykey.zen.dq.spamhaus.net*3
    b.barracudacentral.org*2
    bl.spameatingmonkey.net*2
    bl.spamcop.net
    dnsbl.sorbs.net
    psbl.surriel.com
    bl.mailspike.net
    swl.spamhaus.org*-4
    list.dnswl.org=127.[0..255].[0..255].0*-2
    list.dnswl.org=127.[0..255].[0..255].1*-3
    list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
postscreen_dnsbl_threshold = 3
postscreen_greet_action = enforce
postscreen_whitelist_interfaces = static:all 172.XX.YY.160/32 64.XX.YY.0/24
    67.XX.YY.0/24
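
As a rough illustration of how these weights interact with postscreen_dnsbl_threshold, the sketch below adds up the weights of the lists a client appears on and compares the total with the threshold. The short site labels and function names are made up for illustration; the numbers are taken from the config above, with unweighted sites counted as 1.

```python
# Weights from the postscreen_dnsbl_sites setting above; the short keys
# are illustrative labels, not names that Postfix uses.
WEIGHTS = {
    "zen": 3,
    "barracuda": 2,
    "spameatingmonkey": 2,
    "spamcop": 1,
    "sorbs": 1,
    "psbl": 1,
    "mailspike": 1,
}
THRESHOLD = 3  # postscreen_dnsbl_threshold

def total_score(listed_on, whitelist_weight=0):
    """Sum the blocklist weights plus any (negative) whitelist weight."""
    return sum(WEIGHTS[site] for site in listed_on) + whitelist_weight

def is_blocked(listed_on, whitelist_weight=0):
    """The DNSBL action applies once the total reaches the threshold."""
    return total_score(listed_on, whitelist_weight) >= THRESHOLD

print(is_blocked(["spamcop", "sorbs"]))            # total 2
print(is_blocked(["barracuda", "spamcop"]))        # total 3
print(is_blocked(["zen"], whitelist_weight=-4))    # total -1
```

With these numbers, two unweighted lists alone do not reach the threshold, barracuda plus one unweighted list does, and a -4 whitelist hit cancels a Zen listing.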

Thanks so much,
Alex


Re: Understanding postscreen timeouts

2014-05-01 Thread Wietse Venema
Alex:
 I'm using postfix-2.10.3 with fedora20 and have configured postscreen with
 spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally
 receiving the following timeout message:
 
 May  1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply
 timeout 10s for swl.spamhaus.org

This time limit has unfortunately escaped my attention.  It is not
yet configurable.

The warning message means that postscreen gives up waiting for the
DNS lookup result. This is a safety mechanism.

 I'm also using a half-dozen RBLs, but they don't all always timeout.

I see occasional timeouts on residential and co-located servers.
By default the resolver *system library* routines wait 5s before
retrying; this may be configurable in resolv.conf, but the
postscreen time limit is still hard-coded.
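
The interaction between the two time limits can be sketched with a little arithmetic. This is a deliberately simplified model (one name server, a fixed wait per attempt, no backoff), and the function name is made up for illustration; real resolver libraries may behave differently.

```python
import math

def attempts_started(deadline_s, per_attempt_timeout_s):
    """Resolver attempts that can begin before the caller's deadline.

    Attempt n starts at (n - 1) * per_attempt_timeout_s seconds.
    """
    return math.ceil(deadline_s / per_attempt_timeout_s)

POSTSCREEN_LIMIT_S = 10  # from the "dnsblog reply timeout 10s" warning
RESOLVER_WAIT_S = 5      # resolver wait before retrying, as stated above

print(attempts_started(POSTSCREEN_LIMIT_S, RESOLVER_WAIT_S))
print(attempts_started(POSTSCREEN_LIMIT_S, 2))
```

In this model a 5s resolver wait leaves room for only the initial query and one retry before postscreen gives up, whereas a shorter wait (for example via the timeout option in resolv.conf, if the system supports it) would allow more retries within the same 10s.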

Wietse


Re: Understanding postscreen timeouts

2014-05-01 Thread Alex
Hi,

On Thu, May 1, 2014 at 5:38 PM, Wietse Venema wie...@porcupine.org wrote:

 Alex:
  I'm using postfix-2.10.3 with fedora20 and have configured postscreen
 with
  spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally
  receiving the following timeout message:
 
  May  1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply
  timeout 10s for swl.spamhaus.org

 This time limit has unfortunately escaped my attention.  It is not
 yet configurable.

 The warning message means that postscreen gives up waiting for the
 DNS lookup result. This is a safety mechanism.

  I'm also using a half-dozen RBLs, but they don't all always timeout.

 I see occasional timeouts on residential and co-located servers.
 By default the resolver *system library* routines wait 5s before
 retrying; this may be configurable in resolv.conf, but the
 postscreen time limit is still hard-coded.


These are both corporate 10mbs dedicated links and I don't think latency
and/or bandwidth is a problem.

It actually appears swl.spamhaus.org is the main problem. It doesn't even
resolve when I try to do it manually. This was a recommendation I used from
this list some time ago. Has something changed? This is my current config:

postscreen_dnsbl_sites = mykey.zen.dq.spamhaus.net*3
    b.barracudacentral.org*2
    bl.spameatingmonkey.net*2
    bl.spamcop.net
    dnsbl.sorbs.net
    psbl.surriel.com
    bl.mailspike.net
    swl.spamhaus.org*-4
    list.dnswl.org=127.[0..255].[0..255].0*-2
    list.dnswl.org=127.[0..255].[0..255].1*-3
    list.dnswl.org=127.[0..255].[0..255].[2..255]*-4

I'm also curious what resolvers people are using for their mail servers?
bind? Looking at my query graphs, it appears to be about 30 queries/sec on
average for each host, just as a local caching server.

Thanks,
Alex


Re: Understanding postscreen timeouts

2014-05-01 Thread Stan Hoeppner
On 5/1/2014 8:15 PM, Alex wrote:
...
 These are both corporate 10mbs dedicated links and I don't think latency
 and/or bandwidth is a problem.

The problem, if network related, will be UDP packet loss somewhere in
the end-to-end path, not b/w or latency on the CPE link into the
provider's net.

 It actually appears swl.spamhaus.org is the main problem. It doesn't even
 resolve when I try to do it manually. 

From here:

$ host 2.0.0.127.swl.spamhaus.org
2.0.0.127.swl.spamhaus.org has address 127.0.2.2

What response do you receive?
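
For anyone repeating this test against other zones: the name queried above is the address 127.0.0.2 with its octets reversed, followed by the list zone. A minimal Python sketch of that construction (IPv4 only; the function name is made up for illustration):

```python
import ipaddress

def dnsbl_query_name(client_ip, zone):
    """Build the DNS name to look up for client_ip in a DNSBL/DNSWL zone."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

print(dnsbl_query_name("127.0.0.2", "swl.spamhaus.org"))
```

The printed name can be passed to host or dig in the same way as the command above.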

Due to your query volume you require paid service for Spamhaus Zen.  The
same terms apply to all Spamhaus services.  Your IPs may have been
blacklisted from DWL due to high query volume.  Contact Spamhaus.  If
your contract entitles you to all Spamhaus lists, the fix may be as
simple as changing the SWL hostname and adding your key.

 This was a recommendation I used from
 this list some time ago. Has something changed? 

See above.

 postscreen_dnsbl_sites = mykey.zen.dq.spamhaus.net*3
 b.barracudacentral.org*2
 bl.spameatingmonkey.net*2
 bl.spamcop.net
 dnsbl.sorbs.net
 psbl.surriel.com
 bl.mailspike.net

With these 7 dnsbls you will have extreme overlap of listed IPs.  The
last 5 will gain you little to nothing and simply add latency to your
mail transactions, which is something you do not want in a high volume
environment.  I'd recommend you use Zen and BRBL, remove the rest, and
rely on SWL and dnswl for FP mitigation during SMTP.  You also run
SpamAssassin on all of these hosts, so there's no need to pile on dnsbl
queries at SMTP connect.

 swl.spamhaus.org*-4
 list.dnswl.org=127.[0..255].[0..255].0*-2
 list.dnswl.org=127.[0..255].[0..255].1*-3
 list.dnswl.org=127.[0..255].[0..255].[2..255]*-4

Consolidate these last 3 to something like:
list.dnswl.org=127.0.[2..14].[2..3]*-4

To understand why, read Return Codes at:
http://dnswl.org/tech

 I'm also curious what resolvers people are using for their mail servers?
 bind? Looking at my query graphs, it appears to be about 30 queries/sec on
 average for each host, just as a local caching server.

That's ~2.6M queries/day/host.  Eliminating the 5 unnecessary dnsbl
queries will lower this considerably.  If you're not happy with bind,
check out:  http://doc.powerdns.com/html/built-in-recursor.html
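
The per-host figure is simply 30 queries per second sustained over a full day, as this small Python check shows (the function name is illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60

def queries_per_day(rate_per_second):
    """Daily query count for a constant per-second rate."""
    return round(rate_per_second * SECONDS_PER_DAY)

print(queries_per_day(30))  # 2592000, i.e. roughly 2.6M
```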

If you have more than a handful of hosts doing 2.5M queries/day, you
should seriously consider building a couple of resolvers homed in
different networks and having the MX hosts query the pair.  This will
cut down considerably on the query load you're placing on your dns[b|w]l
servers, as resolver cache will be much more effective.
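
The caching argument can be illustrated with a toy model. It assumes the MX hosts see largely the same client IPs within one cache lifetime, which will only be partly true in practice; the addresses and the function name are made up for illustration.

```python
def upstream_queries(clients_seen_per_host, shared_cache):
    """Count cache misses that must go out to the DNSBL/DNSWL servers."""
    if shared_cache:
        # One cache for all hosts: each distinct client is looked up once.
        return len(set().union(*clients_seen_per_host))
    # One cache per host: each host looks up each of its own clients once.
    return sum(len(set(clients)) for clients in clients_seen_per_host)

hosts = [
    ["192.0.2.1", "192.0.2.2", "192.0.2.3"],
    ["192.0.2.2", "192.0.2.3", "192.0.2.4"],
    ["192.0.2.1", "192.0.2.3", "192.0.2.4"],
]
print(upstream_queries(hosts, shared_cache=False))  # 9
print(upstream_queries(hosts, shared_cache=True))   # 4
```

The counts are per list zone. With a pair of resolvers, each keeping its own cache, the real saving would fall somewhere between these two figures.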

Cheers,

Stan