Re: Understanding postscreen timeouts
On 05/02/2014 03:15 AM, Alex wrote:
> Hi,
>
> On Thu, May 1, 2014 at 5:38 PM, Wietse Venema wie...@porcupine.org wrote:
>> Alex:
>>> I'm using postfix-2.10.3 with fedora20 and have configured postscreen with spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally receiving the following timeout message:
>>>
>>> May 1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply timeout 10s for swl.spamhaus.org
>>
>> This time limit has unfortunately escaped my attention. It is not yet configurable. The warning message means that postscreen gives up waiting for the DNS lookup result. This is a safety mechanism.
>>
>>> I'm also using a half-dozen RBLs, but they don't all always timeout.
>>
>> I see occasional timeouts on residential and co-located servers. By default the resolver *system library* routines wait 5s before retrying; this may be configurable in resolv.conf, but the postscreen time limit is still hard-coded.
>
> These are both corporate 10mbs dedicated links and I don't think latency and/or bandwidth is a problem.
>
> It actually appears swl.spamhaus.org is the main problem. It doesn't even resolve when I try to do it manually. This was a recommendation I used from this list some time ago. Has something changed?

As a feed user of spamhaus, it's easy to see the amount of data that is actually in the zones. Both DWL and SWL zones are empty, so the whitelist experiments of spamhaus seem to be either 'on hold' or dead. Feel free to drop the zones from your setup.

This won't fix dns lookup problems in general though.

Tom
Re: Understanding postscreen timeouts
Stan Hoeppner:
>> swl.spamhaus.org*-4
>> list.dnswl.org=127.[0..255].[0..255].0*-2
>> list.dnswl.org=127.[0..255].[0..255].1*-3
>> list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
>
> Consolidate these last 3 to something like:
>
> list.dnswl.org=127.0.[2..14].[2..3]*-4

These three will result in one list.dnswl.org query, just like the consolidated one. There is no performance difference.

However, there is a correctness difference. The consolidated form has the same weight 4 for all results, while the original form has different weights.

Wietse
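The correctness difference Wietse describes can be checked with a small model. The following is only a sketch in Python, not Postfix code: `parse_site`, `reply_matches` and `weight_for` are hypothetical helper names, only the `[lo..hi]` range form of the reply filter is handled, and summing the weights of all matching entries is an assumption that does not matter here because the three original filters never overlap.

```python
import re

# One filter octet: either a range "[lo..hi]" or a literal number.
OCTET = re.compile(r"\[(\d+)\.\.(\d+)\]|(\d+)")


def parse_site(entry):
    """Split 'domain=filter*weight' into (domain, filter or None, weight)."""
    weight = 1  # entries without '*weight' count as 1
    if "*" in entry:
        entry, raw_weight = entry.rsplit("*", 1)
        weight = int(raw_weight)
    domain, _, reply_filter = entry.partition("=")
    return domain, reply_filter or None, weight


def reply_matches(reply_filter, reply):
    """Return True if a DNS reply such as '127.0.5.2' satisfies the filter."""
    parts = OCTET.findall(reply_filter)
    octets = [int(o) for o in reply.split(".")]
    if len(parts) != 4 or len(octets) != 4:
        return False
    for (low, high, literal), octet in zip(parts, octets):
        if literal:
            if int(literal) != octet:
                return False
        elif not int(low) <= octet <= int(high):
            return False
    return True


def weight_for(entries, reply):
    """Add up the weights of every entry whose filter accepts the reply."""
    total = 0
    for entry in entries:
        _, reply_filter, weight = parse_site(entry)
        if reply_filter is None or reply_matches(reply_filter, reply):
            total += weight
    return total


ORIGINAL = [
    "list.dnswl.org=127.[0..255].[0..255].0*-2",
    "list.dnswl.org=127.[0..255].[0..255].1*-3",
    "list.dnswl.org=127.[0..255].[0..255].[2..255]*-4",
]
CONSOLIDATED = ["list.dnswl.org=127.0.[2..14].[2..3]*-4"]

for reply in ("127.0.5.0", "127.0.5.1", "127.0.5.2", "127.0.15.3"):
    print(reply, weight_for(ORIGINAL, reply), weight_for(CONSOLIDATED, reply))
```

Under these assumptions the two forms agree only for replies whose third octet is 2 to 14 and whose fourth octet is 2 or 3; for every other reply the original form still whitelists and the consolidated form does not.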
postscreen_dnsbl_timeout parameter (was: Understanding postscreen timeouts)
Wietse Venema:
> Alex:
>> I'm using postfix-2.10.3 with fedora20 and have configured postscreen with spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally receiving the following timeout message:
>>
>> May 1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply timeout 10s for swl.spamhaus.org
>
> This time limit has unfortunately escaped my attention. It is not yet configurable.

Fixed in Postfix 2.12.

Wietse

20140501

Cleanup: postscreen_dnsbl_timeout parameter. Files: mantools/postlink, proto/postconf.proto, global/mail_params.h, postscreen/postscreen.c, postscreen/postscreen_dnsbl.c.
Re: Understanding postscreen timeouts
On 5/2/2014 6:07 AM, Wietse Venema wrote:
> Stan Hoeppner:
>>> swl.spamhaus.org*-4
>>> list.dnswl.org=127.[0..255].[0..255].0*-2
>>> list.dnswl.org=127.[0..255].[0..255].1*-3
>>> list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
>>
>> Consolidate these last 3 to something like:
>>
>> list.dnswl.org=127.0.[2..14].[2..3]*-4
>
> These three will result in one list.dnswl.org query, just like the consolidated one. There is no performance difference.

Correct. The reason for consolidating these is not to reduce queries.

> However, there is a correctness difference. The consolidated form has the same weight 4 for all results, while the original form has different weights.

The consolidated form gives no score to a 4th octet value of [0..1], but gives -4 to [2..3]. This is the key difference. Alex' form and weights are not correct. And that is why I posted the link to the return codes. The second 'octet' is always zero, not a range. The 3rd octet has a range of 2-15, and the 4th octet a range of 0-3. Specifying a range of 0-255 or 2-255 to cover the future may have the opposite effect, resulting in potential disaster, depending on how/if/when dnswl changes things. Such wildcards should not be used.

A value of 15 in the 3rd octet means the sender is an Email Marketing Provider. Most people would never whitelist such senders. Alex currently does. Most people would give no preference to a 4th octet score of 0 which means no trust. Alex is giving -2. And he is giving -3 to a 4th octet score of 1, low trust.

The recommended scale is -0.1, -1.0, -10, -100, and this is how SpamAssassin handles dnswl scoring. Using a 4 point scale instead of 100, a 4th octet value of 0 or 1 should be given NO whitelisting preference at all, which is what my consolidated example does.

Cheers,

Stan
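To make the return-code layout Stan describes concrete, here is a minimal Python sketch that decodes a list.dnswl.org reply and applies his consolidated policy. The function names are hypothetical. The ranges (second octet 0, third octet 2 to 15, fourth octet 0 to 3), the meaning of category 15 and the "none" and "low" trust levels come from the message above; the labels "medium" and "high" for trust 2 and 3 are not stated in this thread and are taken as an assumption from DNSWL's published return codes.

```python
# Trust labels for the 4th octet. 0 and 1 are named in the thread;
# "medium" and "high" are an assumption based on DNSWL's documentation.
TRUST = {0: "none", 1: "low", 2: "medium", 3: "high"}
EMAIL_MARKETING_PROVIDER = 15  # 3rd-octet category singled out in the thread


def decode_dnswl(reply):
    """Return (category, trust label) for a reply like '127.0.5.2'."""
    octets = [int(o) for o in reply.split(".")]
    if len(octets) != 4 or octets[0] != 127 or octets[1] != 0:
        raise ValueError("not a DNSWL-style reply: %s" % reply)
    category, trust = octets[2], octets[3]
    if not 2 <= category <= 15 or trust not in TRUST:
        raise ValueError("octet outside the documented range: %s" % reply)
    return category, TRUST[trust]


def consolidated_weight(reply):
    """Model of list.dnswl.org=127.0.[2..14].[2..3]*-4: -4 or no score."""
    category, trust = decode_dnswl(reply)
    if category != EMAIL_MARKETING_PROVIDER and trust in ("medium", "high"):
        return -4
    return 0


print(decode_dnswl("127.0.15.0"), consolidated_weight("127.0.15.0"))
print(decode_dnswl("127.0.5.2"), consolidated_weight("127.0.5.2"))
```

Rejecting out-of-range octets instead of matching them with a wildcard mirrors Stan's point that values DNSWL has not defined yet should not silently earn a whitelist score.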
Re: Understanding postscreen timeouts
Hi,

On Fri, May 2, 2014 at 6:45 PM, Stan Hoeppner s...@hardwarefreak.com wrote:
> On 5/2/2014 6:07 AM, Wietse Venema wrote:
>> Stan Hoeppner:
>>>> swl.spamhaus.org*-4
>>>> list.dnswl.org=127.[0..255].[0..255].0*-2
>>>> list.dnswl.org=127.[0..255].[0..255].1*-3
>>>> list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
>>>
>>> Consolidate these last 3 to something like:
>>>
>>> list.dnswl.org=127.0.[2..14].[2..3]*-4
>>
>> These three will result in one list.dnswl.org query, just like the consolidated one. There is no performance difference.
>
> Correct. The reason for consolidating these is not to reduce queries.
>
>> However, there is a correctness difference. The consolidated form has the same weight 4 for all results, while the original form has different weights.
>
> The consolidated form gives no score to a 4th octet value of [0..1], but gives -4 to [2..3]. This is the key difference. Alex' form and weights are not correct. And that is why I posted the link to the return codes. The second 'octet' is always zero, not a range. The 3rd octet has a range of 2-15, and the 4th octet a range of 0-3. Specifying a range of 0-255 or 2-255 to cover the future may have the opposite effect, resulting in potential disaster, depending on how/if/when dnswl changes things. Such wildcards should not be used.
>
> A value of 15 in the 3rd octet means the sender is an Email Marketing Provider. Most people would never whitelist such senders. Alex currently does. Most people would give no preference to a 4th octet score of 0 which means no trust. Alex is giving -2. And he is giving -3 to a 4th octet score of 1, low trust.
>
> The recommended scale is -0.1, -1.0, -10, -100, and this is how SpamAssassin handles dnswl scoring. Using a 4 point scale instead of 100, a 4th octet value of 0 or 1 should be given NO whitelisting preference at all, which is what my consolidated example does.

Somehow your first message to the list on this topic didn't make it to me. Had to read it in the archives. Anyway, thanks so much.

My postscreen config was generated through a discussion on this list with rob0 some time ago, as well as his postscreen config (http://rob0.nodns4.us/howto/postfix/main.cf). Perhaps if he's reading, he can correct this.

I can't believe I've been whitelisting mass mailers. That's far from what I would want to be doing. In fact, I'm considering figuring out some spamassassin rules to better identify them based on the dnswl queries.

Regarding your DNS caching comments, thanks for this too. I hadn't realized there would be bandwidth savings by having one or two DNS servers that are queried on the network versus having a local cache on each mail server. I've always been a bind loyalist, but will consider the powerDNS program if it doesn't improve.

I've already made the postscreen changes on the systems, and already noticing fewer DNS queries. I've also removed swl.spamhaus.org entirely, thanks to a conversation with spamhaus and comments from Tom Hendrikx about it being discontinued.

Thanks everyone!

Alex
Re: Understanding postscreen timeouts
On Fri, May 02, 2014 at 08:10:18PM -0400, Alex wrote:
> On Fri, May 2, 2014 at 6:45 PM, Stan Hoeppner s...@hardwarefreak.com wrote:
>> On 5/2/2014 6:07 AM, Wietse Venema wrote:
>>> Stan Hoeppner:
>>>>> swl.spamhaus.org*-4
>>>>> list.dnswl.org=127.[0..255].[0..255].0*-2
>>>>> list.dnswl.org=127.[0..255].[0..255].1*-3
>>>>> list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
>>>>
>>>> Consolidate these last 3 to something like:
>>>>
>>>> list.dnswl.org=127.0.[2..14].[2..3]*-4
>>>
>>> These three will result in one list.dnswl.org query, just like the consolidated one. There is no performance difference.
>>
>> Correct. The reason for consolidating these is not to reduce queries.
>>
>>> However, there is a correctness difference. The consolidated form has the same weight 4 for all results, while the original form has different weights.
>>
>> The consolidated form gives no score to a 4th octet value of [0..1], but gives -4 to [2..3]. This is the key difference. Alex' form and weights are not correct. And that is why I posted the link to the return codes. The second 'octet' is always zero, not a range. The 3rd octet has a range of 2-15, and the 4th octet a range of 0-3. Specifying a range of 0-255 or 2-255 to cover the future may have the opposite effect, resulting in potential disaster, depending on how/if/when dnswl changes things. Such wildcards should not be used.

Good point. I thought of this, but did not bother to implement it that way. Eventually I will change it.

>> A value of 15 in the 3rd octet means the sender is an Email Marketing Provider. Most people would never whitelist such senders. Alex currently does. Most people would give no preference to a 4th octet score of 0 which means no trust.

Well, I whitelist mildly. Do note that this is a whitelist, under management by people who, I suppose, don't like spam any more than you nor I. A DNSWL.org return of 127.0.15.0 means an email marketer who is nominally trying to limit spam (thus deserving a whitelist entry), but who might not be doing that well. A -1 score makes sense. It's not enough to override Zen nor a grouping of other DNSBLs, but if that's the only result from postscreen_dnsbl_sites, it's enough to bypass the after-220 checks.

>> Alex is giving -2. And he is giving -3 to a 4th octet score of 1, low trust. The recommended scale is -0.1, -1.0, -10, -100, and this is how SpamAssassin handles dnswl scoring.

Yes, I think -1, -2 and -4 make sense. I lump 4th octet 2 and 3 together because I'm a 2. :) Also, a -4 is going to override any borderline DNSBL score. If it doesn't, I expect something to give somewhere. In my studies, I found very little overlap between the DNSBLs and the DNSWLs.

>> Using a 4 point scale instead of 100, a 4th octet value of 0 or 1 should be given NO whitelisting preference at all, which is what my consolidated example does.

But I don't agree with that. Scoring at the content scanning stage differs from scoring in postscreen. DNSWL.org assumes that their trust level none sites are not actually making money from spam. I can't speak for Mathias, but I am pretty sure that he would delist ANY known spammer.

> Somehow your first message to the list on this topic didn't make it to me. Had to read it in the archives. Anyway, thanks so much. My postscreen config was generated through a discussion on this list with rob0 some time ago, as well as his postscreen config (http://rob0.nodns4.us/howto/postfix/main.cf). Perhaps if he's reading, he can correct this.

Hiya! Yes, I remember. BTW, the better link to share is the HTML page, http://rob0.nodns4.us/postscreen.html , which has all the explanations and warnings.

> I can't believe I've been whitelisting mass mailers. That's far from what I would want to be doing. In fact, I'm considering figuring out some spamassassin rules to better identify them based on the dnswl queries.

If you want to be adventurous (and to violate the DNSWL.org spirit) nothing stops you from using 127.0.15.0 with a positive score in postscreen ... or even as a reject_rbl_client in smtpd!

I figure these are at worst the gray hats. And why bother giving delays with the after-220 tests they will pass anyway? So yes, my policy here was considered and deliberate. But looking back, I'll agree that a -1 would make more sense than -2. Stan probably tends to be more aggressive than I am. There's no right/wrong to that, it's a choice.

> Regarding your DNS caching comments, thanks for this too. I hadn't realized there would be bandwidth savings by having one or two DNS servers that are queried on the network versus having a local cache on each mail server. I've always been a bind loyalist, but will consider the powerDNS program if it doesn't improve.

I've always been a BIND loyalist too. Now I'm paid to be a BIND loyalist. I have nothing against the competition, certainly I can't say anything bad
Understanding postscreen timeouts
Hi,

I'm using postfix-2.10.3 with fedora20 and have configured postscreen with spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally receiving the following timeout message:

May 1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply timeout 10s for swl.spamhaus.org

This appears to happen during periods of load, but also when the server is idle. I understand it's possible to increase the timeout, but I would think 10s would be long enough, so didn't want to start doing that. This is also on multiple hosts on multiple different, unrelated networks.

I'm also using a half-dozen RBLs, but they don't all always timeout.

I'm using a local bind caching server on the hosts that are involved. Should I consider setting up rbldnsd for this instead? Or is that only for caching local RBLs?

What is the result of this timeout? Does postscreen/dnsblog retry, or is the attempt failed and the mail just passed on?

Here is the relevant postscreen info from my config. Please let me know if the full config is necessary.

postscreen_access_list = permit_mynetworks,
    cidr:/etc/postfix/postscreen_access.cidr
postscreen_blacklist_action = drop
postscreen_dnsbl_action = enforce
postscreen_dnsbl_reply_map = pcre:$config_directory/postscreen_dnsbl_reply_map.pcre
postscreen_dnsbl_sites =
    mykey.zen.dq.spamhaus.net*3
    b.barracudacentral.org*2
    bl.spameatingmonkey.net*2
    bl.spamcop.net
    dnsbl.sorbs.net
    psbl.surriel.com
    bl.mailspike.net
    swl.spamhaus.org*-4
    list.dnswl.org=127.[0..255].[0..255].0*-2
    list.dnswl.org=127.[0..255].[0..255].1*-3
    list.dnswl.org=127.[0..255].[0..255].[2..255]*-4
postscreen_dnsbl_threshold = 3
postscreen_greet_action = enforce
postscreen_whitelist_interfaces = static:all 172.XX.YY.160/32 64.XX.YY.0/24 67.XX.YY.0/24

Thanks so much,

Alex
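For readers following the weights in this configuration, the sketch below models in Python how a combined score relates to postscreen_dnsbl_threshold = 3. It is an illustration, not Postfix code: `total_score` and `is_blocked` are hypothetical names, the list.dnswl.org entries with reply filters are left out for brevity, and the comparison assumes the threshold is an inclusive lower bound as described in the Postfix documentation.

```python
# Weights copied from the posted postscreen_dnsbl_sites value.
# Sites listed without '*N' carry the default weight of 1.
WEIGHTS = {
    "mykey.zen.dq.spamhaus.net": 3,
    "b.barracudacentral.org": 2,
    "bl.spameatingmonkey.net": 2,
    "bl.spamcop.net": 1,
    "dnsbl.sorbs.net": 1,
    "psbl.surriel.com": 1,
    "bl.mailspike.net": 1,
    "swl.spamhaus.org": -4,  # whitelist: negative weight
}
THRESHOLD = 3  # postscreen_dnsbl_threshold


def total_score(listed_on):
    """Sum the weights of every site that returned a listing for the client."""
    return sum(WEIGHTS[site] for site in listed_on)


def is_blocked(listed_on):
    """A client is blocked once its combined score reaches the threshold."""
    return total_score(listed_on) >= THRESHOLD


print(is_blocked(["bl.spamcop.net", "dnsbl.sorbs.net"]))                 # score 2
print(is_blocked(["mykey.zen.dq.spamhaus.net"]))                          # score 3
print(is_blocked(["mykey.zen.dq.spamhaus.net", "swl.spamhaus.org"]))      # score -1
```

With these numbers a single Zen listing is enough to block, two of the weight-1 lists are not, and a -4 whitelist hit outweighs a lone Zen listing.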
Re: Understanding postscreen timeouts
Alex:
> I'm using postfix-2.10.3 with fedora20 and have configured postscreen with spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally receiving the following timeout message:
>
> May 1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply timeout 10s for swl.spamhaus.org

This time limit has unfortunately escaped my attention. It is not yet configurable. The warning message means that postscreen gives up waiting for the DNS lookup result. This is a safety mechanism.

> I'm also using a half-dozen RBLs, but they don't all always timeout.

I see occasional timeouts on residential and co-located servers. By default the resolver *system library* routines wait 5s before retrying; this may be configurable in resolv.conf, but the postscreen time limit is still hard-coded.

Wietse
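The idea of giving up on a lookup after a fixed time limit can be illustrated with a short Python sketch. This is not how postscreen is implemented (it is a separate program that waits for replies from dnsblog processes); it only models the behaviour described above. `lookup_with_deadline`, `slow_lookup` and `fast_lookup` are hypothetical names, the stand-in lookups do no real DNS traffic, and the time limits are shortened from 10s so the example finishes quickly.

```python
import concurrent.futures
import time


def lookup_with_deadline(lookup, name, timeout_s):
    """Run lookup(name) in a worker thread; return None if it takes too long."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(lookup, name)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The caller stops waiting; the slow lookup may still finish later,
        # but its answer is no longer used.
        return None
    finally:
        pool.shutdown(wait=False)


def slow_lookup(name):
    """Stand-in for a DNS query whose reply arrives after the deadline."""
    time.sleep(0.3)
    return "127.0.2.2"


def fast_lookup(name):
    """Stand-in for a DNS query answered immediately."""
    return "127.0.2.2"


print(lookup_with_deadline(fast_lookup, "2.0.0.127.swl.spamhaus.org", 0.1))
print(lookup_with_deadline(slow_lookup, "2.0.0.127.swl.spamhaus.org", 0.05))
```

The caller's deadline is independent of whatever retry timing the underlying resolver uses, which is the distinction Wietse draws between the resolver library's 5s retry and the postscreen limit.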
Re: Understanding postscreen timeouts
Hi,

On Thu, May 1, 2014 at 5:38 PM, Wietse Venema wie...@porcupine.org wrote:
> Alex:
>> I'm using postfix-2.10.3 with fedora20 and have configured postscreen with spamhaus, barracuda, and a few other DNSBLs. I'm however occasionally receiving the following timeout message:
>>
>> May 1 17:15:01 mail01 postfix/postscreen[4429]: warning: dnsblog reply timeout 10s for swl.spamhaus.org
>
> This time limit has unfortunately escaped my attention. It is not yet configurable. The warning message means that postscreen gives up waiting for the DNS lookup result. This is a safety mechanism.
>
>> I'm also using a half-dozen RBLs, but they don't all always timeout.
>
> I see occasional timeouts on residential and co-located servers. By default the resolver *system library* routines wait 5s before retrying; this may be configurable in resolv.conf, but the postscreen time limit is still hard-coded.

These are both corporate 10mbs dedicated links and I don't think latency and/or bandwidth is a problem.

It actually appears swl.spamhaus.org is the main problem. It doesn't even resolve when I try to do it manually. This was a recommendation I used from this list some time ago. Has something changed? This is my current config:

postscreen_dnsbl_sites =
    mykey.zen.dq.spamhaus.net*3
    b.barracudacentral.org*2
    bl.spameatingmonkey.net*2
    bl.spamcop.net
    dnsbl.sorbs.net
    psbl.surriel.com
    bl.mailspike.net
    swl.spamhaus.org*-4
    list.dnswl.org=127.[0..255].[0..255].0*-2
    list.dnswl.org=127.[0..255].[0..255].1*-3
    list.dnswl.org=127.[0..255].[0..255].[2..255]*-4

I'm also curious what resolvers people are using for their mail servers? bind? Looking at my query graphs, it appears to be about 30 queries/sec on average for each host, just as a local caching server.

Thanks,

Alex
Re: Understanding postscreen timeouts
On 5/1/2014 8:15 PM, Alex wrote:
...
> These are both corporate 10mbs dedicated links and I don't think latency and/or bandwidth is a problem.

The problem, if network related, will be UDP packet loss somewhere in the end-to-end path, not b/w or latency on the CPE link into the provider's net.

> It actually appears swl.spamhaus.org is the main problem. It doesn't even resolve when I try to do it manually.

From here:

$ host 2.0.0.127.swl.spamhaus.org
2.0.0.127.swl.spamhaus.org has address 127.0.2.2

What response do you receive? Due to your query volume you require paid service for Spamhaus Zen. The same terms apply to all Spamhaus services. Your IPs may have been blacklisted from DWL due to high query volume. Contact Spamhaus. If your contract entitles you to all Spamhaus lists, the fix may be as simple as changing the SWL hostname and adding your key.

> This was a recommendation I used from this list some time ago. Has something changed?

See above.

> postscreen_dnsbl_sites =
>     mykey.zen.dq.spamhaus.net*3
>     b.barracudacentral.org*2
>     bl.spameatingmonkey.net*2
>     bl.spamcop.net
>     dnsbl.sorbs.net
>     psbl.surriel.com
>     bl.mailspike.net

With these 7 dnsbls you will have extreme overlap of listed IPs. The last 5 will gain you little to nothing and simply add latency to your mail transactions, which is something you do not want in a high volume environment. I'd recommend you use Zen and BRBL, remove the rest, and rely on SWL and dnswl for FP mitigation during SMTP. You also run SpamAssassin on all of these hosts, so there's no need to pile on dnsbl queries at SMTP connect.

>     swl.spamhaus.org*-4
>     list.dnswl.org=127.[0..255].[0..255].0*-2
>     list.dnswl.org=127.[0..255].[0..255].1*-3
>     list.dnswl.org=127.[0..255].[0..255].[2..255]*-4

Consolidate these last 3 to something like:

list.dnswl.org=127.0.[2..14].[2..3]*-4

To understand why, read Return Codes at: http://dnswl.org/tech

> I'm also curious what resolvers people are using for their mail servers? bind? Looking at my query graphs, it appears to be about 30 queries/sec on average for each host, just as a local caching server.

That's ~2.6M queries/day/host. Eliminating the 5 unnecessary dnsbl queries will lower this considerably. If you're not happy with bind, check out:

http://doc.powerdns.com/html/built-in-recursor.html

If you have more than a handful of hosts doing 2.5M queries/day, you should seriously consider building a couple of resolvers homed in different networks and having the MX hosts query the pair. This will cut down considerably on the query load you're placing on your dns[b|w]l servers, as resolver cache will be much more effective.

Cheers,

Stan
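Two details from this message lend themselves to a quick check: the query name used in the `host` test and the queries/day figure. The Python sketch below works both out under simple assumptions: `dnsbl_query_name` and `queries_per_day` are hypothetical helper names, only IPv4 addresses are handled, and the 30 queries/sec rate is treated as a constant average over a full day.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def dnsbl_query_name(client_ip, zone):
    """Build the DNSBL/DNSWL query name: reversed IPv4 octets plus the zone."""
    octets = client_ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    return ".".join(reversed(octets)) + "." + zone


def queries_per_day(queries_per_second):
    """Convert an average per-second rate into a daily total."""
    return queries_per_second * SECONDS_PER_DAY


# The test name from the message corresponds to looking up 127.0.0.2.
print(dnsbl_query_name("127.0.0.2", "swl.spamhaus.org"))

# 30 queries/sec per host, as read from the query graphs.
print(queries_per_day(30))  # 2592000, i.e. roughly 2.6M per host per day
```

Multiplying the per-host total by the number of mail hosts gives the combined load the list operators see when every host runs its own cache.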