Re: DNS and RBL problems
On Fri, Sep 14, 2018 at 4:24 PM Daniel J. Luke wrote:
>
> On Sep 14, 2018, at 3:26 PM, Kevin A. McGrail wrote:
> > On 9/14/2018 3:22 PM, Alex wrote:
> >> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
> >> which is bind configured as a my local caching resolver.
> > Sinister issues like this are hard. I'll try and escalate our plans for
> > rsync access.
>
> Alex - have you looked at bad checksum counters on the host? (netstat -s) -
> I've seen strange issues before with broken network hardware (or bugs in
> switch/router code) caused changes to packets as they passed through the
> 'bad' device. The first hints were those counters increasing at the same time
> as the mysterious issue happening.

I don't see anything relating to bad checksums with netstat :-( I've also
tried numerous ethtool config changes. I've also looked through hundreds of
packets with tcpdump and wireshark.

This isn't a spamassassin message, but does anyone with a postfix system
ever see similar "Name service error" messages such as the one below?

Sep 14 21:12:54 mail03 postfix/dnsblog[3713]: warning: dnsblog_query: lookup error for DNS query 239.242.238.54.ubl.unsubscore.com: Host or domain name not found. Name service error for name=239.242.238.54.ubl.unsubscore.com type=A: Host not found, try again

It appears to occur quite frequently, and on multiple unrelated systems.
I'd love to find out what's causing it. The postfix people ascribed it to a
remote server problem, but I can't believe virtually all RBLs, including
spamhaus, would have such intermittent problems with *their* name servers.
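
For anyone who wants to reproduce the dnsblog lookup outside postfix, here is a minimal Python sketch. It builds the query name the same way the log line above shows (reversed IPv4 octets followed by the zone) and tallies outcomes over repeated lookups through the system resolver. The function names `dnsbl_query_name` and `probe` are made up for illustration, and the mapping of `EAI_AGAIN` to "try again" style failures is an assumption about glibc behaviour, so treat the buckets as a rough guide rather than a definitive classification.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build a DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.strip(".")


def probe(name: str, attempts: int = 20) -> dict:
    """Resolve `name` repeatedly via the system resolver and tally outcomes.

    Assumption: on glibc, NXDOMAIN surfaces as EAI_NONAME and temporary
    failures (SERVFAIL, timeouts) surface as EAI_AGAIN. Other platforms
    may report these differently, hence the catch-all "other" bucket.
    """
    tally = {"listed": 0, "not_listed": 0, "temp_fail": 0, "other": 0}
    for _ in range(attempts):
        try:
            socket.getaddrinfo(name, None, socket.AF_INET)
            tally["listed"] += 1
        except socket.gaierror as exc:
            if exc.errno == socket.EAI_NONAME:
                tally["not_listed"] += 1
            elif exc.errno == socket.EAI_AGAIN:
                tally["temp_fail"] += 1
            else:
                tally["other"] += 1
    return tally
```

Calling `probe(dnsbl_query_name("54.238.242.239", "ubl.unsubscore.com"))` would exercise the same name as the log entry. A local caching resolver will answer repeats from its cache, so probing a spread of different addresses is likely to be more telling than hammering a single name.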
Re: DNS and RBL problems
On Sep 14, 2018, at 3:26 PM, Kevin A. McGrail wrote:
> On 9/14/2018 3:22 PM, Alex wrote:
>> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
>> which is bind configured as a my local caching resolver.
> Sinister issues like this are hard. I'll try and escalate our plans for
> rsync access.

Alex - have you looked at bad checksum counters on the host? (netstat -s) -
I've seen strange issues before with broken network hardware (or bugs in
switch/router code) caused changes to packets as they passed through the
'bad' device. The first hints were those counters increasing at the same
time as the mysterious issue happening.

-- 
Daniel J. Luke
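
The "counters increasing at the same time" check can be sketched in Python as a comparison of two saved `netstat -s` snapshots. This is only an illustration: the helper names are hypothetical, and the two line layouts it recognises ("Name: 3" and "3 some description") are assumptions, since `netstat -s` output differs between operating systems and versions.

```python
import re

# Two layouts are assumed here; real netstat -s output varies by platform.
NAME_FIRST = re.compile(r"^(?P<name>[A-Za-z][\w ]*?):\s*(?P<val>\d+)$")
VALUE_FIRST = re.compile(r"^(?P<val>\d+)\s+(?P<name>.+)$")


def checksum_counters(netstat_output: str) -> dict:
    """Collect counters whose description mentions checksums."""
    counters = {}
    for line in netstat_output.splitlines():
        text = line.strip()
        lowered = text.lower()
        if "csum" not in lowered and "checksum" not in lowered:
            continue
        match = NAME_FIRST.match(text) or VALUE_FIRST.match(text)
        if match:
            counters[match.group("name")] = int(match.group("val"))
    return counters


def increased(before: dict, after: dict) -> dict:
    """Return only the counters that grew between two snapshots."""
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value > before.get(name, 0)
    }
```

Taking one snapshot before and one after a burst of SERVFAILs, then passing both through `checksum_counters` and `increased`, would show whether any checksum counter moved during the failure window.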
Re: DNS and RBL problems
On 9/14/2018 3:22 PM, Alex wrote:
> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
> which is bind configured as a my local caching resolver.

Sinister issues like this are hard. I'll try and escalate our plans for
rsync access.
Re: DNS and RBL problems
Hi,

On Fri, Sep 14, 2018 at 1:51 PM Rob McEwen wrote:
>
> On 9/14/2018 1:36 PM, Alex wrote:
> > Hi,
> >
> > For the past few weeks I've been having problems with queries to many
> > of the common RBLs, including barracuda, mailspike and unsubscore. My
> > logs are filled with "Name service error", SERVFAIL and lame-server
> > messages for RBLs I know to be valid.
>
> Alex,
>
> Coincidentally, a recent new invaluement subscriber was initially having
> at least similar problems that didn't make sense. I was stumped. It made
> no sense that it wasn't working because everything looked correct. But
> then he figured out that the following bug was the cause, and fixing
> this bug enabled the queries to start working again:
>
> NOTICE: SpamAssassin installations affected by a bug, due to a change
> Net::DNS made in an earlier version, here is the bug for reference:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223
>
> So you should definitely check to see if this is causing your problem?

I should have added that I'm aware of that Net::DNS bug, and I'm using
a version long-since fixed.

> I will also mention that if you are using a server such as 8.8.8.8, you MUST
> change. I found
> that if you use 8.8.8.8, you cannot even pass a test for spamassassin builds.
> They are doing some
> interesting things likely anti-abuse that just screw with things.

I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
which is bind configured as my local caching resolver.

It also fails for one out of every thousand queries of the PCCC RBL for
no clear reason.

14-Sep-2018 15:16:39.333 query-errors: info: client @0x7ff797169d70 68.195.193.45#34244 (hungryhowies.com.wild.pccc.com): query failed (SERVFAIL) for hungryhowies.com.wild.pccc.com/IN/A at ../../../bin/named/query.c:8580
14-Sep-2018 15:16:39.333 query-errors: debug 2: fetch completed at ../../../lib/dns/resolver.c:3927 for hungryhowies.com.wild.pccc.com/A in 30.000163: timed out/success [domain:wild.pccc.com,referral:0,restart:7,qrysent:7,timeout:6,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

The check for hungryhowies.com succeeded at that time for a dozen other
RBLs, but later checks could fail for even one of those.
Re: DNS and RBL problems
I will also mention that if you are using a server such as 8.8.8.8, you
MUST change. I found that if you use 8.8.8.8, you cannot even pass a test
for spamassassin builds. They are doing some interesting things likely
anti-abuse that just screw with things.

Regards,
KAM

--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Fri, Sep 14, 2018 at 1:50 PM, Rob McEwen wrote:
> On 9/14/2018 1:36 PM, Alex wrote:
>
>> Hi,
>>
>> For the past few weeks I've been having problems with queries to many
>> of the common RBLs, including barracuda, mailspike and unsubscore. My
>> logs are filled with "Name service error", SERVFAIL and lame-server
>> messages for RBLs I know to be valid.
>
> Alex,
>
> Coincidentally, a recent new invaluement subscriber was initially having
> at least similar problems that didn't make sense. I was stumped. It made no
> sense that it wasn't working because everything looked correct. But then he
> figured out that the following bug was the cause, and fixing this bug
> enabled the queries to start working again:
>
> NOTICE: SpamAssassin installations affected by a bug, due to a change
> Net::DNS made in an earlier version, here is the bug for reference:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223
>
> So you should definitely check to see if this is causing your problem?
>
> --
> Rob McEwen
> https://www.invaluement.com
Re: DNS and RBL problems
On 9/14/2018 1:36 PM, Alex wrote:
> Hi,
>
> For the past few weeks I've been having problems with queries to many
> of the common RBLs, including barracuda, mailspike and unsubscore. My
> logs are filled with "Name service error", SERVFAIL and lame-server
> messages for RBLs I know to be valid.

Alex,

Coincidentally, a recent new invaluement subscriber was initially having
at least similar problems that didn't make sense. I was stumped. It made
no sense that it wasn't working because everything looked correct. But
then he figured out that the following bug was the cause, and fixing this
bug enabled the queries to start working again:

NOTICE: SpamAssassin installations affected by a bug, due to a change
Net::DNS made in an earlier version, here is the bug for reference:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223

So you should definitely check to see if this is causing your problem?

--
Rob McEwen
https://www.invaluement.com
DNS and RBL problems
Hi,

For the past few weeks I've been having problems with queries to many
of the common RBLs, including barracuda, mailspike and unsubscore. My
logs are filled with "Name service error", SERVFAIL and lame-server
messages for RBLs I know to be valid.

14-Sep-2018 12:21:10.928 query-errors: info: client @0x7f105735f3b0 127.0.0.1#44791 (139.33.47.104.bl.mailspike.net): query failed (SERVFAIL) for 139.33.47.104.bl.mailspike.net/IN/A at ../../../bin/named/query.c:8580
14-Sep-2018 12:21:10.928 query-errors: info: client @0x7f10342d4650 127.0.0.1#44791 (139.33.47.104.db.wpbl.info): query failed (SERVFAIL) for 139.33.47.104.db.wpbl.info/IN/A at ../../../bin/named/query.c:8580
14-Sep-2018 12:21:10.928 query-errors: debug 2: fetch completed at ../../../lib/dns/resolver.c:3927 for 139.33.47.104.bl.mailspike.net/A in 30.000146: timed out/success [domain:bl.mailspike.net,referral:0,restart:5,qrysent:14,timeout:13,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

This shows a failure, while at other times these same queries succeed.

This is using bind set up as a standard recursive name server on
fedora28. These are bind logs, but does anyone know why spamassassin
queries to these RBLs would time out? There's no firewall involved. It
appears to happen at all times during the day.

I really have no other ideas after staring at the logs for weeks,
seeing it happen on all my systems, and asking on numerous other lists
(including postfix and bind-users).
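
When there are weeks of these logs to stare at, it may help to pull the bracketed counters out of the "fetch completed" lines and summarise them per zone. The following is a rough Python sketch that assumes the line layout matches the entries quoted in this message. The regular expression and the `parse_fetch` name are illustrative only and are not part of BIND.

```python
import re

# Pattern written against the "fetch completed" lines quoted in this thread;
# other BIND versions may format the line differently.
FETCH_RE = re.compile(
    r"fetch completed at \S+ for (?P<name>\S+)/(?P<rtype>\w+) "
    r"in (?P<seconds>[\d.]+): (?P<result>[^\[]+?) \[(?P<stats>[^\]]*)\]"
)


def parse_fetch(line: str):
    """Return the fields of one 'fetch completed' line, or None if no match."""
    match = FETCH_RE.search(line)
    if match is None:
        return None
    stats = {}
    for item in match.group("stats").split(","):
        key, _, value = item.partition(":")
        # Counters are integers; "domain" stays a string.
        stats[key] = int(value) if value.isdigit() else value
    return {
        "name": match.group("name"),
        "rtype": match.group("rtype"),
        "seconds": float(match.group("seconds")),
        "result": match.group("result"),
        "stats": stats,
    }
```

For the mailspike line above this yields `qrysent` of 14 and `timeout` of 13 over roughly 30 seconds, which reads as nearly every upstream query going unanswered before named gave up. Grouping the parsed records by `stats["domain"]` would show whether the timeouts cluster on particular zones or are spread evenly.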