Re: DNS and RBL problems

2018-09-14 Thread Alex
On Fri, Sep 14, 2018 at 4:24 PM Daniel J. Luke  wrote:
>
> On Sep 14, 2018, at 3:26 PM, Kevin A. McGrail  wrote:
> > On 9/14/2018 3:22 PM, Alex wrote:
> >> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
> >> which is bind configured as my local caching resolver.
> > Sinister issues like this are hard.  I'll try and escalate our plans for
> > rsync access.
>
> Alex - have you looked at bad checksum counters on the host? (netstat -s) -
> I've seen strange issues before where broken network hardware (or bugs in
> switch/router code) caused changes to packets as they passed through the
> 'bad' device. The first hints were those counters increasing at the same time
> as the mysterious issue was happening.

I don't see anything relating to bad checksums with netstat :-( I've
also tried numerous ethtool config changes. I've also looked through
hundreds of packets with tcpdump and wireshark.

This isn't a spamassassin message, but does anyone with a postfix
system ever see similar "Name service error" messages such as the one
below?

Sep 14 21:12:54 mail03 postfix/dnsblog[3713]: warning: dnsblog_query:
lookup error for DNS query 239.242.238.54.ubl.unsubscore.com: Host or
domain name not found. Name service error for
name=239.242.238.54.ubl.unsubscore.com type=A: Host not found, try
again

It appears to occur quite frequently, and on multiple unrelated
systems. I'd love to find out what's causing it. The postfix people
ascribed it to a remote server problem, but I can't believe virtually
all RBLs, including spamhaus, would have such intermittent problems
with *their* name servers.
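
For anyone who wants to reproduce this outside of postfix, here is a
minimal sketch in Python. It builds the query name the same way the log
line above does (reversed octets followed by the zone) and classifies a
single lookup, so temporary failures can be counted separately from plain
"not listed" answers. The function names are hypothetical, and treating
EAI_AGAIN as "try again" assumes a glibc-style resolver; none of this is
taken from postfix itself.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Return the DNSBL query name for an IPv4 address: reversed octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def probe(name):
    """Resolve a name once and classify the outcome (hypothetical helper)."""
    try:
        socket.getaddrinfo(name, None, socket.AF_INET)
        return "listed"
    except socket.gaierror as exc:
        if exc.errno == socket.EAI_AGAIN:
            return "tempfail"      # resolver asked us to try again
        if exc.errno == socket.EAI_NONAME:
            return "not listed"    # name does not exist
        return "other error"
```

Calling probe(dnsbl_query_name("54.238.242.239", "ubl.unsubscore.com"))
in a loop would show how often "tempfail" comes back. Repeated lookups of
one name will mostly be answered from the local cache, so a real test
would probably need to cycle through many different addresses.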


Re: DNS and RBL problems

2018-09-14 Thread Daniel J. Luke
On Sep 14, 2018, at 3:26 PM, Kevin A. McGrail  wrote:
> On 9/14/2018 3:22 PM, Alex wrote:
>> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
>> which is bind configured as my local caching resolver.
> Sinister issues like this are hard.  I'll try and escalate our plans for
> rsync access.

Alex - have you looked at bad checksum counters on the host? (netstat -s) -
I've seen strange issues before where broken network hardware (or bugs in
switch/router code) caused changes to packets as they passed through the 'bad'
device. The first hints were those counters increasing at the same time as the
mysterious issue was happening.
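
On Linux, many of the counters that netstat -s summarizes are also exposed
in /proc/net/snmp, so they can be sampled from a script and lined up
against the failure timestamps. A rough sketch, assuming the usual layout
of that file (a header line of counter names followed by a line of
values); the function names and the sample text are made up for
illustration:

```python
def parse_proc_net_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    stats = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The file alternates a header line of names with a line of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, _, names = header.partition(":")
        value_proto, _, numbers = values.partition(":")
        if proto != value_proto:
            continue  # unexpected layout, skip this pair
        stats[proto] = dict(zip(names.split(), map(int, numbers.split())))
    return stats


def checksum_errors(stats):
    """Pick out the InCsumErrors counter for each protocol that has one."""
    return {p: c["InCsumErrors"] for p, c in stats.items() if "InCsumErrors" in c}


# Made-up sample in the /proc/net/snmp layout, not taken from a real host.
SAMPLE = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors
Udp: 1200 4 3 1180 0 0 3
"""

print(checksum_errors(parse_proc_net_snmp(SAMPLE)))
```

On a live host the input would be open("/proc/net/snmp").read(), sampled
every minute or so, and what matters is whether the values climb around
the time of the failures.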

-- 
Daniel J. Luke





Re: DNS and RBL problems

2018-09-14 Thread Kevin A. McGrail
On 9/14/2018 3:22 PM, Alex wrote:
> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
> which is bind configured as my local caching resolver.
Sinister issues like this are hard.  I'll try and escalate our plans for
rsync access.


Re: DNS and RBL problems

2018-09-14 Thread Alex
Hi,

On Fri, Sep 14, 2018 at 1:51 PM Rob McEwen  wrote:
>
> On 9/14/2018 1:36 PM, Alex wrote:
> > Hi,
> >
> > For the past few weeks I've been having problems with queries to many
> > of the common RBLs, including barracuda, mailspike and unsubscore. My
> > logs are filled with "Name service error", SERVFAIL and lame-server
> > messages for RBLs I know to be valid.
> > 
>
>
> Alex,
>
> Coincidentally, a recent new invaluement subscriber was initially having
> at least similar problems that didn't make sense. I was stumped. It made
> no sense that it wasn't working because everything looked correct. But
> then he figured out that the following bug was the cause, and fixing
> this bug enabled the queries to start working again:
>
> NOTICE: SpamAssassin installations are affected by a bug, due to a change
> Net::DNS made in an earlier version. Here is the bug for reference:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223
>
> So you should definitely check to see if this is causing your problem.

I should have added that I'm aware of that Net::DNS bug, and I'm using
a version long-since fixed.

> I will also mention that if you are using a server such as 8.8.8.8, you MUST
> change.  I found that if you use 8.8.8.8, you cannot even pass a test for
> spamassassin builds.  They are doing some interesting things, likely for
> anti-abuse, that just screw with things.
I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
which is bind configured as my local caching resolver.

It also fails for one out of every thousand queries of the PCCC RBL
for no clear reason.

14-Sep-2018 15:16:39.333 query-errors: info: client @0x7ff797169d70
68.195.193.45#34244 (hungryhowies.com.wild.pccc.com): query failed
(SERVFAIL) for hungryhowies.com.wild.pccc.com/IN/A at
../../../bin/named/query.c:8580

14-Sep-2018 15:16:39.333 query-errors: debug 2: fetch completed at
../../../lib/dns/resolver.c:3927 for hungryhowies.com.wild.pccc.com/A
in 30.000163: timed out/success
[domain:wild.pccc.com,referral:0,restart:7,qrysent:7,timeout:6,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

The check for hungryhowies.com succeeded at that time for a dozen
other RBLs, but later checks could fail for even one of those.
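
The bracketed counters on the "fetch completed" line are easier to compare
across failures once they are pulled into a dictionary. A small sketch in
Python; the function name is hypothetical, and the regular expression
assumes nothing beyond the key:value layout visible in the log above:

```python
import re

# Matches "[domain:<zone>,key:value,key:value,...]" as seen in the debug line.
FETCH_RE = re.compile(r"\[domain:(?P<domain>[^,\]]+),(?P<rest>[^\]]*)\]")


def parse_fetch_counters(text):
    """Return (domain, {counter: int}) from a 'fetch completed' entry, or None."""
    match = FETCH_RE.search(text)
    if match is None:
        return None
    counters = {}
    for pair in match.group("rest").split(","):
        key, _, value = pair.partition(":")
        if value:  # ignore empty or malformed fragments
            counters[key] = int(value)
    return match.group("domain"), counters


line = ("[domain:wild.pccc.com,referral:0,restart:7,qrysent:7,timeout:6,"
        "lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]")
domain, counters = parse_fetch_counters(line)
print(domain, counters["qrysent"], counters["timeout"])
```

For the entry above this gives qrysent 7 and timeout 6, which, if I'm
reading the counter names correctly, means six of the seven queries sent
for that fetch timed out.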


Re: DNS and RBL problems

2018-09-14 Thread Kevin A. McGrail
I will also mention that if you are using a server such as 8.8.8.8, you
MUST change.  I found that if you use 8.8.8.8, you cannot even pass a test
for spamassassin builds.  They are doing some interesting things, likely
for anti-abuse, that just screw with things.

Regards,
KAM

--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Fri, Sep 14, 2018 at 1:50 PM, Rob McEwen  wrote:

> On 9/14/2018 1:36 PM, Alex wrote:
>
>> Hi,
>>
>> For the past few weeks I've been having problems with queries to many
>> of the common RBLs, including barracuda, mailspike and unsubscore. My
>> logs are filled with "Name service error", SERVFAIL and lame-server
>> messages for RBLs I know to be valid.
>> 
>>
>
>
> Alex,
>
> Coincidentally, a recent new invaluement subscriber was initially having
> at least similar problems that didn't make sense. I was stumped. It made no
> sense that it wasn't working because everything looked correct. But then he
> figured out that the following bug was the cause, and fixing this bug
> enabled the queries to start working again:
>
> NOTICE: SpamAssassin installations are affected by a bug, due to a change
> Net::DNS made in an earlier version. Here is the bug for reference:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223
>
> So you should definitely check to see if this is causing your problem.
>
> --
> Rob McEwen
> https://www.invaluement.com
>
>
>


Re: DNS and RBL problems

2018-09-14 Thread Rob McEwen

On 9/14/2018 1:36 PM, Alex wrote:

Hi,

For the past few weeks I've been having problems with queries to many
of the common RBLs, including barracuda, mailspike and unsubscore. My
logs are filled with "Name service error", SERVFAIL and lame-server
messages for RBLs I know to be valid.




Alex,

Coincidentally, a recent new invaluement subscriber was initially having 
at least similar problems that didn't make sense. I was stumped. It made 
no sense that it wasn't working because everything looked correct. But 
then he figured out that the following bug was the cause, and fixing 
this bug enabled the queries to start working again:


NOTICE: SpamAssassin installations are affected by a bug, due to a change
Net::DNS made in an earlier version. Here is the bug for reference:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223

So you should definitely check to see if this is causing your problem.

--
Rob McEwen
https://www.invaluement.com




DNS and RBL problems

2018-09-14 Thread Alex
Hi,

For the past few weeks I've been having problems with queries to many
of the common RBLs, including barracuda, mailspike and unsubscore. My
logs are filled with "Name service error", SERVFAIL and lame-server
messages for RBLs I know to be valid.

14-Sep-2018 12:21:10.928 query-errors: info: client @0x7f105735f3b0
127.0.0.1#44791 (139.33.47.104.bl.mailspike.net): query failed
(SERVFAIL) for 139.33.47.104.bl.mailspike.net/IN/A at
../../../bin/named/query.c:8580
14-Sep-2018 12:21:10.928 query-errors: info: client @0x7f10342d4650
127.0.0.1#44791 (139.33.47.104.db.wpbl.info): query failed (SERVFAIL)
for 139.33.47.104.db.wpbl.info/IN/A at ../../../bin/named/query.c:8580
14-Sep-2018 12:21:10.928 query-errors: debug 2: fetch completed at
../../../lib/dns/resolver.c:3927 for 139.33.47.104.bl.mailspike.net/A
in 30.000146: timed out/success
[domain:bl.mailspike.net,referral:0,restart:5,qrysent:14,timeout:13,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

This shows a failure, while at other times these same queries succeed.

This is using bind set up as a standard recursive name server on
Fedora 28. These are bind logs, but does anyone know why spamassassin
queries to these RBLs would time out? There's no firewall involved. It
appears to happen at all times during the day.

I really have no other ideas after staring at the logs for weeks,
seeing it happen on all my systems, and asking on numerous other lists
(including postfix and bind-users).
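
To get a feel for which lists are affected and how often, the query-errors
entries can be tallied per RBL zone. Below is a minimal sketch in Python.
It assumes each log entry sits on a single line (they are wrapped in this
mail), and that the list of zones is supplied by hand; the function name
is hypothetical.

```python
import re
from collections import Counter

# Captures the query name and record type from a BIND query-errors entry.
SERVFAIL_RE = re.compile(r"query failed \(SERVFAIL\) for (\S+)/IN/(\w+)")


def servfail_by_zone(log_lines, zones):
    """Count SERVFAIL entries per RBL zone; unmatched names go to '(other)'."""
    counts = Counter()
    for line in log_lines:
        match = SERVFAIL_RE.search(line)
        if not match:
            continue
        qname = match.group(1).lower()
        zone = next((z for z in zones if qname.endswith("." + z)), "(other)")
        counts[zone] += 1
    return counts


# The two SERVFAIL entries from above, unwrapped onto single lines.
sample_lines = [
    "14-Sep-2018 12:21:10.928 query-errors: info: client @0x7f105735f3b0 "
    "127.0.0.1#44791 (139.33.47.104.bl.mailspike.net): query failed (SERVFAIL) "
    "for 139.33.47.104.bl.mailspike.net/IN/A at ../../../bin/named/query.c:8580",
    "14-Sep-2018 12:21:10.928 query-errors: info: client @0x7f10342d4650 "
    "127.0.0.1#44791 (139.33.47.104.db.wpbl.info): query failed (SERVFAIL) "
    "for 139.33.47.104.db.wpbl.info/IN/A at ../../../bin/named/query.c:8580",
]
result = servfail_by_zone(sample_lines, ["bl.mailspike.net", "db.wpbl.info"])
print(dict(result))
```

Run over a full day of logs, the per-zone totals should make it clearer
whether the failures cluster on a few lists or are spread evenly across
all of them.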