Re: Failure to check FCrDNS with long DNS replies?

2022-10-20 Thread Joerg Jung

> On 18. Oct 2022, at 16:41, Tassilo Philipp wrote:
> 
>>> On 21. Nov 2020, at 10:44, Tassilo Philipp wrote:
>>> 
>>>> FYI, I run into the same issue with a different provider:
>>>> relay.yourmailgateway.de which also has a large number of A records.
>>>> 
>>>> Trying to reproduce and digging deeper now, by adding debug logs etc.
>>> 
>>> Interesting... thanks for checking and having thought of my report. I for 
>>> myself didn't have any issues anymore, however, I barely ever receive any 
>>> mail from sfr. Also, given the random order of IPs in the DNS reply, I 
>>> simply might have had luck if it's in any case related to the IP order. I 
>>> have no evidence for, but when I was having problems, the IP in question 
>>> was among the last ones in the reply.
>>> 
>>> I'm curious what you'll find…
>> 
>> FYI, after digging deeper into this, I figured out that this was an issue 
>> with the DNS forwarders/resolver I was using (unfortunately not under my 
>> control) on this particular mail server: The forwarder is not able to 
>> resolve relay.yourmailgateway.de at all, likely due to the large number
>> of records: 52 A + 13 AAAA records.
>> 
>> I believe there is a limit in BIND suite (32) and OpenBSD libc (35) and 
>> others, which restricts older gethostbyname() calls with struct hostent 
>> results down to that 30-something number. Likely the used resolver was using 
>> these old/obsolete libc functions…
>> 
>> But OpenSMTPD and filter FCrDNS and OpenBSD ASR all doing fine here, because 
>> using getaddrinfo() alike under the hood with dynamic struct addrinfo result 
>> allocation, which does not expose any such limits and resolves all 65 A and 
>> AAAA records just fine.
> 
> Thanks for the feedback, that sounds like a fitting analysis. So if I follow 
> your thought, the resolver basically truncates the list and what opensmtpd 
> gets to see in the end sometimes misses the entry it tries to verify? Sounds 
> like the culprit indeed.

In theory: yes.

But in my case the forwarder did not just truncate the list; it failed
completely to resolve anything. Recursive resolvers can always fail
temporarily, e.g. when no suitable BGP route to a specific name server is
available.

However, OpenSMTPD should catch such resolving errors and skip(?) FCrDNS in
smtpd_getnameinfo_cb() in smtp_session.c, and even log a warning about what
went wrong and set fcrdns = -1.

I may be mistaken, but if I understand the logic of filter_check_fcrdns() in
lka_filter.c correctly, it "silently drops" the error case (fcrdns = -1) when
converting the "ret" value to a boolean, and just fails the check instead of
skipping it. Although, I might have missed something else here. Looks like a
very minor bug to me, but I’ll try to verify this and come up with a diff.
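
To illustrate the suspected problem, here is a minimal Python model of the
logic described above. It is only a sketch and not the OpenSMTPD source: both
function names are hypothetical, and the only thing taken from the
description is the three fcrdns states (1 = verified, 0 = mismatch,
-1 = lookup error) and a "match !fcrdns" style rule.

```python
# Hypothetical model of a "match !fcrdns" rule; not taken from lka_filter.c.

def rule_matches_boolean(fcrdns: int) -> bool:
    """Collapse the tri-state into a boolean, as suspected above.

    Anything other than 1 counts as "not FCrDNS", so a lookup error (-1)
    is indistinguishable from a real mismatch (0).
    """
    ok = (fcrdns == 1)
    return not ok          # True means: the rule matches, client is rejected


def rule_matches_tristate(fcrdns: int) -> bool:
    """Only a definite mismatch (0) triggers the rule; errors are skipped."""
    if fcrdns == -1:
        return False       # lookup failed: skip the check instead of failing
    return fcrdns == 0


for state in (1, 0, -1):
    print(state, rule_matches_boolean(state), rule_matches_tristate(state))
```

The two variants differ only for -1: the boolean version rejects the client,
the tri-state version lets it pass. Whether skipping is the right policy for
a failed lookup is a separate question.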


> I personally did not observe this issue anymore, unsure why, some update 
> might have fixed it on some upstream resolver or dunno...
> How are you dealing with this, given you don't control the resolver? I guess 
> you just switched it?

For now, I’m still playing around with this as I have a reproducible case :)
Obviously, yes I could switch to another resolver.
I may also be able to tweak smtpd.conf and skip fcrdns for specific domains, 
i.e. whitelisting them.



Re: Failure to check FCrDNS with long DNS replies?

2022-10-18 Thread Tassilo Philipp

>> On 21. Nov 2020, at 10:44, Tassilo Philipp wrote:
>>
>>> FYI, I run into the same issue with a different provider:
>>> relay.yourmailgateway.de which also has a large number of A records.
>>>
>>> Trying to reproduce and digging deeper now, by adding debug logs etc.
>>
>> Interesting... thanks for checking and having thought of my report. I 
>> for myself didn't have any issues anymore, however, I barely ever 
>> receive any mail from sfr. Also, given the random order of IPs in the 
>> DNS reply, I simply might have had luck if it's in any case related 
>> to the IP order. I have no evidence for, but when I was having 
>> problems, the IP in question was among the last ones in the reply.
>>
>> I'm curious what you'll find…
>
> FYI, after digging deeper into this, I figured out that this was an issue 
> with the DNS forwarders/resolver I was using (unfortunately not under 
> my control) on this particular mail server: The forwarder is not able 
> to resolve relay.yourmailgateway.de at all, likely due to the large
> number of records: 52 A + 13 AAAA records.
>
> I believe there is a limit in BIND suite (32) and OpenBSD libc (35) 
> and others, which restricts older gethostbyname() calls with struct 
> hostent results down to that 30-something number. Likely the used 
> resolver was using these old/obsolete libc functions…
>
> But OpenSMTPD and filter FCrDNS and OpenBSD ASR all doing fine here, 
> because using getaddrinfo() alike under the hood with dynamic struct 
> addrinfo result allocation, which does not expose any such limits and 
> resolves all 65 A and AAAA records just fine.


Thanks for the feedback, that sounds like a fitting analysis. So if I 
follow your thought, the resolver basically truncates the list and what 
opensmtpd gets to see in the end sometimes misses the entry it tries to 
verify? Sounds like the culprit indeed.


I personally did not observe this issue anymore, unsure why, some update 
might have fixed it on some upstream resolver or dunno...
How are you dealing with this, given you don't control the resolver? I 
guess you just switched it?


Thanks again for digging into this more



Re: Failure to check FCrDNS with long DNS replies?

2022-10-18 Thread Joerg Jung

> On 21. Nov 2020, at 10:44, Tassilo Philipp wrote:
> 
>> FYI, I run into the same issue with a different provider:
>> relay.yourmailgateway.de which also has a large number of A records.
>> 
>> Trying to reproduce and digging deeper now, by adding debug logs etc.
> 
> Interesting... thanks for checking and having thought of my report. I for 
> myself didn't have any issues anymore, however, I barely ever receive any 
> mail from sfr. Also, given the random order of IPs in the DNS reply, I simply 
> might have had luck if it's in any case related to the IP order. I have no 
> evidence for, but when I was having problems, the IP in question was among 
> the last ones in the reply.
> 
> I'm curious what you'll find…

FYI, after digging deeper into this, I figured out that this was an issue
with the DNS forwarder/resolver I was using (unfortunately not under my
control) on this particular mail server: the forwarder is not able to
resolve relay.yourmailgateway.de at all, likely due to the large number of
records (52 A + 13 AAAA).

I believe there is a limit in the BIND suite (32), in OpenBSD libc (35) and
in others, which restricts older gethostbyname() calls with struct hostent
results to that 30-something number. Likely the resolver in question was
using these old/obsolete libc functions…

But OpenSMTPD, the FCrDNS filter and OpenBSD ASR are all doing fine here,
because they use getaddrinfo()-like calls under the hood with dynamically
allocated struct addrinfo results, which do not impose any such limit and
resolve all 65 A and AAAA records just fine.
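
For illustration, a minimal Python sketch of a forward-confirmed reverse DNS
check, showing why a capped address list matters when the client IP sits near
the end of the reply. This is not how OpenSMTPD implements it: fcrdns_check()
and its injectable lookup parameters are hypothetical, and the cap of 35 is
only the libc figure guessed above. The default lookups use
socket.gethostbyaddr() and socket.getaddrinfo() from the standard library.

```python
import socket


def reverse_lookup(ip):
    """PTR lookup via the system resolver; returns the primary host name."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(name):
    """Return every A/AAAA address that getaddrinfo() reports for name."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def fcrdns_check(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Return 1 if ip is forward-confirmed, 0 if not, -1 on lookup error.

    Addresses are compared as plain strings, which is good enough for the
    IPv4 demonstration below.
    """
    try:
        name = reverse(ip)
        addresses = forward(name)
    except OSError:            # socket.herror and socket.gaierror included
        return -1
    return 1 if ip in addresses else 0


# Offline demonstration with fake lookups: 42 addresses, the client IP last.
records = ["93.17.128.%d" % n for n in range(1, 42)] + ["93.17.128.197"]
fake_reverse = lambda ip: "smtp26.services.sfr.fr"
full = lambda name: records
capped = lambda name: records[:35]   # a resolver that truncates the list

print(fcrdns_check("93.17.128.197", fake_reverse, full))    # 1
print(fcrdns_check("93.17.128.197", fake_reverse, capped))  # 0
```

With the full list the check passes; with the list cut at 35 entries the
same client fails, although its DNS setup is correct.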

Thanks,
Regards,
Joerg





Re: Failure to check FCrDNS with long DNS replies?

2020-11-21 Thread Tassilo Philipp

> FYI, I run into the same issue with a different provider:
> relay.yourmailgateway.de which also has a large number of A records.
>
> Trying to reproduce and digging deeper now, by adding debug logs etc.


Interesting... thanks for checking and having thought of my report. I 
for myself didn't have any issues anymore, however, I barely ever 
receive any mail from sfr. Also, given the random order of IPs in the 
DNS reply, I simply might have had luck if it's in any case related to 
the IP order. I have no evidence for, but when I was having problems, 
the IP in question was among the last ones in the reply.


I'm curious what you'll find...

Thanks!



Re: Failure to check FCrDNS with long DNS replies?

2020-11-20 Thread Joerg Jung
On Mon, Aug 03, 2020 at 02:05:20PM +0200, Tassilo Philipp wrote:
> > Mhmm… but they returned different results, for dig vs OpenSMTPd filter
> > lookup?
> 
> Not sure, as I don't log the replies, but I don't think so.
> 
> 
> > May cache TTL have expired and record re-fetched with your local test?
> > What’s your local cache software, is it able to handle large A record
> > lists?
> 
> It's "unbound", so yes, it should handle that just fine. The result I pasted
> was also queried through it.
> 
> 
> > In regards to the dnscrypt servers, are you sure you hit the same
> > recursive resolver with dig as with OpenSMTPd filter before?
> 
> Absolutely, I enforced a single one for a test, namely soltysiak.
> 
> 
> > If you can reproduce, this would indeed point to an issue in the filter
> > or local cache. But that case should be easy to test by just sending
> > some test mails from a sfr.fr account?
> 
> Correct. Unfortunately I don't know anyone with such an account, and wanted
> to setup just a local test, faking it, to at least be able to exclude
> opensmtp as a culprit.
> 
> In the end I just disabled the check, as one email user was desperately
> waiting for a mail that was affected, and I saw it in the logs being
> rejected over and over again, and I had some other spinning plates to
> handle at that time. Now I have a bit more time and headspace again and can
> look into it more.

FYI, I run into the same issue with a different provider:
relay.yourmailgateway.de which also has a large number of A records.

Trying to reproduce and digging deeper now, by adding debug logs etc.

> > Maybe someone subscribed to this list has such an account and could send
> > you a test mail?
> 
> That would be terrific!



Re: Failure to check FCrDNS with long DNS replies?

2020-08-03 Thread Tassilo Philipp
> Mhmm… but they returned different results, for dig vs OpenSMTPd filter lookup?


Not sure, as I don't log the replies, but I don't think so.


> May cache TTL have expired and record re-fetched with your local test?
> What’s your local cache software, is it able to handle large A record lists?


It's "unbound", so yes, it should handle that just fine. The result I 
pasted was also queried through it.



> In regards to the dnscrypt servers, are you sure you hit the same
> recursive resolver with dig as with OpenSMTPd filter before?


Absolutely, I enforced a single one for a test, namely soltysiak.


> If you can reproduce, this would indeed point to an issue in the
> filter or local cache. But that case should be easy to test by just
> sending some test mails from a sfr.fr account?


Correct. Unfortunately I don't know anyone with such an account, and 
wanted to setup just a local test, faking it, to at least be able to 
exclude opensmtp as a culprit.


In the end I just disabled the check, as one email user was desperately 
waiting for a mail that was affected, and I saw it in the logs being 
rejected over and over again, and I had some other spinning plates to 
handle at that time. Now I have a bit more time and headspace again and 
can look into it more.



> Maybe someone subscribed to this list has such an account and could send
> you a test mail?


That would be terrific!




Re: Failure to check FCrDNS with long DNS replies?

2020-08-03 Thread Joerg Jung


> On 3. Aug 2020, at 12:23, Tassilo Philipp  wrote:
> 
> Thanks for the reply and your thoughts.
> 
>> There should be nothing limiting FCrDNS here, despite that
>> these are a lot of records.
>> 
>> But have you tried the dig lookup below from the actual mail
>> server at the time (or shortly after) the time of the failure?
> 
> Yes, that was the first thing I tried, and I had those delivery failures 
> before and after that test. (In fact, I changed the error message to one 
> specific to the fcrdns check, restarted opensmtp and waited for the next 
> delivery attempt).
> 
> After that I started looking into the sources of OpenSMTPd and all I found 
> was a loop running over all records in the reply, so yeah, no limitation 
> there.
> 
> 
>> While the DNS record seems to be there and correct:
>> At the time of the connect your mail server was not able to resolve the 
>> record through whatever you have configured as forwarder/lookup/recursive 
>> DNS servers.
>> Reasons can vary from local provider network hiccup to
>> global BGP issues.
>> Your mail server may use a different route and different
>> lookup servers than your local client you test dig command with.
> 
> It's a local DNS cache which forwards to some dnscrypt servers. I verified 
> from the logs that my manual name resolution test I did, and the lookup from 
> OpenSMTPd did use the same resolution.


Mhmm… but they returned different results, for dig vs OpenSMTPd filter lookup?
May the cache TTL have expired and the record been re-fetched with your local
test? What’s your local cache software, is it able to handle large A record
lists? In regards to the dnscrypt servers, are you sure you hit the same
recursive resolver with dig as with the OpenSMTPd filter before?

If you can reproduce, this would indeed point to an issue in the filter or
local cache. But that case should be easy to test by just sending some
test mails from a sfr.fr account?
Maybe someone subscribed to this list has such an account and could send
you a test mail?

Re: Failure to check FCrDNS with long DNS replies?

2020-08-03 Thread Tassilo Philipp

Thanks for the reply and your thoughts.


> There should be nothing limiting FCrDNS here, despite that
> these are a lot of records.
>
> But have you tried the dig lookup below from the actual mail
> server at the time (or shortly after) the time of the failure?


Yes, that was the first thing I tried, and I had those delivery failures 
before and after that test. (In fact, I changed the error message to one 
specific to the fcrdns check, restarted opensmtp and waited for the next 
delivery attempt).


After that I started looking into the sources of OpenSMTPd and all I 
found was a loop running over all records in the reply, so yeah, no 
limitation there.




> While the DNS record seems to be there and correct:
> At the time of the connect your mail server was not able to
> resolve the record through whatever you have configured as
> forwarder/lookup/recursive DNS servers.
>
> Reasons can vary from local provider network hiccup to
> global BGP issues.
> Your mail server may use a different route and different
> lookup servers than your local client you test dig command with.


It's a local DNS cache which forwards to some dnscrypt servers. I 
verified from the logs that my manual name resolution test I did, and 
the lookup from OpenSMTPd did use the same resolution.




> I have often seen local ISP forwarding DNS servers being
> blocked by other large ISP DNS servers already,
> e.g. Hetzner DNS recursive Forwarders (and even
> whole Hetzner netblocks) are blocked by Telekom
> authoritative DNS servers, due to abuse reasons.


Yeah, I hear you, I had similar experiences in the past, one also with 
Telekom btw., in our case it was for an online game, and we ultimately 
needed to tell players that were affected to "tunnel around" some hop in 
Frankfurt, as it was impossible to get in contact with anyone at 
Telekom that was able or willing to get us in contact with the 
technicians, there. After like 2 months that route was fine again.




> I also have similar experiences the other way around with
> other ISPs blocking Telekom forwarders, etc.
>
> Sometimes you may be able to contact abuse/tech addresses
> to get the relevant IPs unblocked, but often this is just
> temporary anyways.
>
> Not everything is reachable from everywhere as it should be.
> This happens all the time. This is the Internet.


This is very true, this is the internet.

While looking into this I was just really surprised to see the long list 
of A records this resolves to and it felt like this was maybe the 
culprit... everything else worked fine and no other mail server 
connecting was rejected that way.


I'll reenable the fcrdns check again, and see what happens. It was 
disabled now for a while b/c of a user depending on some mails coming 
from SFR.


Thanks for the feedback!




Re: Failure to check FCrDNS with long DNS replies?

2020-08-02 Thread Joerg Jung


> On 21. Jul 2020, at 12:46, Tassilo Philipp wrote:
> 
> Hello,
> 
> I have a strange problem, emails coming from a specific SMTP from SFR, namely 
> smtp26.services.sfr.fr get incorrectly filtered by a fcrdns check. The filter 
> line in question is:
> 
> filter check_fcrdns phase connect match !fcrdns disconnect "550 incorrect or no PTR record for submitting host (no FC)"
> 
> 
> I made sure it's exactly that check, and not any other that is triggered, by 
> using a specific error message.
> 
> The connecting server in my case is 93.17.128.197, which points back to 
> smtp26.services.sfr.fr:
> 
> --%<---
> $ dig -x 93.17.128.197
> 
> ; <<>> DiG 9.16.2 <<>> -x 93.17.128.197
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34923
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
> 
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 4096
> ;; QUESTION SECTION:
> ;197.128.17.93.in-addr.arpa.    IN      PTR
> 
> ;; ANSWER SECTION:
> 197.128.17.93.in-addr.arpa. 83568 IN PTR smtp26.services.sfr.fr.
> 
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.10#53(127.0.0.10)
> ;; WHEN: Tue Jul 21 12:26:47 CEST 2020
> ;; MSG SIZE  rcvd: 91
> --->%--
> 
> 
> And this is what smtp26.services.sfr.fr resolves to:
> 
> --%<---
> $ dig smtp26.services.sfr.fr
> 
> ; <<>> DiG 9.16.2 <<>> smtp26.services.sfr.fr
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4219
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 42, AUTHORITY: 0, ADDITIONAL: 1
> 
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 4096
> ;; QUESTION SECTION:
> ;smtp26.services.sfr.fr.        IN      A
> 
> ;; ANSWER SECTION:
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.198
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.199
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.200
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.201
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.202
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.203
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.204
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.205
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.207
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.208
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.209
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.210
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.211
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.212
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.213
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.206
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.214
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.215
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.216
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.217
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.218
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.3
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.163
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.20
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.10
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.1
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.11
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.12
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.13
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.2
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.4
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.21
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.22
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.189
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.190
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.191
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.192
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.193
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.194
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.195
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.196
> smtp26.services.sfr.fr. 3084 IN A 93.17.128.197
> 
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.10#53(127.0.0.10)
> ;; WHEN: Tue Jul 21 12:26:50 CEST 2020
> ;; MSG SIZE  rcvd: 723
> --->%--
> 
> 
> So, from my understanding, their DNS setup is correct, and the check 
> shouldn't fail. The only thing that looks suspicious to me is that 
> smtp26.services.sfr.fr points to a lot of records. Their order is seemingly 
> random when fetching those records, I haven't tested, yet, whether it works 
> if