Re: [exim] Google SMTP Timeouts on large mails
On 2022-04-29, Graeme Coates via Exim-users wrote:
> Hi all,
>
> I've seen this issue raised in:
>
> https://lists.exim.org/lurker/message/20220216.071725.892984cd.en.html
> and
> https://lists.exim.org/lurker/message/20220313.200645.624cc373.en.html
>
> but haven't seen a definite resolution as yet.
>
> As per other reports, I have a Debian Bullseye (11.3) system running Exim
> 4.94.2 #2. It is set up with virtual domains using dovecot for local delivery
> and aliases defined for some simple forwarding. I wasn't aware of any
> similar issue in Exim 4.92 (on Debian 10). I see log reports similar to
> other reports - e.g.:
>
> /var/log/exim4/mainlog:2022-04-27 07:47:30 1njbGQ-005LxL-M5
> H=gmail-smtp-in.l.google.com [2a00:1450:4010:c0e::1a]: SMTP timeout after
> sending data block (199774 bytes written): Connection timed out
>
> /var/log/exim4/mainlog:2022-04-27 07:50:10 1njbGU-005Lz8-RV
> H=gmail-smtp-in.l.google.com [74.125.131.26]: SMTP timeout after end of data
> (246239 bytes written): Connection timed out
>
> This is for both ipv4 and ipv6 connections, and to only Google mail servers,
> and only when delivering "large" messages (that are bigger than say about
> 100kb, though I haven't investigated fully the limits - short, text only is
> fine). Eventually, the messages do get through, but with delays of hours in
> some cases. As per other reports, delivery of the same mail to all other
> hosts works perfectly. This occurs both with firewall rules set to allow
> everything, as well as with a "normal" ruleset allowing: all
> OUTBOUND/FORWARD, all icmp INBOUND and all TCP INBOUND with ctstate
> RELATED,ESTABLISHED (as well as ports opened for relevant services).
>
> If I do: sysctl net.ipv4.tcp_window_scaling=0 , then everything works
> perfectly - with tcp_window_scaling=1, the issue is reproduced.
>
> I have a packet capture which is available here:
>
> https://tinyurl.com/742s855d
>
> The Session log from Exim in debug mode is here (with redacted hosts,
> addresses, etc) - the message was delivered to the server, and is being
> forwarded onto an email in a Google workspace account (following a
> forwarding rule in an aliases file)
>
> https://tinyurl.com/22nn887u
>
> Is it possible from these traces to pin down the issue at all and maybe come
> up with a workaround (without having to turn off tcp_window_scaling) or a
> pointer as to where I need to formally raise a bug, and I'll be happy to do
> so!

Make sure that your DNS and return-path MX are working. We recently had some sort of firewall issue, unrelated to SMTP, that was causing timeouts on deliveries to Gmail. Removing the firewall rules cleared it up.

--
Jasen.

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: [exim] Google SMTP Timeouts on large mails
On 30/04/2022 17:43, Adam D. Barratt via Exim-users wrote:
> This is likely to be the result of a known issue with Google's TCP Fast
> Open setup - see e.g.
> https://blog.apnic.net/2021/07/05/tcp-fast-open-not-so-fast/

Always worth a try, but that blog description doesn't match what the packet capture showed.

--
Cheers,
  Jeremy
Re: [exim] Google SMTP Timeouts on large mails
On Fri, 2022-04-29 at 10:56 +0100, Graeme Coates via Exim-users wrote:
> Hi all,
>
> I've seen this issue raised in:
>
> https://lists.exim.org/lurker/message/20220216.071725.892984cd.en.html
> and
> https://lists.exim.org/lurker/message/20220313.200645.624cc373.en.html
>
> but haven't seen a definite resolution as yet.
>
> As per other reports, I have a Debian Bullseye (11.3) system running
>

This is likely to be the result of a known issue with Google's TCP Fast Open setup - see e.g.
https://blog.apnic.net/2021/07/05/tcp-fast-open-not-so-fast/

Exim 4.93 changed the default for the "hosts_try_fastopen" transport option to be "*", and the default for the net.ipv4.tcp_fastopen_blackhole_timeout_sec sysctl changed from 3600 (i.e. an hour) to 0 at some point between the kernel versions in Debian buster (10) and bullseye (11).

A workaround is to add something similar to "hosts_try_fastopen = ! *.l.google.com" to your SMTP transports.

Regards,

Adam
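To see which values a given host is actually running with, the two kernel settings relevant to the TCP Fast Open theory can be read from /proc/sys. The following is only a minimal sketch: the helper name read_sysctl is made up for illustration, and it assumes a Linux system where sysctl names map onto paths under /proc/sys by replacing dots with slashes.

```python
from pathlib import Path
from typing import Optional


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the value of a sysctl such as 'net.ipv4.tcp_fastopen'.

    Hypothetical helper: maps the dotted name onto a path under `root`
    and returns None if the kernel does not expose that setting.
    """
    path = Path(root, *name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for key in (
        "net.ipv4.tcp_fastopen",
        "net.ipv4.tcp_fastopen_blackhole_timeout_sec",
    ):
        print(key, "=", read_sysctl(key))
```

Comparing the output on a Debian 10 and a Debian 11 host would show whether the default described above really differs between the two systems in question.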
Re: [exim] Google SMTP Timeouts on large mails
On 29/04/2022 10:56, Graeme Coates via Exim-users wrote:
> a pointer as to where I need to formally raise a bug, and I'll be happy
> to do so!

I forgot to answer this point. You could open one at bugs.exim.org just so the info doesn't get lost. But, currently, I don't think it's likely a bug in Exim.

You should, I think, open a bug against Debian. Include that packet capture; it's a red flag. Feel free to include my analysis of it, too.

(I do hope you're not running any bolt-on "security" products. I've seen too many bugs associated with such.)

--
Cheers,
  Jeremy
Re: [exim] Google SMTP Timeouts on large mails
On 29/04/2022 10:56, Graeme Coates via Exim-users wrote:
> I have a packet capture which is available here:
> https://tinyurl.com/742s855d

Thank you so much for gathering this. It seems to show buggy behaviour in your Debian TCP implementation (or possibly a software firewall); I don't see any way that Exim could be forcing this.

Specifically, we see (multiple) retries of a TCP segment for which we saw both the original data and the ACK from the peer (Google). There are no SACKs, despite further ACKs after the apparently missed one (and it being a SACK-enabled connection). This implies *no* ACKs from that point on were received by the TCP code.

We can't tell exactly what data was involved, lacking the TLS session keys, but given the above it's probably moot. If you care to investigate that, see the text around "Add SSLKEYLOGFILE to keep_environment in the exim config" and feed the resulting file to wireshark.

> The Session log from Exim in debug mode is here (with redacted hosts,
> addresses, etc) - the message was delivered to the server, and is being
> forwarded onto an email in a Google workspace account (following a
> forwarding rule in an aliases file)
> https://tinyurl.com/22nn887u

It all looks reasonable there, up to the point that the GnuTLS library tells us "The TLS connection was non-properly terminated." - which would follow on from the pcap-observed problem at the TCP level.

> Is it possible from these traces to pin down the issue at all and maybe
> come up with a workaround (without having to turn off tcp_window_scaling)
> or a pointer as to where I need to formally raise a bug, and I'll be happy
> to do so!

You already mentioned IPv4/6 makes no difference. You could try disabling TFO (but I think it's unlikely to help), TLSv1.3 (ditto), CHUNKING (more possible, but again it's entirely the wrong protocol layer), PIPELINING (ditto).

The problem going away when you disable TCP window scaling is interesting, but it might just be shifting the point at which it bites to flows of some other size. Exim has no facilities to set a small transmit socket buffer size (which would have the same effect, and not massacre performance for the other network users on the machine), I'm afraid.

I guess, if ACKs are not being seen by your TCP endpoint, the socket will still be holding un-ack'd data in the transmit queue. If you can catch that (use "ss -panmit dport = 25") it would confirm my interpretation.

If it's the firewall that's dropping inbound TCP ACK packets, I guess there's the possibility of configuring it to log drops.

--
Cheers,
  Jeremy
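For readers unfamiliar with what "setting a small transmit socket buffer size" means at the socket level, here is a rough sketch in Python. It is not something Exim can be configured to do (as the message says), and the function name and the 16 KiB figure are arbitrary assumptions chosen only for illustration.

```python
import socket


def open_socket_with_small_sndbuf(requested_bytes: int = 16 * 1024):
    """Create an unconnected TCP socket with a reduced send buffer.

    Returns the socket and the buffer size the kernel reports back.
    The reported size may differ from the requested one, since the
    kernel is free to round or adjust it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before connect() to influence the whole connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return sock, effective


if __name__ == "__main__":
    s, size = open_socket_with_small_sndbuf()
    print("effective SO_SNDBUF:", size)
    s.close()
```

A small send buffer limits how much unacknowledged data a single connection can have in flight, which is why it is suggested above as a per-connection alternative to switching off window scaling for the whole host.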
Re: [exim] Taint checking and exim 4.96rc0
On 29/04/2022 20:07, Heiko Schlittermann via Exim-users wrote:
> Do we have *new* taintchecks that break configurations that were
> considered secure with 4.95?

I have a hash_32_64 of data which was accepted in 4.95 but requires quote_pgsql with 4.96. Does a hash pass on taint?

Whatever, easily adjusted in my config.
[exim] Google SMTP Timeouts on large mails
Hi all,

I've seen this issue raised in:

https://lists.exim.org/lurker/message/20220216.071725.892984cd.en.html
and
https://lists.exim.org/lurker/message/20220313.200645.624cc373.en.html

but haven't seen a definite resolution as yet.

As per other reports, I have a Debian Bullseye (11.3) system running Exim 4.94.2 #2. It is set up with virtual domains using dovecot for local delivery and aliases defined for some simple forwarding. I wasn't aware of any similar issue in Exim 4.92 (on Debian 10). I see log reports similar to other reports, e.g.:

/var/log/exim4/mainlog:2022-04-27 07:47:30 1njbGQ-005LxL-M5 H=gmail-smtp-in.l.google.com [2a00:1450:4010:c0e::1a]: SMTP timeout after sending data block (199774 bytes written): Connection timed out

/var/log/exim4/mainlog:2022-04-27 07:50:10 1njbGU-005Lz8-RV H=gmail-smtp-in.l.google.com [74.125.131.26]: SMTP timeout after end of data (246239 bytes written): Connection timed out

This is for both IPv4 and IPv6 connections, only to Google mail servers, and only when delivering "large" messages (bigger than, say, about 100 kB, though I haven't fully investigated the limits; short, text-only mail is fine). Eventually the messages do get through, but with delays of hours in some cases. As per other reports, delivery of the same mail to all other hosts works perfectly. This occurs both with firewall rules set to allow everything and with a "normal" ruleset allowing: all OUTBOUND/FORWARD, all ICMP INBOUND and all TCP INBOUND with ctstate RELATED,ESTABLISHED (as well as ports opened for relevant services).

If I do: sysctl net.ipv4.tcp_window_scaling=0 , then everything works perfectly. With tcp_window_scaling=1, the issue is reproduced.

I have a packet capture which is available here:

https://tinyurl.com/742s855d

The session log from Exim in debug mode is here (with redacted hosts, addresses, etc). The message was delivered to the server and is being forwarded on to an address in a Google Workspace account (following a forwarding rule in an aliases file):

https://tinyurl.com/22nn887u

Is it possible from these traces to pin down the issue at all, and maybe come up with a workaround (without having to turn off tcp_window_scaling) or a pointer as to where I need to formally raise a bug? I'll be happy to do so!

Thanks

Graeme

--
graeme at chromosphere dot co dot uk
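To get a feel for which hosts and message sizes are affected, the "SMTP timeout" entries quoted in the message above can be pulled out of the main log with a short script. This is only a sketch: the regular expression is fitted to the two sample lines shown in the message and may need adjusting for other log variants, and the names TIMEOUT_RE and parse_timeout are invented for the example.

```python
import re

# Pattern fitted to the two sample mainlog lines in the report; the
# leading "/var/log/exim4/mainlog:" grep prefix is skipped by search().
TIMEOUT_RE = re.compile(
    r"(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<msgid>\S+) "
    r"H=(?P<host>\S+) \[(?P<ip>[^\]]+)\]: "
    r"SMTP timeout after (?P<phase>.+?) "
    r"\((?P<bytes>\d+) bytes written\)"
)


def parse_timeout(line: str):
    """Return a dict describing an SMTP timeout log line, or None."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["bytes"] = int(fields["bytes"])
    return fields


if __name__ == "__main__":
    sample = (
        "/var/log/exim4/mainlog:2022-04-27 07:50:10 1njbGU-005Lz8-RV "
        "H=gmail-smtp-in.l.google.com [74.125.131.26]: SMTP timeout after "
        "end of data (246239 bytes written): Connection timed out"
    )
    print(parse_timeout(sample))
```

Collecting the "bytes" values over a few days would give a rough lower bound on the message size at which the timeouts start.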
Re: [exim] Taint checking and exim 4.96rc0
Hi,

On Sat, 30 Apr 2022 10:10:08 +0100 Jeremy Harris via Exim-users wrote:

> On 30/04/2022 00:54, Slavko (tblt) via Exim-users wrote:
> > Yes, as i wrote the same already some time ago, some generic
> > ${detaint:...} expansion is missing.
>
> That would be instantly abused.

I understand, but IMO Exim's developers do not have to take responsibility for stupid admins... But, please, how does ${detaint:...} differ from, e.g.:

    ${lookup{...} lsearch*,ret=key{file_with_*_only}}

The only differences I see are the length of the expansion to type and that the lookup is less efficient (it will be done twice).

> > verify recipients from my MX to my other MTA (where local DB are
> > stored) by callout. But that does not detaint recipient address nor
> > domain,
>
> That's worthy of consideration; thank you for the idea.
> Essentially, it would be treating a backend MTA as a trusted DB
> for lookup.

Nice, and please, could you consider including in that "trusted DB" something that can interpret defer responses? I mean real 4xx responses, not e.g. a temporary network problem. For now I do not use this feature, as I cannot distinguish these two states (network problem vs. response). But returning a defer from the remote MTA is wanted, e.g. for quotas.

> Volunteers to work on any aspect, including redis support, are
> always welcome. It really needs someone who uses it and finds
> a facility lacking (meaning: not me).

I am not afraid to help, but my C knowledge is less than basic, and I feel too old (and not healthy enough) to start learning it now, especially as I have avoided C for years ;-)

I do not consider myself a redis expert, but I use redis with my MTA/MSA. I had to build my own Exim so that I could test the built-in redis lookups, but I stopped testing when I realized that boolean responses are not usable. There are relatively simple workarounds, e.g. for EXISTS, where one can try to fetch the key's value instead. But this prevents testing multiple keys at once, and with more "complex" commands, e.g. SISMEMBER, it is harder: redis sets can be large, and fetching the whole set (to check whether something is in it) is not ideal. I use these sets e.g. for per-user country BL/WL on the MSA, shared with IMAP. These are not too large, but anyway.

I feed redis streams with some logging details, and (while not directly from Exim) I use redis to limit/count access using its HLL with a sliding window and some Lua help. And I use its PUBSUB to distribute fail2ban blocks over multiple machines...

Thus I consider redis very useful for sharing state across multiple machines. So, if someone can do the C side, I can provide examples and test them, and together we can get some results from which everyone can profit.

regards

--
Slavko
https://www.slavino.sk
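As background to the point about boolean responses: on the wire, redis answers commands such as EXISTS and SISMEMBER with an integer reply (1 or 0) rather than a string, which is presumably the kind of response the message above refers to. The sketch below encodes a SISMEMBER request and decodes such a reply using the plain RESP wire format. It deliberately uses no redis client library, and the key and member names are invented for illustration.

```python
def encode_command(*args: str) -> bytes:
    """Encode a redis command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_integer_reply(reply: bytes) -> int:
    """Decode a RESP integer reply such as b':1\\r\\n'."""
    if not reply.startswith(b":") or not reply.endswith(b"\r\n"):
        raise ValueError("not a RESP integer reply: %r" % reply)
    return int(reply[1:-2])


if __name__ == "__main__":
    # Hypothetical key and member: a per-user country blocklist.
    request = encode_command("SISMEMBER", "bl:user", "SK")
    print(request)
    # A server would answer b":1\r\n" if "SK" is in the set, else b":0\r\n".
    print(bool(parse_integer_reply(b":1\r\n")))
```

The membership test costs one small request and a one-digit reply, whereas the workaround of fetching the whole set transfers every member just to check for one.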
Re: [exim] Taint checking and exim 4.96rc0
• Jeremy Harris via Exim-users [2022-04-29 23:40]:
> > I'd welcome some generic way to untaint data.
>
> If you know of one which does not require a list
> of known-good values, and is not trivially abusable
> by blind copy-pasting of recipes found on random blogs -
> I'm all ears.

I think that something like ${untaint{$unsafe}{pattern}} could work. The reason for this is that taint checking exists to prevent untrusted external data from being used in dangerous ways and thus causing trouble for the system where Exim is running.

Pattern would be a regular expression, which should match the entire $unsafe string, or a *, which would match anything and would imply that the user knows what they are doing. Whether or not to allow * could be a compile-time flag.

--
-- Kirill Miazine
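The semantics proposed above can be modelled in a few lines of Python. This is purely an illustration of the idea and not an existing Exim feature: the function name, the allow_wildcard parameter (standing in for the suggested compile-time flag) and the use of Python's re module instead of Exim's own regex engine are all assumptions.

```python
import re
from typing import Optional


def untaint(value: str, pattern: str, allow_wildcard: bool = False) -> Optional[str]:
    """Model of the proposed ${untaint{$unsafe}{pattern}} behaviour.

    Returns `value` only if the whole string matches `pattern`;
    returns None otherwise. A bare "*" accepts anything, but only
    when `allow_wildcard` (the hypothetical build-time switch) is set.
    """
    if pattern == "*":
        if not allow_wildcard:
            raise ValueError("wildcard untainting is disabled in this build")
        return value
    # fullmatch, not search: the pattern must cover the entire string.
    if re.fullmatch(pattern, value) is None:
        return None
    return value


if __name__ == "__main__":
    print(untaint("user42", r"[a-z0-9]+"))
    print(untaint("../etc/passwd", r"[a-z0-9]+"))
```

Requiring a full match is what keeps a pattern such as [a-z0-9]+ from accepting a string that merely contains a safe-looking fragment.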
Re: [exim] Taint checking and exim 4.96rc0
On 30/04/2022 00:54, Slavko (tblt) via Exim-users wrote:
> Yes, as i wrote the same already some time ago, some generic
> ${detaint:...} expansion is missing.

That would be instantly abused.

> verify recipients from my MX to my other MTA (where local DB are
> stored) by callout. But that does not detaint recipient address nor
> domain,

That's worthy of consideration; thank you for the idea. Essentially, it would be treating a backend MTA as a trusted DB for lookup.

> As redis support is not full (and on Debian is missing at all) i use
> ${run ...} to communicate with redis and i afraid, that i will have
> problems to use it in new version,

Volunteers to work on any aspect, including redis support, are always welcome. It really needs someone who uses it and finds a facility lacking (meaning: not me).

In the meantime, the ${run } expansion is not taint-checked (and therefore still fertile ground for security breaches).

--
Cheers,
  Jeremy