Re: [exim] Google SMTP Timeouts on large mails

2022-04-30 Thread Jasen Betts via Exim-users
On 2022-04-29, Graeme Coates via Exim-users wrote:
> Hi all,
>
> I've seen this issue raised in:
>
> https://lists.exim.org/lurker/message/20220216.071725.892984cd.en.html
>
> and
>
> https://lists.exim.org/lurker/message/20220313.200645.624cc373.en.html
>
> but haven't seen a definite resolution as yet.
>
> As per other reports, I have a Debian Bullseye (11.3) system running Exim
> 4.94.2 #2. It is set up with virtual domains using dovecot for local delivery
> and aliases defined for some simple forwarding. I wasn't aware of any
> similar issue in Exim 4.92 (on Debian 10).  I see log reports similar to
> other reports - e.g.:
>
> /var/log/exim4/mainlog:2022-04-27 07:47:30 1njbGQ-005LxL-M5
> H=gmail-smtp-in.l.google.com [2a00:1450:4010:c0e::1a]: SMTP timeout after
> sending data block (199774 bytes written): Connection timed out
>
> /var/log/exim4/mainlog:2022-04-27 07:50:10 1njbGU-005Lz8-RV
> H=gmail-smtp-in.l.google.com [74.125.131.26]: SMTP timeout after end of data
> (246239 bytes written): Connection timed out
>
> This is for both ipv4 and ipv6 connections, and to only Google mail servers,
> and only when delivering "large" messages (that are bigger than say about
> 100kb, though I haven't investigated fully the limits - short, text only is
> fine). Eventually, the messages do get through, but with delays of hours in
> some cases. As per other reports, delivery of the same mail to all other
> hosts works perfectly. This occurs both with firewall rules set to allow
> everything, as well as with a "normal" ruleset allowing: all
> OUTBOUND/FORWARD,  all icmp INBOUND and all TCP INBOUND with ctstate
> RELATED,ESTABLISHED (as well as ports opened for relevant services).
>
> If I do:  sysctl net.ipv4.tcp_window_scaling=0 , then everything works
> perfectly - with tcp_window_scaling=1, the issue is reproduced.
>
> I have a packet capture which is available here:
>
> https://tinyurl.com/742s855d
>
> The Session log from Exim in debug mode is here (with redacted hosts,
> addresses, etc) - the message was delivered to the server, and is being
> forwarded onto an email in a Google workspace account (following a
> forwarding rule in an aliases file)
>
> https://tinyurl.com/22nn887u
>
> Is it possible from these traces to pin down the issue at all and maybe come
> up with a workaround (without having to turn off tcp_window_scaling) or a
> pointer as to where I need to formally raise a bug, and I'll be happy to do
> so!

Make sure that your DNS and return-path MX are working. We recently
had some sort of firewall issue, unrelated to SMTP, that caused
timeouts on deliveries to Gmail. Removing the firewall rules cleared
it up.




-- 
  Jasen.

-- 
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


Re: [exim] Google SMTP Timeouts on large mails

2022-04-30 Thread Jeremy Harris via Exim-users

On 30/04/2022 17:43, Adam D. Barratt via Exim-users wrote:

> This is likely to be the result of a known issue with Google's TCP Fast
> Open setup - see e.g.
> https://blog.apnic.net/2021/07/05/tcp-fast-open-not-so-fast/


Always worth a try, but that blog description doesn't match
what the packet capture showed.
--
Cheers,
  Jeremy



Re: [exim] Google SMTP Timeouts on large mails

2022-04-30 Thread Adam D. Barratt via Exim-users
On Fri, 2022-04-29 at 10:56 +0100, Graeme Coates via Exim-users wrote:
> Hi all,
>
> I've seen this issue raised in:
>
> https://lists.exim.org/lurker/message/20220216.071725.892984cd.en.html
>
> and
>
> https://lists.exim.org/lurker/message/20220313.200645.624cc373.en.html
>
> but haven't seen a definite resolution as yet.
>
> As per other reports, I have a Debian Bullseye (11.3) system running

This is likely to be the result of a known issue with Google's TCP Fast
Open setup - see e.g. 
https://blog.apnic.net/2021/07/05/tcp-fast-open-not-so-fast/

Exim 4.93 changed the default for the "hosts_try_fastopen" transport
option to be "*", and the default for the
net.ipv4.tcp_fastopen_blackhole_timeout_sec sysctl changed from 3600
(i.e. an hour) to 0 at some point between the kernel versions in Debian
buster (10) and bullseye (11).

A workaround is to add something similar to
"hosts_try_fastopen = !*.l.google.com" to your SMTP transports.
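
To see which of these settings are actually in effect on a given host, something like the following may help. It is only a sketch: the helper name tfo_client_enabled is made up for illustration, it relies on net.ipv4.tcp_fastopen being a bitmask whose lowest bit enables the client side, and the transport name remote_smtp in the comments is an assumption about the local configuration.

```shell
# Hypothetical helper: given the value of net.ipv4.tcp_fastopen (a bitmask
# in which bit 0 enables client-side TCP Fast Open), print "yes" or "no".
tfo_client_enabled() {
    if [ $(( $1 & 1 )) -ne 0 ]; then
        echo yes
    else
        echo no
    fi
}

# Typical use on the mail host (shown as comments, not executed here):
#   tfo_client_enabled "$(sysctl -n net.ipv4.tcp_fastopen)"
#   sysctl net.ipv4.tcp_fastopen_blackhole_timeout_sec
#   exim -bP transport remote_smtp | grep fastopen   # transport name assumed
```

If the helper prints "no", the kernel will not attempt Fast Open on outgoing connections, and the transport option is unlikely to matter.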

Regards,

Adam




Re: [exim] Google SMTP Timeouts on large mails

2022-04-30 Thread Jeremy Harris via Exim-users

On 29/04/2022 10:56, Graeme Coates via Exim-users wrote:

> a pointer as to where I need to formally raise a bug, and I'll be happy to do
> so!


I forgot to answer this point.

You could open one at bugs.exim.org just so the info doesn't
get lost.  But, currently, I don't think it's likely a bug
in Exim.

You should, I think, open a bug against Debian.
Include that packet capture; it's a red flag.
Feel free to include my analysis of it, too.


(I do hope you're not running any bolt-on "security"
products.  I've seen too many bugs associated with such.)
--
Cheers,
  Jeremy



Re: [exim] Google SMTP Timeouts on large mails

2022-04-30 Thread Jeremy Harris via Exim-users

On 29/04/2022 10:56, Graeme Coates via Exim-users wrote:

> I have a packet capture which is available here:
>
> https://tinyurl.com/742s855d


Thank you so much for gathering this.

It seems to show buggy behaviour in your Debian TCP implementation
(or possibly a software firewall).
I don't see any way that Exim could be forcing this.

Specifically, we see (multiple) retries of a TCP segment for which
we saw both the original data and the ACK from the peer (Google).

There are no SACKs, despite further ACKs after the apparently missed one
(and it being a SACK-enabled connection).  This implies *no* ACKs
from that point on were received by the TCP code.


We can't tell exactly what data was involved, lacking the TLS session
keys, but given the above it's probably moot. If you care to investigate
that, see the text around "Add SSLKEYLOGFILE to keep_environment in the
exim config" and feed the resulting file to wireshark.


> The Session log from Exim in debug mode is here (with redacted hosts,
> addresses, etc) - the message was delivered to the server, and is being
> forwarded onto an email in a Google workspace account (following a
> forwarding rule in an aliases file)
>
> https://tinyurl.com/22nn887u


It all looks reasonable there, up to the point that the GnuTLS library
tells us "The TLS connection was non-properly terminated." - which would
follow on from the pcap-observed problem at the TCP level.


> Is it possible from these traces to pin down the issue at all and maybe come
> up with a workaround (without having to turn off tcp_window_scaling) or a
> pointer as to where I need to formally raise a bug, and I'll be happy to do
> so!


You already mentioned IPv4/6 makes no difference.
You could try disabling TFO (but I think it's unlikely to help),
TLSv1.3 (ditto), CHUNKING (more possible, but again it's entirely the
wrong protocol layer), PIPELINING (ditto).

The problem going away when you disable TCP window scaling is interesting,
but it might just be shifting the point it bites to somewhere else
in other size flows.
Exim has no facilities to set a small transmit socket buffer size (which
would have the same effect, and not massacre your performance on other
networking users), I'm afraid.
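
For a sense of scale: the TCP header's window field is 16 bits, so with window scaling off a peer can advertise at most 65535 bytes, and no more than that can be in flight unacknowledged. The sketch below just does that arithmetic; the function name max_window is illustrative, and the shift counts in the comments are sample values, not figures read from the capture.

```shell
# Largest receive window (in bytes) that can be advertised for a given
# window-scale shift count. RFC 7323 limits the shift to the range 0..14.
max_window() {
    echo $(( 65535 << $1 ))
}

# Examples (as comments):
#   max_window 0    # scaling disabled
#   max_window 7    # a commonly seen shift count
```

With the shift at 0 the limit is just under 64 KiB, which is smaller than the messages that were timing out, so disabling scaling changes the traffic pattern considerably and may hide the problem rather than remove it.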


I guess, if ACKs are not being seen by your TCP endpoint, the socket will
still be holding un-ack'd data in the transmit queue.  If you can catch that
(use "ss -panmit dport = 25")   it would confirm my interpretation.

If it's the firewall that's dropping inbound TCP ACK packets, I guess there's
the possibility of configuring it to log drops.
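
To make the ss check easier to read while a delivery is hanging, the output can be filtered down to sockets that still have bytes queued for sending. This is a rough sketch: the function name stuck_send_queues is made up, and it assumes the usual ss column order (State, Recv-Q, Send-Q, local address, peer address).

```shell
# Read ss output on stdin and print state, Send-Q and peer address for
# every socket whose Send-Q is non-zero. The header line and the extra
# detail lines produced by -i/-m are skipped because their second and
# third fields are not plain numbers.
stuck_send_queues() {
    awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $3 + 0 > 0 { print $1, $3, $5 }'
}

# Typical use while a large message to Google is stuck (not run here):
#   ss -panmit dport = 25 | stuck_send_queues
```

A Send-Q that stays at the same non-zero value for an established connection would be consistent with ACKs not reaching the local TCP stack.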
--
Cheers,
  Jeremy



Re: [exim] Taint checking and exim 4.96rc0

2022-04-30 Thread James via Exim-users

On 29/04/2022 20:07, Heiko Schlittermann via Exim-users wrote:


> Do we have *new* taintchecks that break
> configurations that were considered secure with 4.95?


I have a hash_32_64 of data that was accepted in 4.95 but requires
quote_pgsql with 4.96.


Does a hash pass a taint?  Whatever, easily adjusted in my config.



[exim] Google SMTP Timeouts on large mails

2022-04-30 Thread Graeme Coates via Exim-users
Hi all,

 

I've seen this issue raised in:

https://lists.exim.org/lurker/message/20220216.071725.892984cd.en.html

and

https://lists.exim.org/lurker/message/20220313.200645.624cc373.en.html

but haven't seen a definite resolution as yet.

 

As per other reports, I have a Debian Bullseye (11.3) system running Exim
4.94.2 #2. It is set up with virtual domains using dovecot for local delivery
and aliases defined for some simple forwarding. I wasn't aware of any
similar issue in Exim 4.92 (on Debian 10).  I see log reports similar to
other reports - e.g.:

/var/log/exim4/mainlog:2022-04-27 07:47:30 1njbGQ-005LxL-M5
H=gmail-smtp-in.l.google.com [2a00:1450:4010:c0e::1a]: SMTP timeout after
sending data block (199774 bytes written): Connection timed out

/var/log/exim4/mainlog:2022-04-27 07:50:10 1njbGU-005Lz8-RV
H=gmail-smtp-in.l.google.com [74.125.131.26]: SMTP timeout after end of data
(246239 bytes written): Connection timed out

 

This is for both ipv4 and ipv6 connections, and to only Google mail servers,
and only when delivering "large" messages (that are bigger than say about
100kb, though I haven't investigated fully the limits - short, text only is
fine). Eventually, the messages do get through, but with delays of hours in
some cases. As per other reports, delivery of the same mail to all other
hosts works perfectly. This occurs both with firewall rules set to allow
everything, as well as with a "normal" ruleset allowing: all
OUTBOUND/FORWARD,  all icmp INBOUND and all TCP INBOUND with ctstate
RELATED,ESTABLISHED (as well as ports opened for relevant services). 
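
One way to check that the timeouts really are confined to Google hosts is to count them per remote host in the main log. The following is a minimal sketch, assuming each log entry is on a single line in the format shown above; the function name count_smtp_timeouts is made up for illustration.

```shell
# Read Exim mainlog lines on stdin and print "<count> <host>" for every
# remote host that appears in an "SMTP timeout" entry.
count_smtp_timeouts() {
    grep 'SMTP timeout' |
        sed -n 's/.* H=\([^ ]*\) \[.*/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Typical use (not run here):
#   count_smtp_timeouts < /var/log/exim4/mainlog
```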

 

If I do:  sysctl net.ipv4.tcp_window_scaling=0 , then everything works
perfectly - with tcp_window_scaling=1, the issue is reproduced.

I have a packet capture which is available here:

https://tinyurl.com/742s855d

The Session log from Exim in debug mode is here (with redacted hosts,
addresses, etc) - the message was delivered to the server, and is being
forwarded onto an email in a Google workspace account (following a
forwarding rule in an aliases file)

https://tinyurl.com/22nn887u

Is it possible from these traces to pin down the issue at all and maybe come
up with a workaround (without having to turn off tcp_window_scaling) or a
pointer as to where I need to formally raise a bug, and I'll be happy to do
so!

 

Thanks

Graeme

-- 
graeme at chromosphere dot co dot uk


Re: [exim] Taint checking and exim 4.96rc0

2022-04-30 Thread Slavko via Exim-users
Hi,

On Sat, 30 Apr 2022 10:10:08 +0100 Jeremy Harris via Exim-users wrote:

> On 30/04/2022 00:54, Slavko (tblt) via Exim-users wrote:
> > Yes, as i wrote the same already some time ago, some generic
> > ${detaint:...} expansion is missing.  
> 
> That would be instantly abused.

I understand, but IMO Exim's developers do not have to take responsibility
for stupid admins... But, please, how does ${detaint:...} differ from e.g.:

${lookup{...} lsearch*,ret=key{file_with_*_only}}

The only differences I see are that the expansion is longer to type and
less efficient (the lookup will be done twice).

> > verify recipients from my MX to my other MTA (where local DB are
> > stored) by callout. But that does not detaint recipient address nor
> > domain,
> 
> That's worthy of consideration; thank you for the idea.
> Essentially, it would be treating a backend MTA as a trusted DB
> for lookup.

Nice, and please, could you consider having that "trusted DB" be something
that can interpret defer responses?

I mean real 4xx responses, not e.g. a temporary network problem. For
now I do not use this feature, as I cannot distinguish these two
states (network problem vs. response). But returning a defer from the
remote MTA is wanted, e.g. for quotas.

> Volunteers to work on any aspect, including redis support, are
> always welcome.  It really needs someone who uses it and finds
> a facility lacking (meaning: not me).

I am not afraid to help, but my C knowledge is less than basic,
and I feel too old (and not healthy enough) to start learning it now,
especially as I have avoided C for years ;-)

I do not consider myself a redis expert, but I use redis with my MTA/MSA.
I had to build my own Exim to be able to test the built-in redis lookups,
but I stopped testing when I realized that boolean responses are not
usable. There are relatively simple workarounds, e.g. for EXISTS one
can try to fetch the key's value. But this prevents testing multiple keys
at once, and with more "complex" commands, e.g. SISMEMBER, it can be
harder: redis sets can be large, and fetching a whole set (to check
whether something is in it) is not ideal. I use these sets e.g. for
per-user country BL/WL on the MSA, shared with IMAP. These are not too
large, but anyway.

I feed redis streams with some logging details, and (while not
directly from Exim) I use redis to limit/count access via its HLL
with a sliding window and some Lua help. And I use its PUBSUB to
distribute fail2ban blocks over multiple machines... Thus I consider
redis very useful for sharing state across multiple machines.

So, if someone can do things in C, I can provide examples and
test them, and together we can get some results from which everyone
can profit.

regards

-- 
Slavko
https://www.slavino.sk




Re: [exim] Taint checking and exim 4.96rc0

2022-04-30 Thread Kirill Miazine via Exim-users
• Jeremy Harris via Exim-users [2022-04-29 23:40]:
> > I'd welcome some generic way to untaint data.
> 
> If you know of one which does not require a list
> of known-good values, and is not trivially abusable
> by blind copy-pasting of recipes found on random blogs -
> I'm all ears.

I think that something like ${untaint{$unsafe}{pattern}} could work.

The reason for this is that taint checking is to prevent untrusted
external data from being used in dangerous ways and thus cause troubles
to the system where Exim is running. Pattern would be a regular
expression, which should match the entire $unsafe string, or a *, which
would match anything and which would imply that the user knows what they
are doing. Whether or not to allow * could be a compile-time flag.
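
The semantics proposed here can be sketched outside Exim as a small shell function. This is purely illustrative: no ${untaint...} expansion exists in Exim, the function name is made up, and it uses POSIX extended regular expressions via grep rather than whatever pattern syntax Exim would use.

```shell
# Hypothetical model of the proposed ${untaint{value}{pattern}}:
# print the value only if the pattern matches the entire string, or if the
# pattern is the literal "*" (meaning "the admin accepts anything").
# Returns non-zero, printing nothing, when the value is rejected.
untaint() {
    value=$1
    pattern=$2
    # Reject multi-line values so that grep -x really tests the whole string.
    case $value in
        *"
"*) return 1 ;;
    esac
    if [ "$pattern" = '*' ] ||
        printf '%s\n' "$value" | grep -Eqx -- "$pattern"; then
        printf '%s\n' "$value"
    else
        return 1
    fi
}
```

For example, untaint 'abc' '[a-z]+' would print abc, while untaint 'a/b' '[a-z]+' would fail because the slash is not covered by the pattern.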

-- 
-- Kirill Miazine 



Re: [exim] Taint checking and exim 4.96rc0

2022-04-30 Thread Jeremy Harris via Exim-users

On 30/04/2022 00:54, Slavko (tblt) via Exim-users wrote:

> Yes, as i wrote the same already some time ago, some generic
> ${detaint:...} expansion is missing.


That would be instantly abused.


> verify recipients from my MX to my other MTA (where local DB are
> stored) by callout. But that does not detaint recipient address nor
> domain,


That's worthy of consideration; thank you for the idea.
Essentially, it would be treating a backend MTA as a trusted DB
for lookup.


> As redis support is not complete (and on Debian is missing altogether) I use
> ${run ...} to communicate with redis, and I am afraid that I will have
> problems using it in the new version,


Volunteers to work on any aspect, including redis support, are
always welcome.  It really needs someone who uses it and finds
a facility lacking (meaning: not me).

In the meantime, the ${run } expansion is not taint-checked
(and therefore still fertile ground for security breaches).

--
Cheers,
  Jeremy
