Re: haproxy doesn't bring failed server up
On 10/07/2019 03:11 AM, Cyril Bonté wrote:
> Hi,
>
> On 06/10/2019 at 09:19, rihad wrote:
>> Hi, all. This annoying bug can be experienced in haproxy 1.7 through 2.0 (while 1.9 added another bug, high CPU utilization, unrelated to this one). In essence, once an external server that we forward internal requests to stops responding for some time and comes back to life a bit later, more often than not haproxy can no longer reach it.
>
> Without any other details (logs would be helpful), I tend to think there's no bug here but rather a configuration issue. By default, haproxy resolves host names once, at startup, and you are using a host name to declare the SMTP server, whose IPs change often:
>
>> Configuration is very simple
>> [...]
>> server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 30s fall 1440 rise 1
>>
>> That's it. If email-smtp.us-west-2.amazonaws.com:587 fails intermittently and the downtime lasts longer than a few 30-second checks, it can no longer be reached via 127.0.0.2:2588 even after the external server resumes normal operation, and nothing short of a reload (-sf) fixes the problem.
>
> I guess that when this happens, email-smtp.us-west-2.amazonaws.com has a new pool of IP addresses and haproxy can't connect to the old resolved one.
>
> You should have a look at the documentation chapter "Server IP address resolution using DNS":
> https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#5.3

Thanks! But according to the manual, shouldn't haproxy re-resolve the AWS server name regardless of its resolver settings?

    A few other events can trigger a name resolution at run time:
      - when a server's health check ends up in a connection timeout: this may be because the server has a new IP address. So we need to trigger a name resolution to know this new IP.
Re: haproxy doesn't bring failed server up
Hi,

On 06/10/2019 at 09:19, rihad wrote:
> Hi, all. This annoying bug can be experienced in haproxy 1.7 through 2.0 (while 1.9 added another bug, high CPU utilization, unrelated to this one). In essence, once an external server that we forward internal requests to stops responding for some time and comes back to life a bit later, more often than not haproxy can no longer reach it.

Without any other details (logs would be helpful), I tend to think there's no bug here but rather a configuration issue. By default, haproxy resolves host names once, at startup, and you are using a host name to declare the SMTP server, whose IPs change often:

> Configuration is very simple
> [...]
> server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 30s fall 1440 rise 1
>
> That's it. If email-smtp.us-west-2.amazonaws.com:587 fails intermittently and the downtime lasts longer than a few 30-second checks, it can no longer be reached via 127.0.0.2:2588 even after the external server resumes normal operation, and nothing short of a reload (-sf) fixes the problem.

I guess that when this happens, email-smtp.us-west-2.amazonaws.com has a new pool of IP addresses and haproxy can't connect to the old resolved one.

You should have a look at the documentation chapter "Server IP address resolution using DNS":
https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#5.3

-- 
Cyril Bonté
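Following the pointer to the DNS chapter, here is a minimal sketch of what run-time resolution might look like for this backend, assuming haproxy 2.0 syntax. The resolvers section name "mydns" and the nameserver address 127.0.0.1:53 are hypothetical placeholders. The timing values are illustrative, not tuned, so check each keyword against the linked chapter for your version.

```
# Hypothetical resolvers section: "mydns" and 127.0.0.1:53 are placeholders
resolvers mydns
    nameserver dns1 127.0.0.1:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    hold valid      10s

backend bk_amazon_ses
    mode tcp
    no option http-server-close
    timeout connect 5s
    # "resolvers mydns" attaches this server to run-time DNS resolution,
    # so a changed address can be picked up without a reload
    server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 30s fall 1440 rise 1 resolvers mydns
```

Without a resolvers reference on the server line, haproxy keeps using the address it resolved at startup. That matches the explanation given for the symptom.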
haproxy doesn't bring failed server up
Hi, all. This annoying bug can be experienced in haproxy 1.7 through 2.0 (while 1.9 added another bug, high CPU utilization, unrelated to this one). In essence, once an external server that we forward internal requests to stops responding for some time and comes back to life a bit later, more often than not haproxy can no longer reach it.

Configuration is very simple:

    global
        maxconn 16384
        daemon
        nbproc 1
        user nobody
        group nobody
        log /var/run/log local0

    defaults
        retries 3
        timeout connect 5000
        timeout client 360
        timeout server 360
        log global
        option log-health-checks

    listen amazon_ses
        bind 127.0.0.2:2588
        mode tcp
        no option http-server-close
        default_backend bk_amazon_ses

    backend bk_amazon_ses
        mode tcp
        no option http-server-close
        timeout connect 5s
        server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 30s fall 1440 rise 1

That's it. If email-smtp.us-west-2.amazonaws.com:587 fails intermittently and the downtime lasts longer than a few 30-second checks, it can no longer be reached via 127.0.0.2:2588 even after the external server resumes normal operation, and nothing short of a reload (-sf) fixes the problem.
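For a sense of scale, the check settings on the server line imply a long window before haproxy would mark this server DOWN. This is a rough worked figure. It assumes no "fastinter" or "downinter" is set, so failing checks keep running every "inter" = 30 s, and it treats the time each check takes as negligible:

```latex
t_{\text{DOWN}} = \text{fall} \times \text{inter} = 1440 \times 30\,\text{s} = 43\,200\,\text{s} = 12\,\text{h}
```

Conversely, "rise 1" means a single successful check is enough to mark the server UP again.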