Re: haproxy doesn't bring failed server up

2019-10-06 Thread rihad

On 10/07/2019 03:11 AM, Cyril Bonté wrote:

Hi,

On 06/10/2019 at 09:19, rihad wrote:
Hi, all. This annoying bug can be experienced in 1.7-2.0 servers 
(while 1.9 has added another bug of high CPU utilization - unrelated 
to this). In essence, once an external server that we forward 
internal requests to stops responding for some time and comes back to 
life a bit later, more often than not haproxy can no longer reach it.


Without any other details (logs would be helpful), I tend to think 
there's no bug here, but a configuration issue.


By default, haproxy resolves host names once, at startup. You are 
using a host name to declare the SMTP server, and that name often 
changes its IPs:



Configuration is very simple

[...]
    server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 30s fall 1440 rise 1




That's it. If email-smtp.us-west-2.amazonaws.com:587 fails 
intermittently and the downtime lasts more than a few 30-second checks, 
it can then no longer be accessed via 127.0.0.2:2588 even if the 
external server resumes normal operation, and nothing short of a 
reload (-sf) fixes the problem.


I guess that when it happens, email-smtp.us-west-2.amazonaws.com has a 
new pool of IP addresses and haproxy can't connect to the old resolved 
one. You should have a look at the documentation chapter "Server IP 
address resolution using DNS":

https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#5.3



Thanks! But according to the manual, shouldn't haproxy re-resolve AWS 
server name regardless of its resolver settings?



A few other events can trigger a name resolution at run time:
  - when a server's health check ends up in a connection timeout: this may be
    because the server has a new IP address. So we need to trigger a name
    resolution to know this new IP.



Re: haproxy doesn't bring failed server up

2019-10-06 Thread Cyril Bonté

Hi,

On 06/10/2019 at 09:19, rihad wrote:
Hi, all. This annoying bug can be experienced in 1.7-2.0 servers (while 
1.9 has added another bug of high CPU utilization - unrelated to this). 
In essence, once an external server that we forward internal requests to 
stops responding for some time and comes back to life a bit later, more 
often than not haproxy can no longer reach it.


Without any other details (logs would be helpful), I tend to think 
there's no bug here, but a configuration issue.


By default, haproxy resolves host names once, at startup. You are 
using a host name to declare the SMTP server, and that name often 
changes its IPs:



Configuration is very simple

[...]
    server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 30s fall 1440 rise 1




That's it. If email-smtp.us-west-2.amazonaws.com:587 fails 
intermittently and the downtime lasts more than a few 30-second checks, it 
can then no longer be accessed via 127.0.0.2:2588 even if the external 
server resumes normal operation, and nothing short of a reload (-sf) 
fixes the problem.


I guess that when it happens, email-smtp.us-west-2.amazonaws.com has a 
new pool of IP addresses and haproxy can't connect to the old resolved 
one. You should have a look at the documentation chapter "Server IP 
address resolution using DNS":

https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#5.3
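
As a rough, untested sketch of what that chapter describes: run-time 
resolution only applies to a server line that references a "resolvers" 
section, so something along these lines may be what is missing. The 
section name "awsdns", the nameserver label "ns1" and the address 
10.0.0.2:53 are placeholders assumed for illustration, and the timing 
values are only examples; substitute your own DNS server and tune as 
needed.

```
resolvers awsdns
    # placeholder address (assumption): use a DNS server reachable from this host
    nameserver ns1 10.0.0.2:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    # keep the last valid answer for 10s before it is considered stale
    hold valid 10s

backend bk_amazon_ses
    mode tcp
    timeout connect 5s
    # "resolvers awsdns" enables run-time resolution for this server
    server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 30s fall 1440 rise 1 resolvers awsdns resolve-prefer ipv4
```

The "resolve-prefer ipv4" part is optional; it is there because haproxy 
prefers IPv6 answers by default, which may not be what you want for 
this endpoint.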

--
Cyril Bonté



haproxy doesn't bring failed server up

2019-10-06 Thread rihad
Hi, all. This annoying bug can be experienced in 1.7-2.0 servers (while 
1.9 has added another bug of high CPU utilization - unrelated to this). 
In essence, once an external server that we forward internal requests to 
stops responding for some time and comes back to life a bit later, more 
often than not haproxy can no longer reach it.


Configuration is very simple


global
    maxconn 16384
    daemon
    nbproc 1
    user nobody
    group nobody

    log /var/run/log local0

defaults
    retries 3
    timeout connect 5000
    timeout client 360
    timeout server 360
    log global
    option log-health-checks


listen amazon_ses
    bind 127.0.0.2:2588
    mode tcp
    no option http-server-close
    default_backend bk_amazon_ses

backend bk_amazon_ses
    mode tcp
    no option http-server-close
    timeout connect 5s
    server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 30s fall 1440 rise 1



That's it. If email-smtp.us-west-2.amazonaws.com:587 fails 
intermittently and the downtime lasts more than a few 30-second checks, it 
can then no longer be accessed via 127.0.0.2:2588 even if the external 
server resumes normal operation, and nothing short of a reload (-sf) 
fixes the problem.