Re: haproxy doesn't bring failed server up

2019-10-29 Thread Baptiste
On Mon, Oct 7, 2019 at 9:09 PM Lukas Tribus  wrote:

> Hello,
>
> On Mon, Oct 7, 2019 at 10:00 AM rihad  wrote:
> >
> > BTW, all these resolver hold settings are a bit confusing, is there a
> way to tell
> > haproxy to rely on the TTL it gets from DNS servers/resolvers? It seems
> to be
> > relying on some hard-coded default values instead.
>
> I don't think TTL is currently considered, no. How long it will cache
> is configurable and defaults to 10 seconds ("valid"). Because you'd
> use very low values here anyway (and have your recursive resolver do
> proper TTL considering caching), I don't believe there is a huge
> impact because of this.
>
> But I agree, it would be better to consider the TTL.
>
>
> Lukas
>
>

Hi,

That is correct, the runtime resolver does not follow up the TTL.
It's on purpose and by design to allow the admin themselves to decide when
they want to trigger a new request and to avoid some DNS relay would
rewrite TTLs to very long value (my ISP enforce anything lower than 20
minutes to 20 minutes).

We could add on the roadmap to support TTL, as an option, but I need first
to understand the use case.

Baptiste


Re: haproxy doesn't bring failed server up

2019-10-07 Thread Lukas Tribus
Hello,

On Mon, Oct 7, 2019 at 10:00 AM rihad  wrote:
>
> BTW, all these resolver hold settings are a bit confusing, is there a way to 
> tell
> haproxy to rely on the TTL it gets from DNS servers/resolvers? It seems to be
> relying on some hard-coded default values instead.

I don't think TTL is currently considered, no. How long it will cache
is configurable and defaults to 10 seconds ("valid"). Because you'd
use very low values here anyway (and have your recursive resolver do
proper TTL considering caching), I don't believe there is a huge
impact because of this.

But I agree, it would be better to consider the TTL.


Lukas



Re: haproxy doesn't bring failed server up

2019-10-07 Thread rihad
BTW, all these resolver hold settings are a bit confusing, is there a 
way to tell haproxy to rely on the TTL it gets from DNS 
servers/resolvers? It seems to be relying on some hard-coded default 
values instead.


*hold 
* 
 


Defines  during which the last name resolution should be kept based
on last resolution 
   : last name resolution status. Acceptable values are "nx",
 "other", "refused", 
"timeout", "valid", "obsolete".
   : interval between two successive name resolution when the last
 answer was in . It follows the HAProxy time format.
  is in milliseconds by default.

Default value is 10s for "valid", 0s for "obsolete" and 30s for others.



Re: haproxy doesn't bring failed server up

2019-10-07 Thread rihad

On 10/07/2019 11:02 AM, Lukas Tribus wrote:

Hello,

On Mon, Oct 7, 2019 at 6:30 AM rihad  wrote:

Thanks! But according to the manual, shouldn't haproxy re-resolve AWS server 
name regardless of its resolver settings?


A few other events can trigger a name resolution at run time:
   - when a server's health check ends up in a connection timeout: this may be
 because the server has a new IP address. So we need to trigger a name
 resolution to know this new IP.

No. That is when the resolver is actually configured. Please read from
the same section:
Oh, ok, I must have misunderstood "A few other events *can trigger* a 
name resolution at run time"
But that doesn't mean it will necessarily succeed in case no resolvers 
have been configured )
I should say that no errors or warnings regarding failed name resolution 
even appeared in syslog which left me puzzled about what the root cause 
could be.


Anyway, I've added this:

resolvers mydns
# this sugar seems to require haproxy-2.0:
#    parse-resolv-conf
    nameserver mydns0 127.0.0.1:53


hoping it will help.

Thanks!


HAProxy allows using a host name on the server line to retrieve its IP address
using name servers. By default, HAProxy resolves the name when parsing the
configuration file, at startup and cache the result for the process' life.
This is not sufficient in some cases, such as in Amazon where a server's IP
can change after a reboot or an ELB Virtual IP can change based on current
workload.
This chapter describes how HAProxy can be configured to process server's name
resolution at run time.



Regards,
Lukas
.





Re: haproxy doesn't bring failed server up

2019-10-07 Thread Lukas Tribus
Hello,

On Mon, Oct 7, 2019 at 6:30 AM rihad  wrote:
> Thanks! But according to the manual, shouldn't haproxy re-resolve AWS server 
> name regardless of its resolver settings?
>
>
> A few other events can trigger a name resolution at run time:
>   - when a server's health check ends up in a connection timeout: this may be
> because the server has a new IP address. So we need to trigger a name
> resolution to know this new IP.

No. That is when the resolver is actually configured. Please read from
the same section:

> HAProxy allows using a host name on the server line to retrieve its IP address
> using name servers. By default, HAProxy resolves the name when parsing the
> configuration file, at startup and cache the result for the process' life.
> This is not sufficient in some cases, such as in Amazon where a server's IP
> can change after a reboot or an ELB Virtual IP can change based on current
> workload.
> This chapter describes how HAProxy can be configured to process server's name
> resolution at run time.



Regards,
Lukas



Re: haproxy doesn't bring failed server up

2019-10-06 Thread rihad

On 10/07/2019 03:11 AM, Cyril Bonté wrote:

Hi,

Le 06/10/2019 à 09:19, rihad a écrit :
Hi, all. This annoying bug can be experienced in 1.7-2.0 servers 
(while 1.9 has added another bug of high CPU utilization - unrelated 
to this). In essence, once an external server that we forward 
internal requests to stops responding for some time and comes back to 
life a bit later, more often than not haproxy can no longer reach it.


Without any other details (logs would be helpul), I tend to think 
there's no bug here, but a configuration issue.


By default, haproxy resolves host names once, on start up. As you are 
using a host name to delcare the smtp server, which oftenly updates 
its IPs :



Configuration is very simple

[...]
 server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 
30s fall 1440 rise 1




That's it. if email-smtp.us-west-2.amazonaws.com:587 fails 
intermittently and the downtime lasts more than a few 30 sec checks, 
it can then no longer be accessed via 127.0.0.2:2588 even if the 
external servers resumes normal operation, and nothing short of a 
reload (-sf) fixes the problem.


I guess that when it happens, email-smtp.us-west-2.amazonaws.com has a 
new pool of IP addresses and haproxy can't connect to the old resolved 
one. You should have a look to the documentation chapter "Server IP 
address resolution using DNS" :

https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#5.3



Thanks! But according to the manual, shouldn't haproxy re-resolve AWS 
server name regardless of its resolver settings?



A few other events can trigger a name resolution at run time:
  - when a server's health check ends up in a connection timeout: this may be
because the server has a new IP address. So we need to trigger a name
resolution to know this new IP.



Re: haproxy doesn't bring failed server up

2019-10-06 Thread Cyril Bonté

Hi,

Le 06/10/2019 à 09:19, rihad a écrit :
Hi, all. This annoying bug can be experienced in 1.7-2.0 servers (while 
1.9 has added another bug of high CPU utilization - unrelated to this). 
In essence, once an external server that we forward internal requests to 
stops responding for some time and comes back to life a bit later, more 
often than not haproxy can no longer reach it.


Without any other details (logs would be helpul), I tend to think 
there's no bug here, but a configuration issue.


By default, haproxy resolves host names once, on start up. As you are 
using a host name to delcare the smtp server, which oftenly updates its 
IPs :



Configuration is very simple

[...]
     server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 
30s fall 1440 rise 1




That's it. if email-smtp.us-west-2.amazonaws.com:587 fails 
intermittently and the downtime lasts more than a few 30 sec checks, it 
can then no longer be accessed via 127.0.0.2:2588 even if the external 
servers resumes normal operation, and nothing short of a reload (-sf) 
fixes the problem.


I guess that when it happens, email-smtp.us-west-2.amazonaws.com has a 
new pool of IP addresses and haproxy can't connect to the old resolved 
one. You should have a look to the documentation chapter "Server IP 
address resolution using DNS" :

https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#5.3

--
Cyril Bonté



haproxy doesn't bring failed server up

2019-10-06 Thread rihad
Hi, all. This annoying bug can be experienced in 1.7-2.0 servers (while 
1.9 has added another bug of high CPU utilization - unrelated to this). 
In essence, once an external server that we forward internal requests to 
stops responding for some time and comes back to life a bit later, more 
often than not haproxy can no longer reach it.


Configuration is very simple


global
maxconn 16384
daemon
nbproc 1
user nobody
group nobody

log /var/run/log local0

defaults
    retries 3
    timeout connect 5000
    timeout client 360
    timeout server 360
    log global
    option log-health-checks


listen amazon_ses
    bind 127.0.0.2:2588
    mode tcp
    no option http-server-close
    default_backend bk_amazon_ses

backend bk_amazon_ses
    mode tcp
    no option http-server-close
    timeout connect 5s
    server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 
30s fall 1440 rise 1



That's it. if email-smtp.us-west-2.amazonaws.com:587 fails 
intermittently and the downtime lasts more than a few 30 sec checks, it 
can then no longer be accessed via 127.0.0.2:2588 even if the external 
servers resumes normal operation, and nothing short of a reload (-sf) 
fixes the problem.