Re: problem in 1.8 with hosts going out of service

2018-01-25 Thread Willy Tarreau
On Thu, Jan 25, 2018 at 11:28:38AM +0100, Christopher Faulet wrote:
> Thanks for tests guys. Willy, you can merge the patch.

Applied, thanks!
Willy



Re: problem in 1.8 with hosts going out of service

2018-01-25 Thread Christopher Faulet

On 24/01/2018 at 23:54, Paul Lockaby wrote:
> This patch works for me. Thank you!
>
>> On Jan 24, 2018, at 1:02 PM, Christopher Faulet wrote:
>>
>> On 24/01/2018 at 17:21, Paul Lockaby wrote:
>>> Sorry, I know this list is super busy and that there are a number of other
>>> more important issues that need to be worked through but I'm hoping one of
>>> the maintainers has been able to confirm this bug?
>>
>> Hi,
>>
>> Sorry Paul. As you said, we are pretty busy. And you're right to ping us. So,
>> I can confirm the bug. It is a bug on threads, a deadlock, because of a typo.
>>
>> Could you check the attached patch to confirm it fixes your problem?



Hi,

Thanks for tests guys. Willy, you can merge the patch.

--
Christopher Faulet



Re: problem in 1.8 with hosts going out of service

2018-01-24 Thread Paul Lockaby
This patch works for me. Thank you!


> On Jan 24, 2018, at 1:02 PM, Christopher Faulet  wrote:
> 
> On 24/01/2018 at 17:21, Paul Lockaby wrote:
>> Sorry, I know this list is super busy and that there are a number of other 
>> more important issues that need to be worked through but I'm hoping one of 
>> the maintainers has been able to confirm this bug?
> 
> Hi,
> 
> Sorry Paul. As you said, we are pretty busy. And you're right to ping us. So, 
> I can confirm the bug. It is a bug on threads, a deadlock, because of a typo.
> 
> Could you check the attached patch to confirm it fixes your problem ?
> 
> Thanks,
> -- 
> Christopher Faulet
> <0001-BUG-MEDIUM-threads-server-Fix-deadlock-in-srv_set_st.patch>



Re: problem in 1.8 with hosts going out of service

2018-01-24 Thread PiBa-NL

Hi Christopher,

Patch seems to work for me.
Maybe Paul can confirm as well.

Regards,
PiBa-NL / Pieter

On 24-1-2018 at 22:02, Christopher Faulet wrote:
> On 24/01/2018 at 17:21, Paul Lockaby wrote:
>> Sorry, I know this list is super busy and that there are a number of
>> other more important issues that need to be worked through but I'm
>> hoping one of the maintainers has been able to confirm this bug?
>
> Hi,
>
> Sorry Paul. As you said, we are pretty busy. And you're right to ping
> us. So, I can confirm the bug. It is a bug on threads, a deadlock,
> because of a typo.
>
> Could you check the attached patch to confirm it fixes your problem?
>
> Thanks,






Re: problem in 1.8 with hosts going out of service (alive.txt+404+track)

2018-01-24 Thread PiBa-NL

Hi Paul, List,

On 24-1-2018 at 17:21, Paul Lockaby wrote:
> Sorry, I know this list is super busy and that there are a number of other more
> important issues that need to be worked through but I'm hoping one of the
> maintainers has been able to confirm this bug?

I can indeed reproduce it with your config on 1.8.3: CPU usage goes to 100%
and the stats page stops responding when alive.txt is removed.
When a backend goes down completely, tracking works as intended though
(stats shows both servers marked down).
I hope this, together with the extra info and the smaller config below, helps
someone track the issue down further in the code.

> Thanks,
> -Paul
>
>> On Jan 17, 2018, at 10:27 AM, Paul Lockaby wrote:
>>
>> Ok I've tracked this problem down specifically to the usage of check tracking.
>>
>> That is to say, the backend "example-api" is set to track the backend
>> "example-http". When that tracking is enabled and one of the servers in the
>> backend goes down then all of haproxy goes down and never recovers.
>>
>> So this works:
>>   server myhost myhost.example.com:8445 ssl ca-file
>> /usr/local/ssl/certs/cacerts.cert
>>
>> But this does not:
>>   server myhost myhost.example.com:8445 track example-http/myhost ssl ca-file
>> /usr/local/ssl/certs/cacerts.cert
>>
>> This is definitely a regression from 1.7 because I used this feature in 1.7
>> without issue.



It seems a combination of three features is needed to trigger this issue:
    option httpchk GET /alive.txt + http-check disable-on-404 + track

Backtrace keeps showing this:
(gdb) bt
#0  0x0046b59f in srv_set_stopping ()
#1  0x004a3057 in ?? ()
#2  0x004f0eaf in process_runnable_tasks ()
#3  0x004aa13c in ?? ()
#4  0x004a9a16 in main ()

I could reduce the config to this:

frontend stats-frontend
    bind *:2999
    mode http
    log global
    stats enable
    stats uri /haproxy

frontend secured
    bind *:8080
    mode http
    acl request_api hdr_beg(Host) -i api.
    use_backend example-api if request_api
    default_backend example-http

backend example-http
    mode http
    option httpchk GET /haproxy/alive.txt
    http-check disable-on-404
    server myhost vhost1.pfs.local:302 check

backend example-api
    mode http
    option httpchk GET /haproxy/alive.txt
    http-check disable-on-404
    server myhost vhost1.pfs.local:303 track example-http/myhost

Regards,

PiBa-NL / Pieter




Re: problem in 1.8 with hosts going out of service

2018-01-24 Thread Christopher Faulet

On 24/01/2018 at 17:21, Paul Lockaby wrote:
> Sorry, I know this list is super busy and that there are a number of other more
> important issues that need to be worked through but I'm hoping one of the
> maintainers has been able to confirm this bug?



Hi,

Sorry Paul. As you said, we are pretty busy. And you're right to ping 
us. So, I can confirm the bug. It is a bug on threads, a deadlock, 
because of a typo.


Could you check the attached patch to confirm it fixes your problem?

Thanks,
--
Christopher Faulet
From 6436e9d934045c7cd9c41bffa036eb6213ff21ee Mon Sep 17 00:00:00 2001
From: Christopher Faulet 
Date: Wed, 24 Jan 2018 21:49:41 +0100
Subject: [PATCH] BUG/MEDIUM: threads/server: Fix deadlock in
 srv_set_stopping/srv_set_admin_flag

Because of a typo (HA_SPIN_LOCK instead of HA_SPIN_UNLOCK), there is a deadlock
in srv_set_stopping and srv_set_admin_flag when there is at least one tracker.

This patch must be backported in 1.8.
---
 src/server.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/server.c b/src/server.c
index 3901e7d8..07a6603a 100644
--- a/src/server.c
+++ b/src/server.c
@@ -976,7 +976,7 @@ void srv_set_stopping(struct server *s, const char *reason, struct check *check)
 	for (srv = s->trackers; srv; srv = srv->tracknext) {
 		HA_SPIN_LOCK(SERVER_LOCK, &srv->lock);
 		srv_set_stopping(srv, NULL, NULL);
-		HA_SPIN_LOCK(SERVER_LOCK, &srv->lock);
+		HA_SPIN_UNLOCK(SERVER_LOCK, &srv->lock);
 	}
 }
 
@@ -1019,7 +1019,7 @@ void srv_set_admin_flag(struct server *s, enum srv_admin mode, const char *cause
 	for (srv = s->trackers; srv; srv = srv->tracknext) {
 		HA_SPIN_LOCK(SERVER_LOCK, &srv->lock);
 		srv_set_admin_flag(srv, mode, cause);
-		HA_SPIN_LOCK(SERVER_LOCK, &srv->lock);
+		HA_SPIN_UNLOCK(SERVER_LOCK, &srv->lock);
 	}
 }
 
-- 
2.14.3



Re: problem in 1.8 with hosts going out of service

2018-01-24 Thread Paul Lockaby
Sorry, I know this list is super busy and that there are a number of other more 
important issues that need to be worked through but I'm hoping one of the 
maintainers has been able to confirm this bug?

Thanks,
-Paul

> On Jan 17, 2018, at 10:27 AM, Paul Lockaby  wrote:
> 
> Ok I've tracked this problem down specifically to the usage of check tracking.
> 
> That is to say, the backend "example-api" is set to track the backend 
> "example-http". When that tracking is enabled and one of the servers in the 
> backend goes down then all of haproxy goes down and never recovers.
> 
> So this works:
> 
>   server myhost myhost.example.com:8445 ssl ca-file
> /usr/local/ssl/certs/cacerts.cert
>
>
> But this does not:
>
>   server myhost myhost.example.com:8445 track example-http/myhost ssl
> ca-file /usr/local/ssl/certs/cacerts.cert
> 
> 
> This is definitely a regression from 1.7 because I used this feature in 1.7 
> without issue.
> 
> 
>> On Jan 16, 2018, at 10:36 PM, Paul Lockaby  wrote:
>> 
>> I'm experiencing a problem that I can't diagnose but I can recreate pretty 
>> consistently. I have a single server that responds for example.com and 
>> api.example.com and it runs haproxy. All the names run through an SSL front 
>> door but an ACL makes it such that requests for example.com get sent to 8443 
>> where Apache runs and requests for api.example.com get sent to 8445 where 
>> the same instance of haproxy runs and does further examination of the 
>> request and sends it to an application server running on localhost.
>> 
>> This configuration works great except when I take a server out of the 
>> rotation by disabling it with disable-on-404. As soon as I take any server 
>> out of the rotation, haproxy completely stops responding to ANY requests for 
>> ANY backend even things that aren't part of the group such as the stats 
>> backend and frontend. If I put the server back in to service haproxy does 
>> not recover. I must restart haproxy on all hosts to recover. Nothing shows 
>> up in the logs and I can't figure out how to debug it such that I can 
>> provide more information but it's very consistently reproducible using the 
>> configuration below. I am running 1.8.3 and I have not tried this on 1.7 or 
>> earlier versions of 1.8.
>> 
>> Thanks for your help.
>> -Paul
>> 
>> 
>> 
>> global
>>   log /dev/log local0
>>   user nobody
>>   group nobody
>>   tune.ssl.default-dh-param 2048
>>   stats socket /var/run/haproxy.sock user nobody group nobody
>>   daemon
>> 
>> defaults
>>   timeout connect 5000ms
>>   timeout client 60ms
>>   timeout server 60ms
>> 
>>   option httplog
>>   option forwardfor
>>   option http-server-close
>>   option contstats
>> 
>> frontend stats-frontend
>>   bind *:2999
>>   mode http
>>   log global
>>   stats enable
>>   stats uri /haproxy
>> 
>> backend stats-backend
>>   mode http
>>   log global
>>   server stats /var/run/haproxy.sock check
>> 
>> frontend secured
>>   # get the list of certificate options from a list in a file
>>   bind *:443 ssl crt-list /srv/haproxy/certificates.lst
>>   mode http
>>   log global
>> 
>>   # tell backend connections what our ssl client cn is
>>   http-request set-header X-SSL-Client-Verify %[ssl_c_verify]
>>   http-request set-header X-SSL-Client-DN %{+Q}[ssl_c_s_dn]
>>   http-request set-header X-SSL-Client-CN %{+Q}[ssl_c_s_dn(cn)]
>>   http-request set-header X-SSL-Issuer-DN %{+Q}[ssl_c_i_dn]
>>   http-request set-header X-SSL-Issuer-CN %{+Q}[ssl_c_i_dn(cn)]
>> 
>>   acl server-status path_beg /server-
>>   use_backend bogus-http if server-status
>> 
>>   # connection requests for apis go to the api backends
>>   acl request_api hdr_beg(Host) -i api.
>>   use_backend example-api if request_api
>> 
>>   default_backend example-http
>> 
>> backend example-http
>>   mode http
>>   log global
>>   balance source
>>   hash-type consistent
>>   option httpchk GET /haproxy/alive.txt
>>   http-check disable-on-404
>>   server myhost myhost.example.com:8443 check ssl ca-file 
>> /usr/local/ssl/certs/cacerts.cert
>> 
>> backend bogus-http
>>   mode http
>>   errorfile 503 /netops/www/haproxy/403.http
>> 
>> backend example-api
>>   mode http
>>   log global
>>   balance roundrobin
>>   option httpchk GET /haproxy/alive.txt
>>   http-check disable-on-404
>>   server myhost myhost.example.com:8445 track example-http/myhost ssl 
>> ca-file /usr/local/ssl/certs/cacerts.cert
>> 
>> frontend localhost-api-frontend
>>   bind *:8445 ssl crt /usr/local/ssl/certs/example.com.pem
>>   mode http
>>   log global
>>   option forwardfor if-none
>>   option dontlog-normal
>> 
>>   # the alerts api backend
>>   acl alerts-api_host hdr_beg(Host) -i api.alerts
>>   use_backend localhost-api-backend-alerts if alerts-api_host
>> 
>>   default_backend bogus-http
>> 
>> backend localhost-api-backend-alerts
>>   mode http
>>   log global
>>   option forwardfor if-none
>>   option dontlog-normal
>>   server localhost 

Re: problem in 1.8 with hosts going out of service

2018-01-17 Thread Paul Lockaby
Ok I've tracked this problem down specifically to the usage of check tracking.

That is to say, the backend "example-api" is set to track the backend 
"example-http". When that tracking is enabled and one of the servers in the 
backend goes down then all of haproxy goes down and never recovers.

So this works:

  server myhost myhost.example.com:8445 ssl ca-file /usr/local/ssl/certs/cacerts.cert


But this does not:

  server myhost myhost.example.com:8445 track example-http/myhost ssl ca-file /usr/local/ssl/certs/cacerts.cert


This is definitely a regression from 1.7 because I used this feature in 1.7 
without issue.


> On Jan 16, 2018, at 10:36 PM, Paul Lockaby  wrote:
> 
> I'm experiencing a problem that I can't diagnose but I can recreate pretty 
> consistently. I have a single server that responds for example.com and 
> api.example.com and it runs haproxy. All the names run through an SSL front 
> door but an ACL makes it such that requests for example.com get sent to 8443 
> where Apache runs and requests for api.example.com get sent to 8445 where the 
> same instance of haproxy runs and does further examination of the request and 
> sends it to an application server running on localhost.
> 
> This configuration works great except when I take a server out of the 
> rotation by disabling it with disable-on-404. As soon as I take any server 
> out of the rotation, haproxy completely stops responding to ANY requests for 
> ANY backend even things that aren't part of the group such as the stats 
> backend and frontend. If I put the server back in to service haproxy does not 
> recover. I must restart haproxy on all hosts to recover. Nothing shows up in 
> the logs and I can't figure out how to debug it such that I can provide more 
> information but it's very consistently reproducible using the configuration 
> below. I am running 1.8.3 and I have not tried this on 1.7 or earlier 
> versions of 1.8.
> 
> Thanks for your help.
> -Paul
> 
> 
> 
> global
>   log /dev/log local0
>   user nobody
>   group nobody
>   tune.ssl.default-dh-param 2048
>   stats socket /var/run/haproxy.sock user nobody group nobody
>   daemon
>
> defaults
>   timeout connect 5000ms
>   timeout client 60ms
>   timeout server 60ms
>
>   option httplog
>   option forwardfor
>   option http-server-close
>   option contstats
>
> frontend stats-frontend
>   bind *:2999
>   mode http
>   log global
>   stats enable
>   stats uri /haproxy
>
> backend stats-backend
>   mode http
>   log global
>   server stats /var/run/haproxy.sock check
>
> frontend secured
>   # get the list of certificate options from a list in a file
>   bind *:443 ssl crt-list /srv/haproxy/certificates.lst
>   mode http
>   log global
>
>   # tell backend connections what our ssl client cn is
>   http-request set-header X-SSL-Client-Verify %[ssl_c_verify]
>   http-request set-header X-SSL-Client-DN %{+Q}[ssl_c_s_dn]
>   http-request set-header X-SSL-Client-CN %{+Q}[ssl_c_s_dn(cn)]
>   http-request set-header X-SSL-Issuer-DN %{+Q}[ssl_c_i_dn]
>   http-request set-header X-SSL-Issuer-CN %{+Q}[ssl_c_i_dn(cn)]
>
>   acl server-status path_beg /server-
>   use_backend bogus-http if server-status
>
>   # connection requests for apis go to the api backends
>   acl request_api hdr_beg(Host) -i api.
>   use_backend example-api if request_api
>
>   default_backend example-http
>
> backend example-http
>   mode http
>   log global
>   balance source
>   hash-type consistent
>   option httpchk GET /haproxy/alive.txt
>   http-check disable-on-404
>   server myhost myhost.example.com:8443 check ssl ca-file
> /usr/local/ssl/certs/cacerts.cert
>
> backend bogus-http
>   mode http
>   errorfile 503 /netops/www/haproxy/403.http
>
> backend example-api
>   mode http
>   log global
>   balance roundrobin
>   option httpchk GET /haproxy/alive.txt
>   http-check disable-on-404
>   server myhost myhost.example.com:8445 track example-http/myhost ssl
> ca-file /usr/local/ssl/certs/cacerts.cert
>
> frontend localhost-api-frontend
>   bind *:8445 ssl crt /usr/local/ssl/certs/example.com.pem
>   mode http
>   log global
>   option forwardfor if-none
>   option dontlog-normal
>
>   # the alerts api backend
>   acl alerts-api_host hdr_beg(Host) -i api.alerts
>   use_backend localhost-api-backend-alerts if alerts-api_host
>
>   default_backend bogus-http
>
> backend localhost-api-backend-alerts
>   mode http
>   log global
>   option forwardfor if-none
>   option dontlog-normal
>   server localhost localhost:4002
> 
> 
> 
> 
> 
> 
> 
> And the certificates.lst file referenced above looks like this:
> 
> 
> # this order is because we need to work with older clients that don't
> # speak sni and this works for them in our setup.
> /usr/local/ssl/certs/example.com.pem *
> /usr/local/ssl/certs/example.com.pem [ca-file 
> /usr/local/ssl/certs/example-ca.cert verify optional] 

problem in 1.8 with hosts going out of service

2018-01-16 Thread Paul Lockaby
I'm experiencing a problem that I can't diagnose but I can recreate pretty 
consistently. I have a single server that responds for example.com and 
api.example.com and it runs haproxy. All the names run through an SSL front 
door but an ACL makes it such that requests for example.com get sent to 8443 
where Apache runs and requests for api.example.com get sent to 8445 where the 
same instance of haproxy runs and does further examination of the request and 
sends it to an application server running on localhost.

This configuration works great except when I take a server out of the rotation 
by disabling it with disable-on-404. As soon as I take any server out of the 
rotation, haproxy completely stops responding to ANY requests for ANY backend 
even things that aren't part of the group such as the stats backend and 
frontend. If I put the server back in to service haproxy does not recover. I 
must restart haproxy on all hosts to recover. Nothing shows up in the logs and 
I can't figure out how to debug it such that I can provide more information but 
it's very consistently reproducible using the configuration below. I am running 
1.8.3 and I have not tried this on 1.7 or earlier versions of 1.8.

Thanks for your help.
-Paul



global
  log /dev/log local0
  user nobody
  group nobody
  tune.ssl.default-dh-param 2048
  stats socket /var/run/haproxy.sock user nobody group nobody
  daemon

defaults
  timeout connect 5000ms
  timeout client 60ms
  timeout server 60ms

  option httplog
  option forwardfor
  option http-server-close
  option contstats

frontend stats-frontend
  bind *:2999
  mode http
  log global
  stats enable
  stats uri /haproxy

backend stats-backend
  mode http
  log global
  server stats /var/run/haproxy.sock check

frontend secured
  # get the list of certificate options from a list in a file
  bind *:443 ssl crt-list /srv/haproxy/certificates.lst
  mode http
  log global

  # tell backend connections what our ssl client cn is
  http-request set-header X-SSL-Client-Verify %[ssl_c_verify]
  http-request set-header X-SSL-Client-DN %{+Q}[ssl_c_s_dn]
  http-request set-header X-SSL-Client-CN %{+Q}[ssl_c_s_dn(cn)]
  http-request set-header X-SSL-Issuer-DN %{+Q}[ssl_c_i_dn]
  http-request set-header X-SSL-Issuer-CN %{+Q}[ssl_c_i_dn(cn)]

  acl server-status path_beg /server-
  use_backend bogus-http if server-status

  # connection requests for apis go to the api backends
  acl request_api hdr_beg(Host) -i api.
  use_backend example-api if request_api

  default_backend example-http

backend example-http
  mode http
  log global
  balance source
  hash-type consistent
  option httpchk GET /haproxy/alive.txt
  http-check disable-on-404
  server myhost myhost.example.com:8443 check ssl ca-file /usr/local/ssl/certs/cacerts.cert

backend bogus-http
  mode http
  errorfile 503 /netops/www/haproxy/403.http

backend example-api
  mode http
  log global
  balance roundrobin
  option httpchk GET /haproxy/alive.txt
  http-check disable-on-404
  server myhost myhost.example.com:8445 track example-http/myhost ssl ca-file /usr/local/ssl/certs/cacerts.cert

frontend localhost-api-frontend
  bind *:8445 ssl crt /usr/local/ssl/certs/example.com.pem
  mode http
  log global
  option forwardfor if-none
  option dontlog-normal

  # the alerts api backend
  acl alerts-api_host hdr_beg(Host) -i api.alerts
  use_backend localhost-api-backend-alerts if alerts-api_host

  default_backend bogus-http

backend localhost-api-backend-alerts
  mode http
  log global
  option forwardfor if-none
  option dontlog-normal
  server localhost localhost:4002







And the certificates.lst file referenced above looks like this:


# this order is because we need to work with older clients that don't
# speak sni and this works for them in our setup.
/usr/local/ssl/certs/example.com.pem *
/usr/local/ssl/certs/example.com.pem [ca-file /usr/local/ssl/certs/example-ca.cert verify optional] api.example.com