Re: problem in 1.8 with hosts going out of service
On Thu, Jan 25, 2018 at 11:28:38AM +0100, Christopher Faulet wrote:
> Thanks for tests guys. Willy, you can merge the patch.

Applied, thanks!

Willy
Re: problem in 1.8 with hosts going out of service
On 24/01/2018 at 23:54, Paul Lockaby wrote:
> This patch works for me. Thank you!
>
>> On Jan 24, 2018, at 1:02 PM, Christopher Faulet wrote:
>>
>>> On 24/01/2018 at 17:21, Paul Lockaby wrote:
>>> Sorry, I know this list is super busy and that there are a number of other
>>> more important issues that need to be worked through but I'm hoping one of
>>> the maintainers has been able to confirm this bug?
>>
>> Hi,
>>
>> Sorry Paul. As you said, we are pretty busy. And you're right to ping us. So,
>> I can confirm the bug. It is a bug on threads, a deadlock, because of a typo.
>>
>> Could you check the attached patch to confirm it fixes your problem?

Hi,

Thanks for the tests, guys. Willy, you can merge the patch.

--
Christopher Faulet
Re: problem in 1.8 with hosts going out of service
This patch works for me. Thank you!

> On Jan 24, 2018, at 1:02 PM, Christopher Faulet wrote:
>
> On 24/01/2018 at 17:21, Paul Lockaby wrote:
>> Sorry, I know this list is super busy and that there are a number of other
>> more important issues that need to be worked through but I'm hoping one of
>> the maintainers has been able to confirm this bug?
>
> Hi,
>
> Sorry Paul. As you said, we are pretty busy. And you're right to ping us. So,
> I can confirm the bug. It is a bug on threads, a deadlock, because of a typo.
>
> Could you check the attached patch to confirm it fixes your problem?
>
> Thanks,
> --
> Christopher Faulet
> <0001-BUG-MEDIUM-threads-server-Fix-deadlock-in-srv_set_st.patch>
Re: problem in 1.8 with hosts going out of service
Hi Christopher,

Patch seems to work for me. Maybe Paul can confirm as well.

Regards,
PiBa-NL / Pieter

On 24-1-2018 at 22:02, Christopher Faulet wrote:
> On 24/01/2018 at 17:21, Paul Lockaby wrote:
>> Sorry, I know this list is super busy and that there are a number of other
>> more important issues that need to be worked through but I'm hoping one of
>> the maintainers has been able to confirm this bug?
>
> Hi,
>
> Sorry Paul. As you said, we are pretty busy. And you're right to ping us. So,
> I can confirm the bug. It is a bug on threads, a deadlock, because of a typo.
>
> Could you check the attached patch to confirm it fixes your problem?
>
> Thanks,
Re: problem in 1.8 with hosts going out of service (alive.txt+404+track)
Hi Paul, List,

On 24-1-2018 at 17:21, Paul Lockaby wrote:
> Sorry, I know this list is super busy and that there are a number of other
> more important issues that need to be worked through but I'm hoping one of
> the maintainers has been able to confirm this bug?

I can indeed reproduce it with your config on 1.8.3: CPU usage goes to 100% and stats stop responding when alive.txt is removed. When a backend goes down completely, the track works as intended though (stats shows both servers marked down).

Hope this and the info / smaller config added below help someone track the issue down further in the code.

> Thanks,
> -Paul
>
>> On Jan 17, 2018, at 10:27 AM, Paul Lockaby wrote:
>>
>> Ok I've tracked this problem down specifically to the usage of check tracking.
>>
>> That is to say, the backend "example-api" is set to track the backend
>> "example-http". When that tracking is enabled and one of the servers in the
>> backend goes down then all of haproxy goes down and never recovers.
>>
>> So this works:
>>
>>     server myhost myhost.example.com:8445 ssl ca-file /usr/local/ssl/certs/cacerts.cert
>>
>> But this does not:
>>
>>     server myhost myhost.example.com:8445 track example-http/myhost ssl ca-file /usr/local/ssl/certs/cacerts.cert
>>
>> This is definitely a regression from 1.7 because I used this feature in 1.7
>> without issue.

It seems a combination of 3 features is needed to trigger this issue:

    option httpchk GET /alive.txt
    + http-check disable-on-404
    + track

Backtrace keeps showing this:

    (gdb) bt
    #0  0x0046b59f in srv_set_stopping ()
    #1  0x004a3057 in ?? ()
    #2  0x004f0eaf in process_runnable_tasks ()
    #3  0x004aa13c in ?? ()
    #4  0x004a9a16 in main ()

I could reduce the config to this:

    frontend stats-frontend
        bind *:2999
        mode http
        log global
        stats enable
        stats uri /haproxy

    frontend secured
        bind *:8080
        mode http
        acl request_api hdr_beg(Host) -i api.
        use_backend example-api if request_api
        default_backend example-http

    backend example-http
        mode http
        option httpchk GET /haproxy/alive.txt
        http-check disable-on-404
        server myhost vhost1.pfs.local:302 check

    backend example-api
        mode http
        option httpchk GET /haproxy/alive.txt
        http-check disable-on-404
        server myhost vhost1.pfs.local:303 track example-http/myhost

Regards,
PiBa-NL / Pieter
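As a rough illustration of why removing alive.txt matters here: with `option httpchk`, a 2xx or 3xx response counts as a passing check, and with `http-check disable-on-404` a 404 takes the server out of load balancing gracefully instead of marking it failed. That graceful state is what reaches `srv_set_stopping` in the backtrace. The sketch below is a simplification under stated assumptions: `toy_classify` and the enum names are hypothetical, and the real check logic also applies rise/fall counters that are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one HTTP health check; not HAProxy's own types. */
enum toy_check_result {
    TOY_CHECK_UP,       /* 2xx or 3xx: the check passes */
    TOY_CHECK_STOPPING, /* 404 with disable-on-404: drained, not failed */
    TOY_CHECK_DOWN      /* anything else: the check fails */
};

/* Map an HTTP status code to a check outcome. */
static enum toy_check_result toy_classify(int http_status, bool disable_on_404)
{
    assert(http_status >= 100 && http_status <= 599);

    if (http_status >= 200 && http_status < 400)
        return TOY_CHECK_UP;
    if (http_status == 404 && disable_on_404)
        return TOY_CHECK_STOPPING;
    return TOY_CHECK_DOWN;
}
```

Under these assumptions, a 404 only yields the stopping state when disable-on-404 is set. Without it the same response is an ordinary failed check, which would match the observation that a backend going down completely does not hang.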
Re: problem in 1.8 with hosts going out of service
On 24/01/2018 at 17:21, Paul Lockaby wrote:
> Sorry, I know this list is super busy and that there are a number of other
> more important issues that need to be worked through but I'm hoping one of
> the maintainers has been able to confirm this bug?

Hi,

Sorry Paul. As you said, we are pretty busy. And you're right to ping us. So, I can confirm the bug. It is a bug on threads, a deadlock, because of a typo.

Could you check the attached patch to confirm it fixes your problem?

Thanks,
--
Christopher Faulet

From 6436e9d934045c7cd9c41bffa036eb6213ff21ee Mon Sep 17 00:00:00 2001
From: Christopher Faulet
Date: Wed, 24 Jan 2018 21:49:41 +0100
Subject: [PATCH] BUG/MEDIUM: threads/server: Fix deadlock in srv_set_stopping/srv_set_admin_flag

Because of a typo (HA_SPIN_LOCK instead of HA_SPIN_UNLOCK), there is a deadlock in
srv_set_stopping and srv_set_admin_flag when there is at least one trackers.

This patch must be backported in 1.8.
---
 src/server.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/server.c b/src/server.c
index 3901e7d8..07a6603a 100644
--- a/src/server.c
+++ b/src/server.c
@@ -976,7 +976,7 @@ void srv_set_stopping(struct server *s, const char *reason, struct check *check)
 	for (srv = s->trackers; srv; srv = srv->tracknext) {
 		HA_SPIN_LOCK(SERVER_LOCK, &srv->lock);
 		srv_set_stopping(srv, NULL, NULL);
-		HA_SPIN_LOCK(SERVER_LOCK, &srv->lock);
+		HA_SPIN_UNLOCK(SERVER_LOCK, &srv->lock);
 	}
 }
@@ -1019,7 +1019,7 @@ void srv_set_admin_flag(struct server *s, enum srv_admin mode, const char *cause
 	for (srv = s->trackers; srv; srv = srv->tracknext) {
 		HA_SPIN_LOCK(SERVER_LOCK, &srv->lock);
 		srv_set_admin_flag(srv, mode, cause);
-		HA_SPIN_LOCK(SERVER_LOCK, &srv->lock);
+		HA_SPIN_UNLOCK(SERVER_LOCK, &srv->lock);
 	}
 }
--
2.14.3
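To see why a single wrong macro freezes the whole process, here is a minimal sketch of the pattern the patch fixes. It is not HAProxy code: `struct toy_server`, `toy_trylock`, `toy_unlock` and `toy_set_stopping` are hypothetical stand-ins built on C11 atomics. The lock is taken with a single attempt rather than a spin, so the sketch can count the places where a real spinlock would spin forever instead of actually hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a server carrying its own spinlock;
 * this is not HAProxy's real struct server. */
struct toy_server {
    atomic_bool locked;            /* true while the lock is held */
    bool stopping;                 /* state propagated from the tracked server */
    struct toy_server *tracknext;  /* next server tracking the same target */
};

/* One acquisition attempt. A real spinlock repeats this until it succeeds,
 * so a second acquisition by the thread that already holds it never returns. */
static bool toy_trylock(struct toy_server *s)
{
    return !atomic_exchange(&s->locked, true);
}

static void toy_unlock(struct toy_server *s)
{
    assert(atomic_load(&s->locked)); /* releasing a free lock is a bug */
    atomic_store(&s->locked, false);
}

/* Propagate "stopping" to every tracker. With buggy == true the release is
 * replaced by a second acquisition, mirroring the typo. Returns the number
 * of failed acquisition attempts, i.e. points where a real spinlock would
 * spin forever. */
static int toy_set_stopping(struct toy_server *trackers, bool buggy)
{
    int would_hang = 0;

    for (struct toy_server *srv = trackers; srv; srv = srv->tracknext) {
        if (!toy_trylock(srv)) {
            would_hang++;
            continue;
        }
        srv->stopping = true;
        if (buggy) {
            if (!toy_trylock(srv)) /* typo: lock instead of unlock */
                would_hang++;
        } else {
            toy_unlock(srv);
        }
    }
    return would_hang;
}
```

With the release in place every tracker ends up unlocked. With the typo, the very first tracker already blocks the calling thread, which is consistent with the 100% CPU and the backtrace stuck in `srv_set_stopping` reported earlier in the thread.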
Re: problem in 1.8 with hosts going out of service
Sorry, I know this list is super busy and that there are a number of other more important issues that need to be worked through but I'm hoping one of the maintainers has been able to confirm this bug?

Thanks,
-Paul

> On Jan 17, 2018, at 10:27 AM, Paul Lockaby wrote:
>
> Ok I've tracked this problem down specifically to the usage of check tracking.
>
> That is to say, the backend "example-api" is set to track the backend
> "example-http". When that tracking is enabled and one of the servers in the
> backend goes down then all of haproxy goes down and never recovers.
>
> So this works:
>
>     server myhost myhost.example.com:8445 ssl ca-file /usr/local/ssl/certs/cacerts.cert
>
> But this does not:
>
>     server myhost myhost.example.com:8445 track example-http/myhost ssl ca-file /usr/local/ssl/certs/cacerts.cert
>
> This is definitely a regression from 1.7 because I used this feature in 1.7
> without issue.
Re: problem in 1.8 with hosts going out of service
Ok I've tracked this problem down specifically to the usage of check tracking.

That is to say, the backend "example-api" is set to track the backend "example-http". When that tracking is enabled and one of the servers in the backend goes down then all of haproxy goes down and never recovers.

So this works:

    server myhost myhost.example.com:8445 ssl ca-file /usr/local/ssl/certs/cacerts.cert

But this does not:

    server myhost myhost.example.com:8445 track example-http/myhost ssl ca-file /usr/local/ssl/certs/cacerts.cert

This is definitely a regression from 1.7 because I used this feature in 1.7 without issue.

> On Jan 16, 2018, at 10:36 PM, Paul Lockaby wrote:
>
> I'm experiencing a problem that I can't diagnose but I can recreate pretty
> consistently. I have a single server that responds for example.com and
> api.example.com and it runs haproxy. All the names run through an SSL front
> door but an ACL makes it such that requests for example.com get sent to 8443
> where Apache runs and requests for api.example.com get sent to 8445 where the
> same instance of haproxy runs and does further examination of the request and
> sends it to an application server running on localhost.
>
> This configuration works great except when I take a server out of the
> rotation by disabling it with disable-on-404. As soon as I take any server
> out of the rotation, haproxy completely stops responding to ANY requests for
> ANY backend even things that aren't part of the group such as the stats
> backend and frontend. If I put the server back in to service haproxy does not
> recover. I must restart haproxy on all hosts to recover. Nothing shows up in
> the logs and I can't figure out how to debug it such that I can provide more
> information but it's very consistently reproducible using the configuration
> below. I am running 1.8.3 and I have not tried this on 1.7 or earlier
> versions of 1.8.
>
> Thanks for your help.
> -Paul
>
> [...]
problem in 1.8 with hosts going out of service
I'm experiencing a problem that I can't diagnose but I can recreate pretty consistently. I have a single server that responds for example.com and api.example.com and it runs haproxy. All the names run through an SSL front door but an ACL makes it such that requests for example.com get sent to 8443 where Apache runs and requests for api.example.com get sent to 8445 where the same instance of haproxy runs and does further examination of the request and sends it to an application server running on localhost.

This configuration works great except when I take a server out of the rotation by disabling it with disable-on-404. As soon as I take any server out of the rotation, haproxy completely stops responding to ANY requests for ANY backend even things that aren't part of the group such as the stats backend and frontend. If I put the server back in to service haproxy does not recover. I must restart haproxy on all hosts to recover. Nothing shows up in the logs and I can't figure out how to debug it such that I can provide more information but it's very consistently reproducible using the configuration below. I am running 1.8.3 and I have not tried this on 1.7 or earlier versions of 1.8.

Thanks for your help.
-Paul

global
    log /dev/log local0
    user nobody
    group nobody
    tune.ssl.default-dh-param 2048
    stats socket /var/run/haproxy.sock user nobody group nobody
    daemon

defaults
    timeout connect 5000ms
    timeout client 60ms
    timeout server 60ms

    option httplog
    option forwardfor
    option http-server-close
    option contstats

frontend stats-frontend
    bind *:2999
    mode http
    log global
    stats enable
    stats uri /haproxy

backend stats-backend
    mode http
    log global
    server stats /var/run/haproxy.sock check

frontend secured
    # get the list of certificate options from a list in a file
    bind *:443 ssl crt-list /srv/haproxy/certificates.lst
    mode http
    log global

    # tell backend connections what our ssl client cn is
    http-request set-header X-SSL-Client-Verify %[ssl_c_verify]
    http-request set-header X-SSL-Client-DN %{+Q}[ssl_c_s_dn]
    http-request set-header X-SSL-Client-CN %{+Q}[ssl_c_s_dn(cn)]
    http-request set-header X-SSL-Issuer-DN %{+Q}[ssl_c_i_dn]
    http-request set-header X-SSL-Issuer-CN %{+Q}[ssl_c_i_dn(cn)]

    acl server-status path_beg /server-
    use_backend bogus-http if server-status

    # connection requests for apis go to the api backends
    acl request_api hdr_beg(Host) -i api.
    use_backend example-api if request_api

    default_backend example-http

backend example-http
    mode http
    log global
    balance source
    hash-type consistent
    option httpchk GET /haproxy/alive.txt
    http-check disable-on-404
    server myhost myhost.example.com:8443 check ssl ca-file /usr/local/ssl/certs/cacerts.cert

backend bogus-http
    mode http
    errorfile 503 /netops/www/haproxy/403.http

backend example-api
    mode http
    log global
    balance roundrobin
    option httpchk GET /haproxy/alive.txt
    http-check disable-on-404
    server myhost myhost.example.com:8445 track example-http/myhost ssl ca-file /usr/local/ssl/certs/cacerts.cert

frontend localhost-api-frontend
    bind *:8445 ssl crt /usr/local/ssl/certs/example.com.pem
    mode http
    log global
    option forwardfor if-none
    option dontlog-normal

    # the alerts api backend
    acl alerts-api_host hdr_beg(Host) -i api.alerts
    use_backend localhost-api-backend-alerts if alerts-api_host

    default_backend bogus-http

backend localhost-api-backend-alerts
    mode http
    log global
    option forwardfor if-none
    option dontlog-normal
    server localhost localhost:4002

And the certificates.lst file referenced above looks like this:

# this order is because we need to work with older clients that don't
# speak sni and this works for them in our setup.
/usr/local/ssl/certs/example.com.pem *
/usr/local/ssl/certs/example.com.pem [ca-file /usr/local/ssl/certs/example-ca.cert verify optional] api.example.com
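For readers less familiar with the ACL syntax, the Host-based routing in the "secured" frontend can be approximated as follows. `hdr_beg(Host) -i api.` is a case-insensitive prefix match on the Host header. The function names `toy_hdr_beg_i` and `toy_pick_backend` are hypothetical, and the sketch deliberately ignores the `server-status` path rule that the real frontend evaluates first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Case-insensitive "begins with" test, a rough analogue of
 * `hdr_beg(Host) -i <prefix>`. Returns nonzero on a match. */
static int toy_hdr_beg_i(const char *value, const char *prefix)
{
    return strncasecmp(value, prefix, strlen(prefix)) == 0;
}

/* Pick a backend name from the Host header the way the "secured"
 * frontend does for its api rule and its default backend. */
static const char *toy_pick_backend(const char *host)
{
    assert(host != NULL);

    if (toy_hdr_beg_i(host, "api."))
        return "example-api";
    return "example-http";
}
```

In this setup both backends point at the same host, which is why example-api tracks example-http/myhost instead of running its own health check.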