Re: Hang in haproxy 1.8.13
Awesome, thanks!

On Tue, 11 Sep 2018 at 14:19, Willy Tarreau wrote:
> On Tue, Sep 11, 2018 at 02:10:15PM +0100, David King wrote:
> > Fantastic, thanks everyone
> >
> > i guess i would need to wait to 1.8.14 for seeing it on stable?
>
> In fact we'll backport it ASAP along with a few possibly other pending
> patches. We take care of maintaining the ordering between the patches when
> doing the backports, which is why sometimes we seem to "sleep" over a few
> of them. However you can safely apply Olivier's patch directly on your
> 1.8 tree.
>
> Willy
Re: Hang in haproxy 1.8.13
On Tue, Sep 11, 2018 at 02:10:15PM +0100, David King wrote:
> Fantastic, thanks everyone
>
> i guess i would need to wait to 1.8.14 for seeing it on stable?

In fact we'll backport it ASAP along with a few possibly other pending
patches. We take care of maintaining the ordering between the patches when
doing the backports, which is why sometimes we seem to "sleep" over a few
of them. However you can safely apply Olivier's patch directly on your
1.8 tree.

Willy
Re: Hang in haproxy 1.8.13
Fantastic, thanks everyone

i guess i would need to wait to 1.8.14 for seeing it on stable?

On Tue, 11 Sep 2018 at 13:53, Willy Tarreau wrote:
> Hi Olivier,
>
> On Tue, Sep 11, 2018 at 02:51:57PM +0200, Olivier Houchard wrote:
> > Ok I think I figured it out.
> > The bug was present in master too, but was masked for some reason.
> >
> > The patches attached should fix this. The first one is for master, and the
> > second one for 1.8, as the master patch didn't apply cleanly on 1.8.
>
> Now applied to master, thank you!
> Willy
Re: Hang in haproxy 1.8.13
Hi Olivier,

On Tue, Sep 11, 2018 at 02:51:57PM +0200, Olivier Houchard wrote:
> Ok I think I figured it out.
> The bug was present in master too, but was masked for some reason.
>
> The patches attached should fix this. The first one is for master, and the
> second one for 1.8, as the master patch didn't apply cleanly on 1.8.

Now applied to master, thank you!
Willy
Re: Hang in haproxy 1.8.13
Hi,

On Tue, Sep 11, 2018 at 12:58:40PM +0100, David King wrote:
> Hi,
>
> > I just tested, and it seems to happen with the latest 1.8, but not with 1.9.
> > Not using kqueue (by using -dk) seems to work around the issue.
> > It's quite interesting it doesn't happen on Centos, it means it is probably
> > kqueue-specific (or that it works by accident with epoll/poll/select).
> > I'm investigating.
> >
> > Regards,
> >
> > Olivier
>
> so i've been running some tests from builds from source, it seems to be
> 1.8.13 specific, as 1.8.12 works fine, and as Olivier said 1.9 is OK as well
>
> Thanks
>
> Dave

Ok I think I figured it out.
The bug was present in master too, but was masked for some reason.

The patches attached should fix this. The first one is for master, and the
second one for 1.8, as the master patch didn't apply cleanly on 1.8.

Regards,

Olivier

From d950da31340528c37173fc74d1c0f635c977cd03 Mon Sep 17 00:00:00 2001
From: Olivier Houchard
Date: Tue, 11 Sep 2018 14:44:51 +0200
Subject: [PATCH] BUG/MAJOR: kqueue: Don't reset the changes number by accident.

In _update_fd(), if the fd wasn't polled, and we don't want it to be polled,
we just returned 0, however, we should return changes instead, or all previous
changes will be lost.

This should be backported to 1.8.
---
 src/ev_kqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index 087a07e7..e2f04f70 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -44,7 +44,7 @@ static int _update_fd(int fd, int start)
 	if (!(fdtab[fd].thread_mask & tid_bit) || !(en & FD_EV_POLLED_RW)) {
 		if (!(polled_mask[fd] & tid_bit)) {
 			/* fd was not watched, it's still not */
-			return 0;
+			return changes;
 		}
 		/* fd totally removed from poll list */
 		EV_SET(&kev[changes++], fd, EVFILT_READ, EV_DELETE, 0, 0, NULL);
--
2.14.3

From e987a5bda41a1b560a352b4ec2d54a8ebcd5965a Mon Sep 17 00:00:00 2001
From: Olivier Houchard
Date: Tue, 11 Sep 2018 14:44:51 +0200
Subject: [PATCH] BUG/MAJOR: kqueue: Don't reset the changes number by accident.

In _update_fd(), if the fd wasn't polled, and we don't want it to be polled,
we just returned 0, however, we should return changes instead, or all previous
changes will be lost.

This should be backported to 1.8.
---
 src/ev_kqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index 1f4762e6..c88cf6f3 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -44,7 +44,7 @@ static int _update_fd(int fd, int start)
 	if (!(fdtab[fd].thread_mask & tid_bit) || !(en & FD_EV_POLLED_RW)) {
 		if (!(fdtab[fd].polled_mask & tid_bit)) {
 			/* fd was not watched, it's still not */
-			return 0;
+			return changes;
 		}
 		/* fd totally removed from poll list */
 		EV_SET(&kev[changes++], fd, EVFILT_READ, EV_DELETE, 0, 0, NULL);
--
2.14.3
Re: Hang in haproxy 1.8.13
Hi,

> I just tested, and it seems to happen with the latest 1.8, but not with 1.9.
> Not using kqueue (by using -dk) seems to work around the issue.
> It's quite interesting it doesn't happen on Centos, it means it is probably
> kqueue-specific (or that it works by accident with epoll/poll/select).
> I'm investigating.
>
> Regards,
>
> Olivier

so i've been running some tests from builds from source, it seems to be
1.8.13 specific, as 1.8.12 works fine, and as Olivier said 1.9 is OK as well

Thanks

Dave

On Tue, 11 Sep 2018 at 12:29, Olivier Houchard wrote:
> Hi,
>
> On Tue, Sep 11, 2018 at 12:36:08PM +0200, Lukas Tribus wrote:
> > On Tue, 11 Sep 2018 at 11:55, David King wrote:
> > >
> > > Apologies, i forgot to mention this is running on FreeBSD 11.1
> > >
> > > I've just run the same tests on Centos and there is no issue
> >
> > Could you retry with the current development tree (1.9) from git?
> > There are a number of fixes waiting to be backported to 1.8 and also a
> > number of already backported fixes (but post 1.8.13).
>
> I just tested, and it seems to happen with the latest 1.8, but not with 1.9.
> Not using kqueue (by using -dk) seems to work around the issue.
> It's quite interesting it doesn't happen on Centos, it means it is probably
> kqueue-specific (or that it works by accident with epoll/poll/select).
> I'm investigating.
>
> Regards,
>
> Olivier
Re: Hang in haproxy 1.8.13
Hi,

On Tue, Sep 11, 2018 at 12:36:08PM +0200, Lukas Tribus wrote:
> On Tue, 11 Sep 2018 at 11:55, David King wrote:
> >
> > Apologies, i forgot to mention this is running on FreeBSD 11.1
> >
> > I've just run the same tests on Centos and there is no issue
>
> Could you retry with the current development tree (1.9) from git?
> There are a number of fixes waiting to be backported to 1.8 and also a
> number of already backported fixes (but post 1.8.13).

I just tested, and it seems to happen with the latest 1.8, but not with 1.9.
Not using kqueue (by using -dk) seems to work around the issue.
It's quite interesting it doesn't happen on Centos, it means it is probably
kqueue-specific (or that it works by accident with epoll/poll/select).
I'm investigating.

Regards,

Olivier
Re: Hang in haproxy 1.8.13
On Tue, 11 Sep 2018 at 11:55, David King wrote:
>
> Apologies, i forgot to mention this is running on FreeBSD 11.1
>
> I've just run the same tests on Centos and there is no issue

Could you retry with the current development tree (1.9) from git?
There are a number of fixes waiting to be backported to 1.8 and also a
number of already backported fixes (but post 1.8.13).

Lukas
Re: Hang in haproxy 1.8.13
Apologies, i forgot to mention this is running on FreeBSD 11.1

I've just run the same tests on Centos and there is no issue

On Tue, 11 Sep 2018 at 09:05, David King wrote:
> Hi all
>
> i was hoping for some help to see if this is a bug or a misconfig
> [...]
Hang in haproxy 1.8.13
Hi all

i was hoping for some help to see if this is a bug or a misconfig

i have a config which works fine in 1.7.9 and 1.7.11, but when running it on
1.8.13 it seems to fail to respond to the client

I've simplified down and anonymised the config, but basically it's being
used to do A/B failover at haproxy, so it uses nested frontends and backends
with unix sockets to achieve this

The config is

--
global
    pidfile /var/run/haproxy.pid
    stats socket /var/run/haproxy.sock mode 600 level admin
    stats timeout 2m
    maxconn 4000
    unix-bind user root mode 666
    daemon

defaults
    timeout connect 15s
    timeout server 1m
    timeout http-request 15s
    timeout check 15s

frontend frontend1
    bind 127.0.0.1:8080
    mode http
    timeout client 1m
    default_backend api

frontend api-A
    bind /var/run/haproxy/api-A.sock
    mode http
    default_backend api-A

frontend api-B
    bind /var/run/haproxy/api-B.sock
    mode http
    default_backend api-B

backend api
    mode http
    balance roundrobin
    server api-A /var/run/haproxy/api-A.sock disabled
    server api-B /var/run/haproxy/api-B.sock

backend api-A
    mode http
    balance roundrobin
    option forwardfor
    server server-1-A
    server server-2-B

backend api-B
    mode http
    balance roundrobin
    option forwardfor
    server server-1-B
    server server-2-B
-

When running with 1.7.11, it's all nice and quick

curl http://10.2.74.41:8443 -vvv
* Rebuilt URL to: http://10.2.74.41:8443/
* Trying 10.2.74.41...
* TCP_NODELAY set
* Connected to 10.2.74.41 (10.2.74.41) port 8443 (#0)
> GET / HTTP/1.1
> Host: 10.2.74.41:8443
> User-Agent: curl/7.61.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: application/problem+json; charset=utf-8
< Date: Tue, 11 Sep 2018 07:50:54 GMT
< Content-Length: 27
<
* Connection #0 to host 10.2.74.41 left intact

0004:frontend1.accept(0005)=0008 from [10.2.74.41:44161]
0004:frontend1.clireq[0008:]: GET / HTTP/1.1
0004:frontend1.clihdr[0008:]: Host: 10.2.74.41:8443
0004:frontend1.clihdr[0008:]: User-Agent: curl/7.61.0
0004:frontend1.clihdr[0008:]: Accept: */*
0005:api-B.accept(0007)=000a from [unix:1]
0005:api-B.clireq[000a:]: GET / HTTP/1.1
0005:api-B.clihdr[000a:]: Host: 10.2.74.41:8443
0005:api-B.clihdr[000a:]: User-Agent: curl/7.61.0
0005:api-B.clihdr[000a:]: Accept: */*
0005:api-B.srvrep[000a:000b]: HTTP/1.1 404 Not Found
0005:api-B.srvhdr[000a:000b]: Content-Type: application/problem+json; charset=utf-8
0005:api-B.srvhdr[000a:000b]: Date: Tue, 11 Sep 2018 07:51:22 GMT
0005:api-B.srvhdr[000a:000b]: Content-Length: 27
0004:api.srvrep[0008:0009]: HTTP/1.1 404 Not Found
0004:api.srvhdr[0008:0009]: Content-Type: application/problem+json; charset=utf-8
0004:api.srvhdr[0008:0009]: Date: Tue, 11 Sep 2018 07:51:22 GMT
0004:api.srvhdr[0008:0009]: Content-Length: 27
0007:frontend1.clicls[0008:0009]
0007:frontend1.closed[0008:0009]
0006:api-B.clicls[000a:000b]
0006:api-B.closed[000a:000b]

however with 1.8.13 the client doesn't get the response and ends up with
a 504. The haproxy debug log shows the response is received by haproxy,
but never passed to the client

curl http://10.2.74.41:8443 -vvv
* Rebuilt URL to: http://10.2.74.41:8443/
* Trying 10.2.74.41...
* TCP_NODELAY set
* Connected to 10.2.74.41 (10.2.74.41) port 8443 (#0)
> GET / HTTP/1.1
> Host: 10.2.74.41:8443
> User-Agent: curl/7.61.0
> Accept: */*
>
*** HANGS HERE TILL 504

from haproxy
:frontend1.accept(0006)=000b from [10.2.74.41:57043] ALPN=
:frontend1.clireq[000b:]: GET / HTTP/1.1
:frontend1.clihdr[000b:]: Host: 10.2.74.41:8443
:frontend1.clihdr[000b:]: User-Agent: curl/7.61.0
:frontend1.clihdr[000b:]: Accept: */*
0001:api-A.accept(0007)=000d from [unix:1] ALPN=
0001:api-A.clireq[000d:]: GET / HTTP/1.1
0001:api-A.clihdr[000d:]: Host: 10.2.74.41:8443
0001:api-A.clihdr[000d:]: User-Agent: curl/7.61.0
0001:api-A.clihdr[000d:]: Accept: */*
0001:api-A.srvrep[000d:000e]: HTTP/1.1 404 Not Found
0001:api-A.srvhdr[000d:000e]: Content-Type: application/problem+json; charset=utf-8
0001:api-A.srvhdr[000d:000e]: Date: Tue, 11 Sep 2018 07:48:39 GMT
0001:api-A.srvhdr[000d:000e]: Content-Length: 27
** 5 seconds
0002:api-A.clicls[adfd:]
0002:api-A.closed[adfd:]
** 15 seconds
:api.srvcls[000b:adfd]
:api.clicls[adfd:adfd]
:api.closed[adfd:adfd]

any idea why this happens?

sorry for long post!

Dave