Re: Hang in haproxy 1.8.13

2018-09-11 Thread David King
Awesome, thanks!

On Tue, 11 Sep 2018 at 14:19, Willy Tarreau wrote:

> On Tue, Sep 11, 2018 at 02:10:15PM +0100, David King wrote:
> > Fantastic, thanks everyone
> >
> > i guess i would need to wait to 1.8.14 for seeing it on stable?
>
> In fact we'll backport it ASAP along with a few possibly other pending
> patches. We take care of maintaining the ordering between the patches when
> doing the backports, which is why sometimes we seem to "sleep" over a few
> of them. However you can safely apply Olivier's patch directly on your
> 1.8 tree.
>
> Willy
>


Re: Hang in haproxy 1.8.13

2018-09-11 Thread Willy Tarreau
On Tue, Sep 11, 2018 at 02:10:15PM +0100, David King wrote:
> Fantastic, thanks everyone
> 
> i guess i would need to wait to 1.8.14 for seeing it on stable?

In fact we'll backport it ASAP, possibly along with a few other pending
patches. We take care of maintaining the ordering between the patches when
doing the backports, which is why sometimes we seem to "sleep" over a few
of them. However you can safely apply Olivier's patch directly on your
1.8 tree.

Willy



Re: Hang in haproxy 1.8.13

2018-09-11 Thread David King
Fantastic, thanks everyone

I guess I would need to wait for 1.8.14 to see it in stable?

On Tue, 11 Sep 2018 at 13:53, Willy Tarreau wrote:

> Hi Olivier,
>
> On Tue, Sep 11, 2018 at 02:51:57PM +0200, Olivier Houchard wrote:
> > Ok I think I figured it out.
> > The bug was present in master too, but was masked for some reason.
> >
> > The patches attached should fix this. The first one is for master, and the
> > second one for 1.8, as the master patch didn't apply cleanly on 1.8.
>
> Now applied to master, thank you!
> Willy
>


Re: Hang in haproxy 1.8.13

2018-09-11 Thread Willy Tarreau
Hi Olivier,

On Tue, Sep 11, 2018 at 02:51:57PM +0200, Olivier Houchard wrote:
> Ok I think I figured it out.
> The bug was present in master too, but was masked for some reason.
> 
> The patches attached should fix this. The first one is for master, and the
> second one for 1.8, as the master patch didn't apply cleanly on 1.8.

Now applied to master, thank you!
Willy



Re: Hang in haproxy 1.8.13

2018-09-11 Thread Olivier Houchard
Hi,

On Tue, Sep 11, 2018 at 12:58:40PM +0100, David King wrote:
> Hi,
> 
> > I just tested, and it seems to happen with the latest 1.8, but not with 1.9.
> > Not using kqueue (by using -dk) seems to work around the issue.
> > It's quite interesting it doesn't happen on Centos, it means it is probably
> > kqueue-specific (or that it works by accident with epoll/poll/select).
> > I'm investigating.
> >
> > Regards,
> >
> > Olivier
> 
> 
> so i've been running some tests from builds from source, it seems to be
> 1.8.13 specific, as 1.8.12 works fine, and as Olivier said 1.9 is OK as well
> 
> Thanks
> 
> Dave

Ok I think I figured it out.
The bug was present in master too, but was masked for some reason.

The patches attached should fix this. The first one is for master, and the
second one for 1.8, as the master patch didn't apply cleanly on 1.8.

Regards,

Olivier
From d950da31340528c37173fc74d1c0f635c977cd03 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 11 Sep 2018 14:44:51 +0200
Subject: [PATCH] BUG/MAJOR: kqueue: Don't reset the changes number by
 accident.

In _update_fd(), if the fd wasn't polled, and we don't want it to be polled,
we just returned 0, however, we should return changes instead, or all previous
changes will be lost.

This should be backported to 1.8.
---
 src/ev_kqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index 087a07e7..e2f04f70 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -44,7 +44,7 @@ static int _update_fd(int fd, int start)
     if (!(fdtab[fd].thread_mask & tid_bit) || !(en & FD_EV_POLLED_RW)) {
         if (!(polled_mask[fd] & tid_bit)) {
             /* fd was not watched, it's still not */
-            return 0;
+            return changes;
         }
         /* fd totally removed from poll list */
         EV_SET(&kev[changes++], fd, EVFILT_READ, EV_DELETE, 0, 0, NULL);
-- 
2.14.3

From e987a5bda41a1b560a352b4ec2d54a8ebcd5965a Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 11 Sep 2018 14:44:51 +0200
Subject: [PATCH] BUG/MAJOR: kqueue: Don't reset the changes number by
 accident.

In _update_fd(), if the fd wasn't polled, and we don't want it to be polled,
we just returned 0, however, we should return changes instead, or all previous
changes will be lost.

This should be backported to 1.8.
---
 src/ev_kqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index 1f4762e6..c88cf6f3 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -44,7 +44,7 @@ static int _update_fd(int fd, int start)
     if (!(fdtab[fd].thread_mask & tid_bit) || !(en & FD_EV_POLLED_RW)) {
         if (!(fdtab[fd].polled_mask & tid_bit)) {
             /* fd was not watched, it's still not */
-            return 0;
+            return changes;
         }
         /* fd totally removed from poll list */
         EV_SET(&kev[changes++], fd, EVFILT_READ, EV_DELETE, 0, 0, NULL);
-- 
2.14.3



Re: Hang in haproxy 1.8.13

2018-09-11 Thread David King
Hi,

> I just tested, and it seems to happen with the latest 1.8, but not with 1.9.
> Not using kqueue (by using -dk) seems to work around the issue.
> It's quite interesting it doesn't happen on Centos, it means it is probably
> kqueue-specific (or that it works by accident with epoll/poll/select).
> I'm investigating.
>
> Regards,
>
> Olivier


So I've been running some tests on builds from source. It seems to be
1.8.13 specific, as 1.8.12 works fine, and as Olivier said, 1.9 is OK as well.

Thanks

Dave

On Tue, 11 Sep 2018 at 12:29, Olivier Houchard wrote:

> Hi,
>
> On Tue, Sep 11, 2018 at 12:36:08PM +0200, Lukas Tribus wrote:
> > On Tue, 11 Sep 2018 at 11:55, David King wrote:
> > >
> > > Apologies, i forgot to mention this is running on FreeBSD 11.1
> > >
> > > I've just run the same tests on Centos and there is no issue
> >
> > Could you retry with the current development tree (1.9) from git?
> > There are a number of fixes waiting to be backported to 1.8 and also a
> > number of already backported fixes (but post 1.8.13).
> >
> >
>
> I just tested, and it seems to happen with the latest 1.8, but not with
> 1.9.
> Not using kqueue (by using -dk) seems to work around the issue.
> It's quite interesting it doesn't happen on Centos, it means it is probably
> kqueue-specific (or that it works by accident with epoll/poll/select).
> I'm investigating.
>
> Regards,
>
> Olivier
>
>
>


Re: Hang in haproxy 1.8.13

2018-09-11 Thread Olivier Houchard
Hi,

On Tue, Sep 11, 2018 at 12:36:08PM +0200, Lukas Tribus wrote:
> On Tue, 11 Sep 2018 at 11:55, David King wrote:
> >
> > Apologies, i forgot to mention this is running on FreeBSD 11.1
> >
> > I've just run the same tests on Centos and there is no issue
> 
> Could you retry with the current development tree (1.9) from git?
> There are a number of fixes waiting to be backported to 1.8 and also a
> number of already backported fixes (but post 1.8.13).
> 
> 

I just tested, and it seems to happen with the latest 1.8, but not with 1.9.
Not using kqueue (by using -dk) seems to work around the issue.
It's quite interesting it doesn't happen on Centos, it means it is probably
kqueue-specific (or that it works by accident with epoll/poll/select).
I'm investigating.

Regards,

Olivier





Re: Hang in haproxy 1.8.13

2018-09-11 Thread Lukas Tribus
On Tue, 11 Sep 2018 at 11:55, David King wrote:
>
> Apologies, i forgot to mention this is running on FreeBSD 11.1
>
> I've just run the same tests on Centos and there is no issue

Could you retry with the current development tree (1.9) from git?
There are a number of fixes waiting to be backported to 1.8 and also a
number of already backported fixes (but post 1.8.13).


Lukas



Re: Hang in haproxy 1.8.13

2018-09-11 Thread David King
Apologies, I forgot to mention this is running on FreeBSD 11.1.

I've just run the same tests on CentOS and there is no issue.


Hang in haproxy 1.8.13

2018-09-11 Thread David King
Hi all

I was hoping for some help to see if this is a bug or a misconfiguration.

I have a config which works fine in 1.7.9 and 1.7.11, but when running it on
1.8.13 it seems to fail to respond to the client.

I've simplified and anonymised the config, but basically it's being
used to do A/B failover at haproxy, so it uses nested frontends and backends
with unix sockets to achieve this.

The config is

--
global
    pidfile /var/run/haproxy.pid
    stats socket /var/run/haproxy.sock mode 600 level admin
    stats timeout 2m
    maxconn 4000
    unix-bind user root mode 666
    daemon

defaults
    timeout connect 15s
    timeout server 1m
    timeout http-request 15s
    timeout check 15s

frontend frontend1
    bind 127.0.0.1:8080
    mode http
    timeout client 1m
    default_backend api

frontend api-A
    bind /var/run/haproxy/api-A.sock
    mode http
    default_backend api-A

frontend api-B
    bind /var/run/haproxy/api-B.sock
    mode http
    default_backend api-B

backend api
    mode http
    balance roundrobin
    server api-A /var/run/haproxy/api-A.sock disabled
    server api-B /var/run/haproxy/api-B.sock

backend api-A
    mode http
    balance roundrobin
    option forwardfor
    server server-1-A 
    server server-2-B 

backend api-B
    mode http
    balance roundrobin
    option forwardfor
    server server-1-B 
    server server-2-B 
-

When running with 1.7.11, it's all nice and quick:

curl http://10.2.74.41:8443 -vvv
* Rebuilt URL to: http://10.2.74.41:8443/
*   Trying 10.2.74.41...
* TCP_NODELAY set
* Connected to 10.2.74.41 (10.2.74.41) port 8443 (#0)
> GET / HTTP/1.1
> Host: 10.2.74.41:8443
> User-Agent: curl/7.61.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: application/problem+json; charset=utf-8
< Date: Tue, 11 Sep 2018 07:50:54 GMT
< Content-Length: 27
<
* Connection #0 to host 10.2.74.41 left intact

0004:frontend1.accept(0005)=0008 from [10.2.74.41:44161]
0004:frontend1.clireq[0008:]: GET / HTTP/1.1
0004:frontend1.clihdr[0008:]: Host: 10.2.74.41:8443
0004:frontend1.clihdr[0008:]: User-Agent: curl/7.61.0
0004:frontend1.clihdr[0008:]: Accept: */*
0005:api-B.accept(0007)=000a from [unix:1]
0005:api-B.clireq[000a:]: GET / HTTP/1.1
0005:api-B.clihdr[000a:]: Host: 10.2.74.41:8443
0005:api-B.clihdr[000a:]: User-Agent: curl/7.61.0
0005:api-B.clihdr[000a:]: Accept: */*
0005:api-B.srvrep[000a:000b]: HTTP/1.1 404 Not Found
0005:api-B.srvhdr[000a:000b]: Content-Type: application/problem+json; charset=utf-8
0005:api-B.srvhdr[000a:000b]: Date: Tue, 11 Sep 2018 07:51:22 GMT
0005:api-B.srvhdr[000a:000b]: Content-Length: 27
0004:api.srvrep[0008:0009]: HTTP/1.1 404 Not Found
0004:api.srvhdr[0008:0009]: Content-Type: application/problem+json; charset=utf-8
0004:api.srvhdr[0008:0009]: Date: Tue, 11 Sep 2018 07:51:22 GMT
0004:api.srvhdr[0008:0009]: Content-Length: 27
0007:frontend1.clicls[0008:0009]
0007:frontend1.closed[0008:0009]
0006:api-B.clicls[000a:000b]
0006:api-B.closed[000a:000b]


However, with 1.8.13 the client doesn't get the response and ends up with
a 504. The haproxy debug log shows the response is received by
haproxy, but never passed to the client:

curl http://10.2.74.41:8443 -vvv
* Rebuilt URL to: http://10.2.74.41:8443/
*   Trying 10.2.74.41...
* TCP_NODELAY set
* Connected to 10.2.74.41 (10.2.74.41) port 8443 (#0)
> GET / HTTP/1.1
> Host: 10.2.74.41:8443
> User-Agent: curl/7.61.0
> Accept: */*
>
*** HANGS HERE TILL 504 

from haproxy
:frontend1.accept(0006)=000b from [10.2.74.41:57043] ALPN=
:frontend1.clireq[000b:]: GET / HTTP/1.1
:frontend1.clihdr[000b:]: Host: 10.2.74.41:8443
:frontend1.clihdr[000b:]: User-Agent: curl/7.61.0
:frontend1.clihdr[000b:]: Accept: */*
0001:api-A.accept(0007)=000d from [unix:1] ALPN=
0001:api-A.clireq[000d:]: GET / HTTP/1.1
0001:api-A.clihdr[000d:]: Host: 10.2.74.41:8443
0001:api-A.clihdr[000d:]: User-Agent: curl/7.61.0
0001:api-A.clihdr[000d:]: Accept: */*
0001:api-A.srvrep[000d:000e]: HTTP/1.1 404 Not Found
0001:api-A.srvhdr[000d:000e]: Content-Type: application/problem+json; charset=utf-8
0001:api-A.srvhdr[000d:000e]: Date: Tue, 11 Sep 2018 07:48:39 GMT
0001:api-A.srvhdr[000d:000e]: Content-Length: 27
** 5 seconds
0002:api-A.clicls[adfd:]
0002:api-A.closed[adfd:]
** 15 seconds
:api.srvcls[000b:adfd]
:api.clicls[adfd:adfd]
:api.closed[adfd:adfd]



Any idea why this happens? Sorry for the long post!

Dave