Re: [PATCH 1/1] BUG/MEDIUM: connections: force connections cleanup on server changes

2020-05-02 Thread Olivier Houchard
On Sat, May 02, 2020 at 10:23:05PM +0200, William Dauchy wrote:
> On Sat, May 02, 2020 at 10:17:01PM +0200, Olivier Houchard wrote:
> > If that's the only change you have for a v2, don't bother, I already
> > integrated it in what I plan to push :)
> 
> ok, thanks a lot!
> 

It's now pushed, so any fix for the fix is now welcome :)

Thanks a lot !

Olivier



Re: [PATCH 1/1] BUG/MEDIUM: connections: force connections cleanup on server changes

2020-05-02 Thread William Dauchy
On Sat, May 02, 2020 at 10:17:01PM +0200, Olivier Houchard wrote:
> If that's the only change you have for a v2, don't bother, I already
> integrated it in what I plan to push :)

ok, thanks a lot!

-- 
William



Re: [PATCH 0/1] fix idle connections cleanup

2020-05-02 Thread William Dauchy
Hello Olivier,

On Sat, May 02, 2020 at 10:11:26PM +0200, Olivier Houchard wrote:
> And I think your fix is correct, so I will apply it as-is. However, I
> think it should be applied to previous branches, which also had idle
> connection pools. It is probably less noticeable before 079cb9af22da6,
> because before that patch, one thread couldn't reuse a connection owned
> by another thread, so to reproduce it you'd have to test until you
> hit the same thread twice, but I don't see why it wouldn't be an issue
> there.

Thanks for the review. I hope it is not too late to send a v2 with a
small fix I realised after sending my first version.
I will also change the commit message regarding the backport.


Thanks,
-- 
William



Re: [PATCH 1/1] BUG/MEDIUM: connections: force connections cleanup on server changes

2020-05-02 Thread Olivier Houchard
On Sat, May 02, 2020 at 10:14:15PM +0200, William Dauchy wrote:
> On Sat, May 02, 2020 at 09:52:36PM +0200, William Dauchy wrote:
> > +void srv_cleanup_connections(struct server *srv)
> > +{
> > +   struct connection *conn;
> > +   int did_remove;
> > +   int i;
> > +   int j;
> > +
> > +   HA_SPIN_LOCK(OTHER_LOCK, &idle_conn_srv_lock);
> > +   for (i = 0; i < global.nbthread; i++) {
> > +   did_remove = 0;
> > +   HA_SPIN_LOCK(OTHER_LOCK, &toremove_lock[i]);
> > +   for (j = 0; j < srv->curr_idle_conns; j++) {
> > +   conn = MT_LIST_POP(&srv->idle_conns[i], struct connection *, list);
> > +   if (!conn)
> > +   conn = MT_LIST_POP(&srv->safe_conns[i],
> > +  struct connection *, list);
> > +   if (!conn)
> > +   break;
> 
> just realised I forgot:
>   did_remove = 1;
> 
> but will wait for other feedback before sending a possible v2.

If that's the only change you have for a v2, don't bother, I already
integrated it in what I plan to push :)

Regards,

Olivier



Re: [PATCH 1/1] BUG/MEDIUM: connections: force connections cleanup on server changes

2020-05-02 Thread William Dauchy
On Sat, May 02, 2020 at 09:52:36PM +0200, William Dauchy wrote:
> +void srv_cleanup_connections(struct server *srv)
> +{
> + struct connection *conn;
> + int did_remove;
> + int i;
> + int j;
> +
> + HA_SPIN_LOCK(OTHER_LOCK, &idle_conn_srv_lock);
> + for (i = 0; i < global.nbthread; i++) {
> + did_remove = 0;
> + HA_SPIN_LOCK(OTHER_LOCK, &toremove_lock[i]);
> + for (j = 0; j < srv->curr_idle_conns; j++) {
> + conn = MT_LIST_POP(&srv->idle_conns[i], struct connection *, list);
> + if (!conn)
> + conn = MT_LIST_POP(&srv->safe_conns[i],
> +struct connection *, list);
> + if (!conn)
> + break;

just realised I forgot:
did_remove = 1;

but will wait for other feedback before sending a possible v2.

> + MT_LIST_ADDQ(&toremove_connections[i], (struct mt_list *)&conn->list);
> + }
> + HA_SPIN_UNLOCK(OTHER_LOCK, &toremove_lock[i]);
> + if (did_remove)
> + task_wakeup(idle_conn_cleanup[i], TASK_WOKEN_OTHER);
> + }
> + HA_SPIN_UNLOCK(OTHER_LOCK, &idle_conn_srv_lock);
> +}
> +
>  struct task *srv_cleanup_idle_connections(struct task *task, void *context, unsigned short state)
>  {
>   struct server *srv;
> -- 
> 2.26.2
> 
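
For reference, a minimal sketch of srv_cleanup_connections() from the patch
above with the forgotten "did_remove = 1" folded in (an illustration of the
intended behaviour, not the committed code):

void srv_cleanup_connections(struct server *srv)
{
    struct connection *conn;
    int did_remove;
    int i;
    int j;

    HA_SPIN_LOCK(OTHER_LOCK, &idle_conn_srv_lock);
    for (i = 0; i < global.nbthread; i++) {
        did_remove = 0;
        HA_SPIN_LOCK(OTHER_LOCK, &toremove_lock[i]);
        for (j = 0; j < srv->curr_idle_conns; j++) {
            conn = MT_LIST_POP(&srv->idle_conns[i], struct connection *, list);
            if (!conn)
                conn = MT_LIST_POP(&srv->safe_conns[i],
                                   struct connection *, list);
            if (!conn)
                break;
            /* at least one connection was moved to the toremove list,
             * so the per-thread cleanup task must be woken up below
             */
            did_remove = 1;
            MT_LIST_ADDQ(&toremove_connections[i], (struct mt_list *)&conn->list);
        }
        HA_SPIN_UNLOCK(OTHER_LOCK, &toremove_lock[i]);
        if (did_remove)
            task_wakeup(idle_conn_cleanup[i], TASK_WOKEN_OTHER);
    }
    HA_SPIN_UNLOCK(OTHER_LOCK, &idle_conn_srv_lock);
}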



Re: [PATCH 0/1] fix idle connections cleanup

2020-05-02 Thread Olivier Houchard
Hi William,

On Sat, May 02, 2020 at 09:52:35PM +0200, William Dauchy wrote:
> Hello Olivier,
> 
> The following patch is an attempt to fix a change of behaviour we
> encountered after commit 079cb9af22da6 ("MEDIUM: connections: Revamp the
> way idle connections are killed"), which appears to be the origin of this
> new behaviour.
> 
> All details are in the commit message itself.
> The approach is most likely not the right one, so feel free to advise and
> I will improve it.
> 
This is a good catch !

And I think your fix is correct, so I will apply it as-is. However, I
think it should be applied to previous branches, which also had idle
connection pools. It is probably less noticeable before 079cb9af22da6,
because before that patch, one thread couldn't reuse a connection owned
by another thread, so to reproduce it you'd have to test until you
hit the same thread twice, but I don't see why it wouldn't be an issue
there.

Thanks a lot !

Olivier



[PATCH 0/1] fix idle connections cleanup

2020-05-02 Thread William Dauchy
Hello Olivier,

The following patch is an attempt to fix a change of behaviour we
encountered after commit 079cb9af22da6 ("MEDIUM: connections: Revamp the
way idle connections are killed"), which appears to be the origin of this
new behaviour.

All details are in the commit message itself.
The approach is most likely not the right one, so feel free to advise and
I will improve it.

William Dauchy (1):
  BUG/MEDIUM: connections: force connections cleanup on server changes

 include/proto/server.h |  1 +
 src/server.c   | 37 -
 2 files changed, 37 insertions(+), 1 deletion(-)

Thanks,
-- 
William




[PATCH 1/1] BUG/MEDIUM: connections: force connections cleanup on server changes

2020-05-02 Thread William Dauchy
I've been trying to understand a change of behaviour between v2.2dev5 and
v2.2dev6. Indeed, our probe regularly tests adding and removing servers on
a given backend, such as:

 # echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
 113 be_foo 1 srv0 10.236.139.34 2 0 1 1 263 15 3 4 6 0 0 0 - 31255 -
 113 be_foo 2 srv1 0.0.0.0 0 1 256 256 0 15 3 0 14 0 0 0 - 0 -

 -> curl on the corresponding frontend: reply from server:31255

 # echo "set server be_foo/srv1 addr 10.236.139.34 port 31257" | sudo socat 
stdio /var/lib/haproxy/stats
 IP changed from '0.0.0.0' to '10.236.139.34', port changed from '0' to '31257' 
by 'stats socket command'
 # echo "set server be_foo/srv1 weight 256" | sudo socat stdio 
/var/lib/haproxy/stats
 # echo "set server be_foo/srv1 check-port 8500" | sudo socat stdio 
/var/lib/haproxy/stats
 health check port updated.
 # echo "set server be_foo/srv1 state ready" | sudo socat stdio 
/var/lib/haproxy/stats
 # echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
 113 be_foo 1 srv0 10.236.139.34 2 0 1 1 105 15 3 4 6 0 0 0 - 31255 -
 113 be_foo 2 srv1 10.236.139.34 2 0 256 256 2319 15 3 2 6 0 0 0 - 31257 -

 -> curl on the corresponding frontend: reply from server:31257
 (notice the difference of weight)

 # echo "set server be_foo/srv1 state maint" | sudo socat stdio 
/var/lib/haproxy/stats
 # echo "set server be_foo/srv1 addr 0.0.0.0 port 0" | sudo socat stdio 
/var/lib/haproxy/stats
 IP changed from '10.236.139.34' to '0.0.0.0', port changed from '31257' to '0' 
by 'stats socket command'
 # echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
 113 be_foo 1 srv0 10.236.139.34 2 0 1 1 263 15 3 4 6 0 0 0 - 31255 -
 113 be_foo 2 srv1 0.0.0.0 0 1 256 256 0 15 3 0 14 0 0 0 - 0 -

 -> curl on the corresponding frontend: reply from server:31255

 # echo "set server be_foo/srv1 addr 10.236.139.34 port 31256" | sudo socat 
stdio /var/lib/haproxy/stats
 IP changed from '0.0.0.0' to '10.236.139.34', port changed from '0' to '31256' 
by 'stats socket command'
 # echo "set server be_foo/srv1 weight 256" | sudo socat stdio 
/var/lib/haproxy/stats
 # echo "set server be_foo/srv1 check-port 8500" | sudo socat stdio 
/var/lib/haproxy/stats
 health check port updated.
 # echo "set server be_foo/srv1 state ready" | sudo socat stdio 
/var/lib/haproxy/stats
 # echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
 113 be_foo 1 srv0 10.236.139.34 2 0 1 1 105 15 3 4 6 0 0 0 - 31255 -
 113 be_foo 2 srv1 10.236.139.34 2 0 256 256 2319 15 3 2 6 0 0 0 - 31256 -

 -> curl on the corresponding frontend: reply from server:31257 (!)

Here we would indeed expect to get an answer from server:31256. The issue
is closely linked to the use of `pool-purge-delay`, with a value higher
than the duration of the test (10s in our case).

A git bisect between dev5 and dev6 points to commit 079cb9af22da6
("MEDIUM: connections: Revamp the way idle connections are killed") as the
origin of this new behaviour.

So if I understand the latter correctly, it seems it was more a matter of
chance that we did not see the issue earlier.

My patch proposes to force the cleanup of idle connections in the two
following cases:
- we set a (still running) server to maintenance
- we change the IP/port of a server

A backport is not needed as the commit revealing the issue is present only
in the v2.2 branch.

Fixes: 079cb9af22da6 ("MEDIUM: connections: Revamp the way idle connections are killed")
Signed-off-by: William Dauchy 
---
 include/proto/server.h |  1 +
 src/server.c   | 37 -
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/include/proto/server.h b/include/proto/server.h
index f8fed3148..388c7e2ff 100644
--- a/include/proto/server.h
+++ b/include/proto/server.h
@@ -65,6 +65,7 @@ const char *update_server_fqdn(struct server *server, const char *fqdn, const ch
 int snr_resolution_cb(struct dns_requester *requester, struct dns_nameserver *nameserver);
 int snr_resolution_error_cb(struct dns_requester *requester, int error_code);
 struct server *snr_check_ip_callback(struct server *srv, void *ip, unsigned char *ip_family);
+void srv_cleanup_connections(struct server *srv);
 struct task *srv_cleanup_idle_connections(struct task *task, void *ctx, unsigned short state);
 struct task *srv_cleanup_toremove_connections(struct task *task, void *context, unsigned short state);
 
diff --git a/src/server.c b/src/server.c
index 4d5e706d3..751439d76 100644
--- a/src/server.c
+++ b/src/server.c
@@ -3669,8 +3669,11 @@ const char *update_server_addr_port(struct server *s, const char *addr, const ch
         }
 
 out:
-        if (changed)
+        if (changed) {
+                /* force connection cleanup on the given server */
+                srv_cleanup_connections(s);
                 srv_set_dyncookie(s);
+        }
         if (updater)
                 chunk_appendf(msg, " by '%s'", updater);
c
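
The hunk above covers the address/port change path. The commit message also
mentions forcing the cleanup when a still-running server is put into
maintenance, but that part of the patch is truncated here. Below is a minimal
sketch of what such a second hook could look like; the placement in the
maintenance transition and the exact condition are assumptions, not the
committed hunk:

    /* Sketch only (assumed placement): when a server that may still hold
     * idle connections is switched to maintenance, purge them so that a
     * stale connection to the old address/port cannot be reused later.
     */
    if ((s->next_admin & SRV_ADMF_MAINT) && !(s->cur_admin & SRV_ADMF_MAINT))
        srv_cleanup_connections(s);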

Re: 'http-check connect default linger proto fcgi' keeps connections open?

2020-05-02 Thread Aleksandar Lazic
On 02.05.20 14:25, Aleksandar Lazic wrote:
> Hi.
> 
> May 2, 2020 9:43:40 AM Christopher Faulet :
> 
>> On 02/05/2020 at 00:05, Aleksandar Lazic wrote:
>>
>>> Hi.
>>> I wanted to use the shiny new http-check feature and have seen that the
>>> connection stays alive after the health check.
>>> I have also tried to remove "linger" but this does not change anything.
>>> Maybe I am doing something wrong.
>>>
>>>
>> Hi Aleks,
>>
>> You're right. There is a bug. And while trying to fix it, I found 2 others :)
>> It was a wrong test on the FCGI connection flags. Because of this bug, the
>> connection remains open until the server timeout. I pushed the fixes
>> upstream. It should be ok now.
> 
> Ah cool.
> 
> I will try the next snapshot.

Okay, I was too impatient to wait and see if it works, so I have now built
the Docker image from a git clone ;-)
I can confirm that this bug is fixed.

Now we have a load balancer with active health checks for php-fpm and other
FCGI backends 8-O

>> Thanks,

Thanks Christopher.

Best regards
Aleks
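
For readers finding this thread later, here is a minimal sketch of the kind of
FCGI health-check setup discussed above; the backend name, fcgi-app name,
docroot, addresses and ping URI are placeholders, not taken from Aleks'
actual configuration:

    backend be_php
        mode http
        use-fcgi-app php-fpm
        option httpchk
        # the directive from the subject line: run the health check over an
        # FCGI connection, reusing the server's connect settings, and close
        # it cleanly once the check completes
        http-check connect default linger proto fcgi
        http-check send meth GET uri /ping.php
        http-check expect status 200
        server fpm1 127.0.0.1:9000 proto fcgi check

    fcgi-app php-fpm
        log-stderr global
        docroot /var/www/html

With the fixes Christopher pushed, the check connection should now be closed
right after the response instead of lingering until the server timeout.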



Re: 'http-check connect default linger proto fcgi' keeps connections open?

2020-05-02 Thread Aleksandar Lazic
Hi.

May 2, 2020 9:43:40 AM Christopher Faulet :

> On 02/05/2020 at 00:05, Aleksandar Lazic wrote:
>
> > Hi.
> > I wanted to use the shiny new http-check feature and have seen that the
> > connection stays alive after the health check.
> > I have also tried to remove "linger" but this does not change anything.
> > Maybe I am doing something wrong.
> >
> >
> Hi Aleks,
>
> You're right. There is a bug. And while trying to fix it, I found 2 others :)
> It was a wrong test on the FCGI connection flags. Because of this bug, the
> connection remains open until the server timeout. I pushed the fixes
> upstream. It should be ok now.

Ah cool.

I will try the next snapshot.

> Thanks,
>
>






The log variable `%HP` includes the host for HTTP/2 but not for HTTP/1.1 (backward incompatible change?)

2020-05-02 Thread Ciprian Dorin Craciun
I've upgraded from HAProxy 1.8 to the 2.1 branch, and without changing the
configuration I've observed the following inconsistency:

The `%HP` log variable used to contain only the path for both HTTP/2
and HTTP/1.1 in HAProxy 1.8.

Now with HAProxy 2.1.4 (as available on OpenSUSE 15.1) the `%HP` log
variable contains the full URI (without the query string) for HTTP/2,
while for HTTP/1.1 it still contains only the path.

Is this an expected change, or is it a backward incompatible change?

Thanks,
Ciprian.



Re: 'http-check connect default linger proto fcgi' keeps connections open?

2020-05-02 Thread Christopher Faulet

On 02/05/2020 at 00:05, Aleksandar Lazic wrote:

Hi.

I wanted to use the shiny new http-check feature and have seen that the
connection stays alive after the health check.
I have also tried to remove "linger" but this does not change anything.
Maybe I am doing something wrong.



Hi Aleks,

You're right. There is a bug. And while trying to fix it, I found 2 others :)
It was a wrong test on the FCGI connection flags. Because of this bug, the
connection remains open until the server timeout. I pushed the fixes upstream.
It should be ok now.


Thanks,
--
Christopher Faulet