Re: minconn, maxconn and fullconn (again, sigh!)

2021-02-10 Thread Victor Sudakov
Lukas Tribus wrote:
> 
> On Wed, 10 Feb 2021 at 16:55, Victor Sudakov  wrote:
> >
> > I can even phrase my question in simpler terms. What happens if the sum
> > total of all servers' maxconns in a backend is less than the maxconn
> > value in the frontend pointing to the said backend?
> 
> Queueing for "timeout queue" amount of time, and then return 503 error

And what happens in TCP mode, after the "timeout queue" amount of time?
I suppose the TCP connection with the client is reset?

> 
> See:
> 
> timeout queue
> https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#4.2-timeout%20queue
> 
> maxconn
> https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#5.2-maxconn
> 
> 
> I really suggest you ignore minconn and fullconn, and focus on maxconn
> instead. The latter is a must-have (and must-understand). Really
> maxconn (global, frontend and per server ) is the single most
> important performance knob in haproxy.

Maxconn is rather clear, especially when one is sure about two things:

1. A server's maxconn value is always a hard limit (notwithstanding if
there is a minconn configured for the server).

2. Connections outnumbering the sum total of a backend's servers
maxconns are queued for the "timeout queue" amount of time and then
dropped.

Are the above statements correct?

It would be nice however to understand minconn/fullconn too. If a
backend has several servers with identical minconn, maxconn and weight,
what's the point of having minconn? The load will be always distributed
evenly between all the servers notwithstanding minconn/fullconn,
correct?

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
2:5005/49@fidonet http://vas.tomsk.ru/



Re: minconn, maxconn and fullconn (again, sigh!)

2021-02-10 Thread Lukas Tribus
Hello Victor,

On Wed, 10 Feb 2021 at 16:55, Victor Sudakov  wrote:
>
> I can even phrase my question in simpler terms. What happens if the sum
> total of all servers' maxconns in a backend is less than the maxconn
> value in the frontend pointing to the said backend?

Queueing for "timeout queue" amount of time, and then return 503 error
(and this really is desirable as opposed to hitting maxconn on a
frontend or even worse, global maxconn, because a few milliseconds of
delay do not hurt and returning a proper HTTP error in distress is way
better then some obscure connection timeout).

See:

timeout queue
https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#4.2-timeout%20queue

maxconn
https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#5.2-maxconn


I really suggest you ignore minconn and fullconn, and focus on maxconn
instead. The latter is a must-have (and must-understand). Really
maxconn (global, frontend and per server ) is the single most
important performance knob in haproxy.


Lukas



Segfault in liblua-5.3.so

2021-02-10 Thread Sachin Shetty
Hi,

We have a lua block that connects to memcache when a request arrives

"""
function get_from_gds(host, port, key)local sock = core.tcp()
sock:settimeout(20)local result = DOMAIN_NOT_FOUNDlocal
status, error = sock:connect(host, port)if not status then
core.Alert(GDS_LOG_PREFIX .. "GDS_ERROR: Error in connecting:" .. key
.. ":" .. port .. ":" .. error)return GDS_ERROR, "Error: " ..
errorendsock:send(key .. "\r\n")while true dolocal
s, status, partial = sock:receive("*l")if not s then
 core.Alert(GDS_LOG_PREFIX .. "GDS_ERROR: Error reading:" .. key ..
":" .. port .. ":" .. status)return GDS_ERROR, status
  endif s == "END" then break endresult = send
sock:close()return resultend

-- Comment: get_proxy calls get_from_gds

core.register_action("get_proxy", { "http-req" }, get_proxy)
"""
The value is cached in a haproxy map so we don't make a memcache
connection for every request.

At peak traffic if we reload haproxy, that invalidates the map and the
surge causes
quite a few memcache connections to fail. Error returned is "Can't connect"

We see the following messages in dmesg

[  +0.006924] haproxy[14258]: segfault at 0 ip 7f117fba94c4 sp
7f1179eefe08 error 4 in liblua-5.3.so[7f117fba1000+37000]

HA-Proxy version 2.0.18-be8b761 2020/09/30 - https://haproxy.org/

This is a recent issue, we never saw this in 1.8.


Any idea? We only see this at peak load. At regular load we don't see
this issue
even when we reload haproxy.

Thanks
Sachin


Re: [PATCH v2 0/6] cli commands for checks and agent

2021-02-10 Thread William Dauchy
On Wed, Feb 10, 2021 at 4:21 PM Christopher Faulet  wrote:
> To conclude, I may merge the first 4 patches if you want, but I guess you
> will have to adapt the first ones to produce better warnings. So I will
> wait your confirmation to proceed. However, I will merge the bug fix.
> There is no reason to not take it.

thanks for the review.
don't worry I will come back with a v3.

-- 
William



Re: Issues with d13afbcce5e664f9cfe797eee8c527e5fa947f1b (haproxy-2.2) "mux-h1: Don't set CS_FL_EOI too early for protocol upgrade requests"

2021-02-10 Thread Christian Ruppert

On 2021-02-10 18:15, Christopher Faulet wrote:

Le 08/02/2021 à 14:31, Christian Ruppert a écrit :

Hi list, Christopher,

we're having issues with the mentioned commit / patch:
d13afbcce5e664f9cfe797eee8c527e5fa947f1b
https://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=d13afbcce5e664f9cfe797eee8c527e5fa947f1b

I can also reproduce it with 2.2.9 as well as 2.3.5. I don't have any
useful details yet, just the our Jira fails to load.
A curl against the site seams to work fine while browser requests
(chrome / firefox) seem to timeout or at least some.

See the attached log. The first 3 requests seem to be fine so far. 
Then,

much later, there's a 504 between more 200s.
I'm not sure yet why the other 200s there seem to wait / are logged
after the actual timeout happens. According to chrome's F12 there are
more requests still pending.
Ignore the 503 there. That seems to be an unrelated problem, since 
this

also happends with a working HAProxy.

Much later, the site loaded, sometimes broken though.

I'll try to prepare a config snipped if required.

Is there anything know already?



Hi,

Thanks to information that Christian provided me offlist, I've finally
found and fixed the bug. The corresponding commit is :

commit a22782b597ee9a3bfecb18a66e29633c8e814216
Author: Christopher Faulet 
Date:   Mon Feb 8 17:18:01 2021 +0100

BUG/MEDIUM: mux-h1: Always set CS_FL_EOI for response in MSG_DONE 
state


During the message parsing, if in MSG_DONE state, the CS_FL_EOI 
flag must

always be set on the conn-stream if following conditions are met :

  * It is a response or
  * It is a request but not a protocol upgrade nor a CONNECT.

For now, there is no test on the message type (request or 
response). Thus
the CS_FL_EOI flag is not set for a response with a "Connection: 
upgrade"

header but not a 101 response.

This bug was introduced by the commit 3e1748bbf ("BUG/MINOR: 
mux-h1: Don't
set CS_FL_EOI too early for protocol upgrade requests"). It was 
backported
as far as 2.0. Thus, this patch must also be backported as far as 
2.0.


However, it is not backported yet. Thanks Christian !


Thanks for the very fast patching, Christopher! I've rolled out the new 
version on some more production machines and I haven't noticed or heard 
of any issues yet. Tomorrow I'll roll it out to the rest of our LBs.


--
Regards,
Christian Ruppert



Re: Issues with d13afbcce5e664f9cfe797eee8c527e5fa947f1b (haproxy-2.2) "mux-h1: Don't set CS_FL_EOI too early for protocol upgrade requests"

2021-02-10 Thread Christopher Faulet

Le 08/02/2021 à 14:31, Christian Ruppert a écrit :

Hi list, Christopher,

we're having issues with the mentioned commit / patch:
d13afbcce5e664f9cfe797eee8c527e5fa947f1b
https://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=d13afbcce5e664f9cfe797eee8c527e5fa947f1b

I can also reproduce it with 2.2.9 as well as 2.3.5. I don't have any
useful details yet, just the our Jira fails to load.
A curl against the site seams to work fine while browser requests
(chrome / firefox) seem to timeout or at least some.

See the attached log. The first 3 requests seem to be fine so far. Then,
much later, there's a 504 between more 200s.
I'm not sure yet why the other 200s there seem to wait / are logged
after the actual timeout happens. According to chrome's F12 there are
more requests still pending.
Ignore the 503 there. That seems to be an unrelated problem, since this
also happends with a working HAProxy.

Much later, the site loaded, sometimes broken though.

I'll try to prepare a config snipped if required.

Is there anything know already?



Hi,

Thanks to information that Christian provided me offlist, I've finally found and 
fixed the bug. The corresponding commit is :


commit a22782b597ee9a3bfecb18a66e29633c8e814216
Author: Christopher Faulet 
Date:   Mon Feb 8 17:18:01 2021 +0100

BUG/MEDIUM: mux-h1: Always set CS_FL_EOI for response in MSG_DONE state

During the message parsing, if in MSG_DONE state, the CS_FL_EOI flag must
always be set on the conn-stream if following conditions are met :

  * It is a response or
  * It is a request but not a protocol upgrade nor a CONNECT.

For now, there is no test on the message type (request or response). Thus
the CS_FL_EOI flag is not set for a response with a "Connection: upgrade"
header but not a 101 response.

This bug was introduced by the commit 3e1748bbf ("BUG/MINOR: mux-h1: Don't
set CS_FL_EOI too early for protocol upgrade requests"). It was backported
as far as 2.0. Thus, this patch must also be backported as far as 2.0.

However, it is not backported yet. Thanks Christian !
--
Christopher Faulet



Re: minconn, maxconn and fullconn (again, sigh!)

2021-02-10 Thread Victor Sudakov
I can even phrase my question in simpler terms. What happens if the sum
total of all servers' maxconns in a backend is less than the maxconn
value in the frontend pointing to the said backend?

Victor Sudakov wrote:
> Dear Colleagues,
> 
> The dynamic limit is probably one of the darker sides of Haproxy
> configuration. One of the best explanations I've found is
> https://www.mail-archive.com/haproxy@formilux.org/msg04782.html
> but still I'm missing some points.
> 
> Consider the following configuration:
> 
> 
> frontend bar
> maxconn 1
> default_backend foo
> 
> backend foo
> server server1 server1: minconn 10 maxconn 100 weight 100 
> server server2 server2: minconn 10 maxconn 100 weight 100 
> 
> 
> This configuration will set the automatic implicit fullconn=1000 to the
> backend "foo", correct?
> 
> "maxconn 100" is still a hard limit, correct?
> 
> So, when there happen to be 500 connections to the backend "foo", 200
> of them will be served by server1 and server2, what will happen to the
> other 300 connections?
> 
> The same can be asked about the example from the Haproxy documentation
> with the explicit fullconn:
> 
> 
> backend dynamic
> fullconn   1
> server srv1   dyn1:80 minconn 100 maxconn 1000
> server srv2   dyn2:80 minconn 100 maxconn 1000
> 
> 
> "maxconn 1000" is still a hard limit, correct?
> 
> When 4000 connections come to this backend, srv1 and srv2 will serve 2000
> of them (each reaching its hard limit at "maxconn=1000"), what will happen
> to the other 2000 ?
> 
> -- 
> Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
> 2:5005/49@fidonet http://vas.tomsk.ru/
> 

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
2:5005/49@fidonet http://vas.tomsk.ru/



Re: [PATCH v2 0/6] cli commands for checks and agent

2021-02-10 Thread Christopher Faulet

Le 08/02/2021 à 23:53, William Dauchy a écrit :

Hello Christopher,

Here is the v2 addressing the points raised yesterday.
The patch 4/6 clearly looks scary but I made sure to not change anything
crazy apart from adding support for a version 2. I will probably start
to dream about a server-state-file burning every night.
I hope this will be good enough, but otherwise, don't hesitate to put
more comments, I will come back with a v3.

William Dauchy (6):
   MEDIUM: cli: add check-addr command
   MEDIUM: cli: add agent-port command
   BUG/MEDIUM: server: re-align state file fields number
   MEDIUM: server: add server-states version 2
   MEDIUM: server: support {check,agent}_addr, agent_port in server state
   CLEANUP: server: add missing space in server-state error output

  doc/management.txt|  15 +-
  include/haproxy/server-t.h|  16 +-
  .../checks/1be_40srv_odd_health_checks.vtc|   2 +-
  .../checks/40be_2srv_odd_health_checks.vtc|   2 +-
  reg-tests/checks/4be_1srv_health_checks.vtc   |   6 +-
  src/proxy.c   |  41 +-
  src/server.c  | 929 ++
  7 files changed, 573 insertions(+), 438 deletions(-)



William,

Thanks for this new series. There are problems in the 5th patch. When a
server-state file is loaded, if health-check or agent-check are disabled for
a server, a warning is always emitted, independently on the parameters.
Especially if the parameters are both unset (NULL or "-" for addresses and
NULL and "0" for port). The format of this warning is also not really
convenient. While the warning emitted on the CLI is good, when the
server-state file is loaded, we have such warning :

[WARNING] 040/150958 (3623566) : server-state application failed for server 
'back-http/www' health checks are not enabled on this server.
agent checks are not enabled on this server.

I suppose it also explains your last patch ("CLEANUP: server: add missing
space in server-state error output").

In addition, the warning is a bit misleading because most of the server
state is loaded at this stage. Note this comment is also true for the
warnings about the SRV resolution. I guess we should instead emit a
different warning depending the server state is not loaded at all or
partially loaded with the list of skipped part. But at least if the warning
is well formatted, it should be good for this patch. Having 2 types of
warnings may be handled in another patch.

Finally, there is a more annoying bug, but pretty easy to fix. You must be
careful to not use chunk_appendf() with a variable format string (2nd arg).
Just like with the printf family functions, you must not call it with a
non-constant format string. Here, there is a problem if I manually update
the server-state file to set "%s%s%s%s%s%s" as health-check or agent-check
address.

Thus replace this line :

chunk_appendf(msg, warning);

by

chunk_appendf(msg, "%s", warning);

or better :

chunk_strcat(msg, warning);

To conclude, I may merge the first 4 patches if you want, but I guess you
will have to adapt the first ones to produce better warnings. So I will
wait your confirmation to proceed. However, I will merge the bug fix.
There is no reason to not take it.

--
Christopher Faulet



[PATCH] cleanup: remove unused variable assignment

2021-02-10 Thread Илья Шипицин
Hello,

this is pure cleanup.
should fix #1048

Ilya
From 350c37117369607928ddb232cf8f86b1c218bd47 Mon Sep 17 00:00:00 2001
From: Ilya Shipitsin 
Date: Wed, 10 Feb 2021 13:03:30 +0500
Subject: [PATCH] CLEANUP: remove unused variable assigned found by Coverity

this is pure cleanup, no need to backport

2116if ((end - 1) == (payload + strlen(PAYLOAD_PATTERN))) {
2117/* if the payload pattern is at the end */
2118s->pcli_flags |= PCLI_F_PAYLOAD;
CID 1399833 (#1 of 1): Unused value (UNUSED_VALUE)assigned_value: Assigning value from reql to ret here, but that stored value is overwritten before it can be used.
2119ret = reql;
2120}
---
 src/cli.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/cli.c b/src/cli.c
index bfce9a60b..8b32e7610 100644
--- a/src/cli.c
+++ b/src/cli.c
@@ -2184,7 +2184,6 @@ int pcli_parse_request(struct stream *s, struct channel *req, char **errmsg, int
 	if ((end - 1) == (payload + strlen(PAYLOAD_PATTERN))) {
 		/* if the payload pattern is at the end */
 		s->pcli_flags |= PCLI_F_PAYLOAD;
-		ret = reql;
 	}
 
 	*(end-1) = '\n';
-- 
2.29.2