Re: minconn, maxconn and fullconn (again, sigh!)
Lukas Tribus wrote:
> On Wed, 10 Feb 2021 at 16:55, Victor Sudakov wrote:
> >
> > I can even phrase my question in simpler terms. What happens if the sum
> > total of all servers' maxconns in a backend is less than the maxconn
> > value in the frontend pointing to the said backend?
>
> Queueing for "timeout queue" amount of time, and then return 503 error

And what happens in TCP mode, after the "timeout queue" amount of time?
I suppose the TCP connection with the client is reset?

> See:
>
> timeout queue
> https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#4.2-timeout%20queue
>
> maxconn
> https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#5.2-maxconn
>
> I really suggest you ignore minconn and fullconn, and focus on maxconn
> instead. The latter is a must-have (and must-understand). Really,
> maxconn (global, frontend and per-server) is the single most important
> performance knob in haproxy.

maxconn is rather clear, especially when one is sure about two things:

1. A server's maxconn value is always a hard limit (regardless of whether
   a minconn is configured for the server).

2. Connections outnumbering the sum total of a backend's servers' maxconns
   are queued for the "timeout queue" amount of time and then dropped.

Are the above statements correct?

It would be nice, however, to understand minconn/fullconn too. If a backend
has several servers with identical minconn, maxconn and weight, what is the
point of having minconn? The load will always be distributed evenly between
all the servers regardless of minconn/fullconn, correct?

-- 
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
2:5005/49@fidonet http://vas.tomsk.ru/
Re: minconn, maxconn and fullconn (again, sigh!)
Hello Victor,

On Wed, 10 Feb 2021 at 16:55, Victor Sudakov wrote:
>
> I can even phrase my question in simpler terms. What happens if the sum
> total of all servers' maxconns in a backend is less than the maxconn
> value in the frontend pointing to the said backend?

Queueing for "timeout queue" amount of time, and then return 503 error
(and this really is desirable as opposed to hitting maxconn on a frontend
or, even worse, global maxconn, because a few milliseconds of delay do not
hurt, and returning a proper HTTP error in distress is way better than
some obscure connection timeout).

See:

timeout queue
https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#4.2-timeout%20queue

maxconn
https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#5.2-maxconn

I really suggest you ignore minconn and fullconn, and focus on maxconn
instead. The latter is a must-have (and must-understand). Really,
maxconn (global, frontend and per-server) is the single most important
performance knob in haproxy.

Lukas
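[For illustration, a minimal hypothetical configuration exercising the behaviour described above. All names, addresses and numbers are invented, not taken from the thread: the frontend accepts up to 2000 concurrent connections, the servers together take only 1000, and the excess waits in the backend queue for up to "timeout queue" before a 503 is returned.]

```
defaults
    mode http
    timeout client  30s
    timeout server  30s
    timeout connect  5s
    # excess connections queue for up to 5s, then get a 503
    timeout queue    5s

frontend fe_main
    bind :8080
    maxconn 2000
    default_backend be_app

backend be_app
    # 2 x 500 = 1000 < frontend maxconn of 2000, so up to 1000
    # connections may sit in the queue at any moment
    server app1 192.0.2.11:80 maxconn 500
    server app2 192.0.2.12:80 maxconn 500
```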
Segfault in liblua-5.3.so
Hi,

We have a lua block that connects to memcache when a request arrives

"""
function get_from_gds(host, port, key)
    local sock = core.tcp()
    sock:settimeout(20)
    local result = DOMAIN_NOT_FOUND
    local status, error = sock:connect(host, port)
    if not status then
        core.Alert(GDS_LOG_PREFIX .. "GDS_ERROR: Error in connecting:" .. key .. ":" .. port .. ":" .. error)
        return GDS_ERROR, "Error: " .. error
    end
    sock:send(key .. "\r\n")
    while true do
        local s, status, partial = sock:receive("*l")
        if not s then
            core.Alert(GDS_LOG_PREFIX .. "GDS_ERROR: Error reading:" .. key .. ":" .. port .. ":" .. status)
            return GDS_ERROR, status
        end
        if s == "END" then break end
        result = s
    end
    sock:close()
    return result
end

-- Comment: get_proxy calls get_from_gds
core.register_action("get_proxy", { "http-req" }, get_proxy)
"""

The value is cached in a haproxy map so we don't make a memcache connection
for every request. At peak traffic, if we reload haproxy, that invalidates
the map and the surge causes quite a few memcache connections to fail. The
error returned is "Can't connect".

We see the following messages in dmesg:

[ +0.006924] haproxy[14258]: segfault at 0 ip 7f117fba94c4 sp 7f1179eefe08 error 4 in liblua-5.3.so[7f117fba1000+37000]

HA-Proxy version 2.0.18-be8b761 2020/09/30 - https://haproxy.org/

This is a recent issue, we never saw this in 1.8. Any idea? We only see
this at peak load. At regular load we don't see this issue even when we
reload haproxy.

Thanks
Sachin
Re: [PATCH v2 0/6] cli commands for checks and agent
On Wed, Feb 10, 2021 at 4:21 PM Christopher Faulet wrote:
> To conclude, I may merge the first 4 patches if you want, but I guess you
> will have to adapt the first ones to produce better warnings. So I will
> wait for your confirmation to proceed. However, I will merge the bug fix.
> There is no reason not to take it.

Thanks for the review. Don't worry, I will come back with a v3.
-- 
William
Re: Issues with d13afbcce5e664f9cfe797eee8c527e5fa947f1b (haproxy-2.2) "mux-h1: Don't set CS_FL_EOI too early for protocol upgrade requests"
On 2021-02-10 18:15, Christopher Faulet wrote:
> Le 08/02/2021 à 14:31, Christian Ruppert a écrit :
>> Hi list, Christopher,
>>
>> we're having issues with the mentioned commit / patch:
>> d13afbcce5e664f9cfe797eee8c527e5fa947f1b
>> https://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=d13afbcce5e664f9cfe797eee8c527e5fa947f1b
>>
>> I can also reproduce it with 2.2.9 as well as 2.3.5. I don't have any
>> useful details yet, just that our Jira fails to load. A curl against the
>> site seems to work fine while browser requests (chrome / firefox) seem
>> to time out, or at least some of them. See the attached log. The first 3
>> requests seem to be fine so far. Then, much later, there's a 504 between
>> more 200s. I'm not sure yet why the other 200s there seem to wait / are
>> logged after the actual timeout happens. According to chrome's F12 there
>> are more requests still pending. Ignore the 503 there. That seems to be
>> an unrelated problem, since this also happens with a working HAProxy.
>> Much later, the site loaded, sometimes broken though.
>> I'll try to prepare a config snippet if required. Is there anything
>> known already?
>
> Hi,
>
> Thanks to information that Christian provided me off-list, I've finally
> found and fixed the bug. The corresponding commit is:
>
> commit a22782b597ee9a3bfecb18a66e29633c8e814216
> Author: Christopher Faulet
> Date:   Mon Feb 8 17:18:01 2021 +0100
>
>     BUG/MEDIUM: mux-h1: Always set CS_FL_EOI for response in MSG_DONE state
>
>     During the message parsing, if in MSG_DONE state, the CS_FL_EOI flag
>     must always be set on the conn-stream if the following conditions
>     are met:
>
>       * It is a response, or
>       * It is a request but not a protocol upgrade nor a CONNECT.
>
>     For now, there is no test on the message type (request or response).
>     Thus the CS_FL_EOI flag is not set for a response with a
>     "Connection: upgrade" header but not a 101 response.
>
>     This bug was introduced by the commit 3e1748bbf ("BUG/MINOR: mux-h1:
>     Don't set CS_FL_EOI too early for protocol upgrade requests"). It was
>     backported as far as 2.0. Thus, this patch must also be backported as
>     far as 2.0.
>
> However, it is not backported yet.
>
> Thanks Christian!

Thanks for the very fast patching, Christopher! I've rolled out the new
version on some more production machines and I haven't noticed or heard of
any issues yet. Tomorrow I'll roll it out to the rest of our LBs.

-- 
Regards,
Christian Ruppert
Re: Issues with d13afbcce5e664f9cfe797eee8c527e5fa947f1b (haproxy-2.2) "mux-h1: Don't set CS_FL_EOI too early for protocol upgrade requests"
Le 08/02/2021 à 14:31, Christian Ruppert a écrit :
> Hi list, Christopher,
>
> we're having issues with the mentioned commit / patch:
> d13afbcce5e664f9cfe797eee8c527e5fa947f1b
> https://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=d13afbcce5e664f9cfe797eee8c527e5fa947f1b
>
> I can also reproduce it with 2.2.9 as well as 2.3.5. I don't have any
> useful details yet, just that our Jira fails to load. A curl against the
> site seems to work fine while browser requests (chrome / firefox) seem to
> time out, or at least some of them. See the attached log. The first 3
> requests seem to be fine so far. Then, much later, there's a 504 between
> more 200s. I'm not sure yet why the other 200s there seem to wait / are
> logged after the actual timeout happens. According to chrome's F12 there
> are more requests still pending. Ignore the 503 there. That seems to be an
> unrelated problem, since this also happens with a working HAProxy. Much
> later, the site loaded, sometimes broken though.
> I'll try to prepare a config snippet if required. Is there anything known
> already?

Hi,

Thanks to information that Christian provided me off-list, I've finally
found and fixed the bug. The corresponding commit is:

commit a22782b597ee9a3bfecb18a66e29633c8e814216
Author: Christopher Faulet
Date:   Mon Feb 8 17:18:01 2021 +0100

    BUG/MEDIUM: mux-h1: Always set CS_FL_EOI for response in MSG_DONE state

    During the message parsing, if in MSG_DONE state, the CS_FL_EOI flag
    must always be set on the conn-stream if the following conditions are
    met:

      * It is a response, or
      * It is a request but not a protocol upgrade nor a CONNECT.

    For now, there is no test on the message type (request or response).
    Thus the CS_FL_EOI flag is not set for a response with a
    "Connection: upgrade" header but not a 101 response.

    This bug was introduced by the commit 3e1748bbf ("BUG/MINOR: mux-h1:
    Don't set CS_FL_EOI too early for protocol upgrade requests"). It was
    backported as far as 2.0. Thus, this patch must also be backported as
    far as 2.0.

However, it is not backported yet.

Thanks Christian!

-- 
Christopher Faulet
Re: minconn, maxconn and fullconn (again, sigh!)
I can even phrase my question in simpler terms. What happens if the sum
total of all servers' maxconns in a backend is less than the maxconn value
in the frontend pointing to the said backend?

Victor Sudakov wrote:
> Dear Colleagues,
>
> The dynamic limit is probably one of the darker sides of Haproxy
> configuration. One of the best explanations I've found is
> https://www.mail-archive.com/haproxy@formilux.org/msg04782.html
> but still I'm missing some points.
>
> Consider the following configuration:
>
> frontend bar
>     maxconn 10000
>     default_backend foo
>
> backend foo
>     server server1 server1: minconn 10 maxconn 100 weight 100
>     server server2 server2: minconn 10 maxconn 100 weight 100
>
> This configuration will set the automatic implicit fullconn=1000 on the
> backend "foo", correct?
>
> "maxconn 100" is still a hard limit, correct?
>
> So, when there happen to be 500 connections to the backend "foo", 200 of
> them will be served by server1 and server2; what will happen to the
> other 300 connections?
>
> The same can be asked about the example from the Haproxy documentation
> with the explicit fullconn:
>
> backend dynamic
>     fullconn 10000
>     server srv1 dyn1:80 minconn 100 maxconn 1000
>     server srv2 dyn2:80 minconn 100 maxconn 1000
>
> "maxconn 1000" is still a hard limit, correct?
>
> When 4000 connections come to this backend, srv1 and srv2 will serve
> 2000 of them (each reaching its hard limit at "maxconn=1000"); what will
> happen to the other 2000?
>
> -- 
> Victor Sudakov, VAS4-RIPE, VAS47-RIPN
> 2:5005/49@fidonet http://vas.tomsk.ru/

-- 
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
2:5005/49@fidonet http://vas.tomsk.ru/
Re: [PATCH v2 0/6] cli commands for checks and agent
Le 08/02/2021 à 23:53, William Dauchy a écrit :
> Hello Christopher,
>
> Here is the v2 addressing the points raised yesterday. The patch 4/6
> clearly looks scary but I made sure to not change anything crazy apart
> from adding support for a version 2. I will probably start to dream
> about a server-state-file burning every night.
> I hope this will be good enough, but otherwise, don't hesitate to put
> more comments, I will come back with a v3.
>
> William Dauchy (6):
>   MEDIUM: cli: add check-addr command
>   MEDIUM: cli: add agent-port command
>   BUG/MEDIUM: server: re-align state file fields number
>   MEDIUM: server: add server-states version 2
>   MEDIUM: server: support {check,agent}_addr, agent_port in server state
>   CLEANUP: server: add missing space in server-state error output
>
>  doc/management.txt                            |  15 +-
>  include/haproxy/server-t.h                    |  16 +-
>  .../checks/1be_40srv_odd_health_checks.vtc    |   2 +-
>  .../checks/40be_2srv_odd_health_checks.vtc    |   2 +-
>  reg-tests/checks/4be_1srv_health_checks.vtc   |   6 +-
>  src/proxy.c                                   |  41 +-
>  src/server.c                                  | 929 ++
>  7 files changed, 573 insertions(+), 438 deletions(-)

William,

Thanks for this new series. There are problems in the 5th patch. When a
server-state file is loaded, if health-check or agent-check are disabled
for a server, a warning is always emitted, independently of the
parameters, especially if the parameters are both unset (NULL or "-" for
addresses, and NULL or "0" for port).

The format of this warning is also not really convenient. While the
warning emitted on the CLI is good, when the server-state file is loaded,
we get a warning like this:

[WARNING] 040/150958 (3623566) : server-state application failed for server 'back-http/www' health checks are not enabled on this server. agent checks are not enabled on this server.

I suppose it also explains your last patch ("CLEANUP: server: add missing
space in server-state error output"). In addition, the warning is a bit
misleading because most of the server state is loaded at this stage.
Note this comment is also true for the warnings about the SRV resolution.
I guess we should instead emit a different warning depending on whether
the server state is not loaded at all or only partially loaded, with the
list of skipped parts. But at least, if the warning is well formatted, it
should be good for this patch. Having 2 types of warnings may be handled
in another patch.

Finally, there is a more annoying bug, but it is pretty easy to fix. You
must be careful not to use chunk_appendf() with a variable format string
(2nd arg). Just like with the printf family of functions, you must not
call it with a non-constant format string. Here, there is a problem if I
manually update the server-state file to set "%s%s%s%s%s%s" as the
health-check or agent-check address. Thus replace this line:

    chunk_appendf(msg, warning);

by

    chunk_appendf(msg, "%s", warning);

or better:

    chunk_strcat(msg, warning);

To conclude, I may merge the first 4 patches if you want, but I guess you
will have to adapt the first ones to produce better warnings. So I will
wait for your confirmation to proceed. However, I will merge the bug fix.
There is no reason not to take it.

-- 
Christopher Faulet
[PATCH] cleanup: remove unused variable assignment
Hello,

this is pure cleanup. should fix #1048

Ilya

From 350c37117369607928ddb232cf8f86b1c218bd47 Mon Sep 17 00:00:00 2001
From: Ilya Shipitsin
Date: Wed, 10 Feb 2021 13:03:30 +0500
Subject: [PATCH] CLEANUP: remove unused variable assignment found by Coverity

this is pure cleanup, no need to backport

2116    if ((end - 1) == (payload + strlen(PAYLOAD_PATTERN))) {
2117            /* if the payload pattern is at the end */
2118            s->pcli_flags |= PCLI_F_PAYLOAD;

CID 1399833 (#1 of 1): Unused value (UNUSED_VALUE)
assigned_value: Assigning value from reql to ret here, but that stored
value is overwritten before it can be used.

2119            ret = reql;
2120    }
---
 src/cli.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/cli.c b/src/cli.c
index bfce9a60b..8b32e7610 100644
--- a/src/cli.c
+++ b/src/cli.c
@@ -2184,7 +2184,6 @@ int pcli_parse_request(struct stream *s, struct channel *req, char **errmsg, int
 	if ((end - 1) == (payload + strlen(PAYLOAD_PATTERN))) {
 		/* if the payload pattern is at the end */
 		s->pcli_flags |= PCLI_F_PAYLOAD;
-		ret = reql;
 	}
 
 	*(end-1) = '\n';
-- 
2.29.2