Re: [PATCH] BUG/MEDIUM: server: avoid changing healthcheck ctx with set server ssl
Hello Christopher, Thanks for your research, On Tue, Jan 11, 2022 at 6:55 PM Christopher Faulet wrote: > Le 1/10/22 à 23:19, Willy Tarreau a écrit : > > At this point I'm starting to think that we should probably avoid as > > much as possible to use implicit settings for whatever is dynamic. > > Originally a lot of settings were implicit because we don't want to > > have huge config lines to enforce lots of obvious settings. On the CLI > > it's less of a problem and for example if "ssl" only deals with the > > traffic without manipulating the checks, I'm personally not shocked, > > because these are normally sent by bots and we don't care if they > > have to set a few more settings to avoid multiple implicit changes > > that may not always be desired. This is where the CLI (or any future > > API) might differ a bit from the config, and an agent writing some > > config might have to explicitly state certain things like "no-check-ssl" > > for example to stay safe and avoid such implicit dependencies. > > > > I agree with Willy on this point. Especially because, depending the order of > commands, the result can be different because of implicit changes and it is > pretty hard to predict how it will behave in all cases. yup totally agree with both of you. I can't really remember why I did that in the first place. > For instance, to fix William's bug about "set server ssl" command, in a way or > another, we must stop to dynamically update the health-check if its port or > its > address is explicitly specified. With this change, the result of following set > of commands will be different: > >$ set server BK/SRV ssl on >$ set server BK/SRV check-port XXX > > ==> SSL is enabled for the server and the health-check interesting one :)) >$ set server BK/SRV check-port XXX >$ set server BK/SRV ssl on > > ==> because the check's port was updated first, the SSL is only enabled for > the > server. > Agree. But, if possible, a warning may be added in the documentation to warn > about implicit changes. >From the discussion, I would be tempted to say the opposite, as I feel like keeping implicit things for this command is worse. > About the fix, I checked the code, and, at first glance, there is no reason to > change "srv->check.use_ssl" value when the health-check uses the server > settings. Thus, we may fix "set server ssl" command this way: > > diff --git a/src/check.c b/src/check.c > index f0ae81504..8cf8a1c5b 100644 > --- a/src/check.c > +++ b/src/check.c > @@ -1565,10 +1565,8 @@ int init_srv_check(struct server *srv) > * default, unless one is specified. > */ > if (!srv->check.port && !is_addr(>check.addr)) { > - if (!srv->check.use_ssl && srv->use_ssl != -1) { > - srv->check.use_ssl = srv->use_ssl; > - srv->check.xprt= srv->xprt; > - } > + if (!srv->check.use_ssl && srv->use_ssl != -1) > + srv->check.xprt = srv->xprt; > else if (srv->check.use_ssl == 1) > srv->check.xprt = xprt_get(XPRT_SSL); > srv->check.send_proxy |= (srv->pp_opts); indeed this is simplified that way > diff --git a/src/server.c b/src/server.c > index 2061153bc..a18836a71 100644 > --- a/src/server.c > +++ b/src/server.c > @@ -2113,10 +2113,11 @@ void srv_set_ssl(struct server *s, int use_ssl) > return; > > s->use_ssl = use_ssl; > - if (s->use_ssl) > - s->xprt = xprt_get(XPRT_SSL); > - else > - s->xprt = s->check.xprt = s->agent.xprt = xprt_get(XPRT_RAW); > + s->xprt = (s->use_ssl == 1) ? xprt_get(XPRT_SSL) : xprt_get(XPRT_RAW); > + > + if ((s->check.state & CHK_ST_CONFIGURED) && !s->check.use_ssl && > + !s->check.port && !is_addr(>check.addr)) > + s->check.xprt = s->xprt; > } as mentioned above I am not sure I am aligned here; I would rather remove the side effect of changing s->check. > #endif /* USE_OPENSSL */ > > This may be done with the following 3 steps: > >* First, stop to change "srv->check.use_ssl" value > >* Then, stop to change the agent settings in srv_set_ssl() because there is > no ssl support for agent-check. good point, I did not know >* Finally, fix the bug identified by William, adding the according > documentation. > > Note I don't know if all this stuff works properly with the server-state > file... I am always scared to look at the server state functionality... -- William
[ANNOUNCE] haproxy-2.3.17
Hi, HAProxy 2.3.17 was released on 2022/01/11. It added 51 new commits after version 2.3.16. This is essentially an update of 2.3 with the relevant fixes that were collected from 2.4.9 to 2.4.12. The list below was mostly composed from the last few 2.4 changelogs: - using multiple log-forward sections would crash after parsing the config, that's now fixed. - possible crash on master CLI when trying to enter an old pid when in prompt mode - yet another risk of crash on resolvers was fixed, this time when getting a response error, because some invalid elements could be left in the list. - the problem where the SNI could be set again on a reused server connection was fixed - a workaround for a possibly slow malloc_trim() in modern libcs upon reload when using many threads, that could be slow enough to panic the old process. - there was a risk of frozen stream or spinning loop when combining layer-7 retries with some filters because an analyser responsible for releasing the filter was dropped. This was fixed. - eliminate a rare risk of deadlock when built with DEBUG_UAF. It would only affect developers chasing some user-after-free bugs, but better fix it anyway. - on reload we used to transfer listening sockets by packs of 253 between the old and the new process but it looks like for whatever reason on musl 253 doesn't work and the limit is 252. It might be caused by a slightly different layout for the message. So the limit was lowered by one as this will definitely not affect reload time! - Daniel Jakots fixed the build with libreSSL 3.5 and newer (some macros didn't work anymore). - David Carlier fixed the build with FreeBSD 14, which changes the cpuset API to better match Linux's. - the build warning with clang on i386 was addressed - fixed some usual "maybe unused" warnings on old compilers for unusual platform (gcc-4.7 on MIPS with threads disabled). - William fixed a bug in the master-worker when the master is executed in wait mode (only after a reload failure in 2.3). In this case, the master must never try to to get the listeners FD from the previous process using _getsocks on the stats socket. Otherwise, if a reload fails, the master exists with a EXIT_FAILURE status, killing all the workers. - the CLI's "show version" was backported to help with diagnostics and to uniformize APIs between versions. - various minor doc updates and typo fixes - some regtest and CI backports to ease stable maintainers' job Please find the usual URLs below : Site index : http://www.haproxy.org/ Discourse: http://discourse.haproxy.org/ Slack channel: https://slack.haproxy.org/ Issue tracker: https://github.com/haproxy/haproxy/issues Wiki : https://github.com/haproxy/wiki/wiki Sources : http://www.haproxy.org/download/2.3/src/ Git repository : http://git.haproxy.org/git/haproxy-2.3.git/ Git Web browsing : http://git.haproxy.org/?p=haproxy-2.3.git Changelog: http://www.haproxy.org/download/2.3/src/CHANGELOG Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/ Willy --- Complete changelog : Amaury Denoyelle (2): BUG/MINOR: backend: do not set sni on connection reuse BUG/MINOR: backend: restore the SF_SRV_REUSED flag original purpose Christopher Faulet (5): BUG/MEDIUM: cli: Properly set stream analyzers to process one command at a time BUG/MEDIUM: resolvers: Detach query item on response error DOC: spoe: Clarify use of the event directive in spoe-message section DOC: config: Specify %Ta is only available in HTTP mode BUG/MEDIUM: http-ana: Preserve response's FLT_END analyser on L7 retry Daniel Jakots (1): BUILD: ssl: unbreak the build with newer libressl David Carlier (1): BUILD/MINOR: tools: solaris build fix on dladdr. Emeric Brun (1): BUG/MAJOR: segfault using multiple log forward sections. Ilya Shipitsin (11): CI: Github Actions: enable prometheus exporter CI: Github Actions: remove LibreSSL-3.0.2 builds CI: Github Actions: enable BoringSSL builds CI: Github Action: run "apt-get update" before packages restore CI: github actions: update LibreSSL to 3.3.0 CI: github actions: enable 51degrees feature CI: GitHub Actions: enable daily Coverity scan CI: github actions: build several popular "contrib" tools CI: github actions: switch to stable LibreSSL release CI: github actions: update LibreSSL to 3.2.5 CI: Github Actions: switch to LibreSSL-3.3.3 Lukas Tribus (1): DOC: config: retry-on list is space-delimited Thierry Fournier (1): DOC: fix misspelled keyword "resolve_retries" in resolvers Tim Duesterhus (12): CI: Expand use of GitHub Actions for CI CI: Stop hijacking the hosts file CI: Make the h2spec workflow more consistent with the VTest workflow CI: Pass the
Re: [PATCH] BUG/MEDIUM: server: avoid changing healthcheck ctx with set server ssl
Le 1/10/22 à 23:19, Willy Tarreau a écrit : At this point I'm starting to think that we should probably avoid as much as possible to use implicit settings for whatever is dynamic. Originally a lot of settings were implicit because we don't want to have huge config lines to enforce lots of obvious settings. On the CLI it's less of a problem and for example if "ssl" only deals with the traffic without manipulating the checks, I'm personally not shocked, because these are normally sent by bots and we don't care if they have to set a few more settings to avoid multiple implicit changes that may not always be desired. This is where the CLI (or any future API) might differ a bit from the config, and an agent writing some config might have to explicitly state certain things like "no-check-ssl" for example to stay safe and avoid such implicit dependencies. I agree with Willy on this point. Especially because, depending the order of commands, the result can be different because of implicit changes and it is pretty hard to predict how it will behave in all cases. For instance, to fix William's bug about "set server ssl" command, in a way or another, we must stop to dynamically update the health-check if its port or its address is explicitly specified. With this change, the result of following set of commands will be different: $ set server BK/SRV ssl on $ set server BK/SRV check-port XXX ==> SSL is enabled for the server and the health-check $ set server BK/SRV check-port XXX $ set server BK/SRV ssl on ==> because the check's port was updated first, the SSL is only enabled for the server. Note that such a brainstorming doesn't apply to your fix and should not hold it from being merged in any way, I'm just speaking in the general sense, as I don't feel comfortable with keep all these special cases in a newly introduced API, that are only there for historical reasons. Agree. But, if possible, a warning may be added in the documentation to warn about implicit changes. About the fix, I checked the code, and, at first glance, there is no reason to change "srv->check.use_ssl" value when the health-check uses the server settings. Thus, we may fix "set server ssl" command this way: diff --git a/src/check.c b/src/check.c index f0ae81504..8cf8a1c5b 100644 --- a/src/check.c +++ b/src/check.c @@ -1565,10 +1565,8 @@ int init_srv_check(struct server *srv) * default, unless one is specified. */ if (!srv->check.port && !is_addr(>check.addr)) { - if (!srv->check.use_ssl && srv->use_ssl != -1) { - srv->check.use_ssl = srv->use_ssl; - srv->check.xprt= srv->xprt; - } + if (!srv->check.use_ssl && srv->use_ssl != -1) + srv->check.xprt = srv->xprt; else if (srv->check.use_ssl == 1) srv->check.xprt = xprt_get(XPRT_SSL); srv->check.send_proxy |= (srv->pp_opts); diff --git a/src/server.c b/src/server.c index 2061153bc..a18836a71 100644 --- a/src/server.c +++ b/src/server.c @@ -2113,10 +2113,11 @@ void srv_set_ssl(struct server *s, int use_ssl) return; s->use_ssl = use_ssl; - if (s->use_ssl) - s->xprt = xprt_get(XPRT_SSL); - else - s->xprt = s->check.xprt = s->agent.xprt = xprt_get(XPRT_RAW); + s->xprt = (s->use_ssl == 1) ? xprt_get(XPRT_SSL) : xprt_get(XPRT_RAW); + + if ((s->check.state & CHK_ST_CONFIGURED) && !s->check.use_ssl && + !s->check.port && !is_addr(>check.addr)) + s->check.xprt = s->xprt; } #endif /* USE_OPENSSL */ This may be done with the following 3 steps: * First, stop to change "srv->check.use_ssl" value * Then, stop to change the agent settings in srv_set_ssl() because there is no ssl support for agent-check. * Finally, fix the bug identified by William, adding the according documentation. Note I don't know if all this stuff works properly with the server-state file... -- Christopher Faulet
[ANNOUNCE] haproxy-2.5.1
Hi, HAProxy 2.5.1 was released on 2022/01/11. It added 58 new commits after version 2.5.0. As usual, this release fixes several issues and brings some improvements: - there was a possible slow memory leak of struct sockaddr during layer-7 retries that would end up with a redispatch. We're speaking about ~200 bytes per retried request, which normally doesn't harm, but can at least fool some monitoring and cause some concerns. - there was another memory leak in jwt_header_query and jwt_payload_query converters. A full buffer was leaking at each invocation (~16k by default). - there was a risk of frozen stream or spinning loop when combining layer-7 retries with some filters because an analyzer responsible for releasing the filter was dropped. This was fixed. - there was an allocation problem when SSL was configured using a "default-server" directive. Some SSL settings like "crt" or possibly "ca" as well were causing an SSL_CTX to be allocated too early (at the moment the directive was parsed) and replicated for each server inheriting it. But that led to problems when these fields were updated at runtime for a given server as that could affect other servers' as well. And during soft-stop it would cause double-free issues as reported in github issue 1488. - William found that a number of free() were missing for server SSL settings when deleting a server. That's not dramatic but it could definitely be noticeable by those adding/removing servers often. - splicing of HTTP/1.1 responses would always incorrectly end up in closing the client connection at the end of the transfer, and was simply disabled for messages of unknown lengths (neither content-length nor transfer-encoding). Fixing these issues, an inter-release regression was introduced. This was fixed and at the end, no 2.5 release is impacted. Only the 2.4.11 is concerned by this regression. - there was an issue with the HTTP/1.1 chunk-encoded message parsing, when the message headers were not received in one time. In this case, a parsing error could be erroneously reported. This was fixed. - using multiple log-forward sections would crash after parsing the config, that's now fixed. - a possible crash on master CLI when trying to enter an old pid when in prompt mode. - there was a FD leak on the eventpoll in the master-worker in wait mode. This was fixed. In addition, in this mode, on reload, the master was trying to get the listeners FD from the previous process using _getsocks on the stat socket. It was not necessary. And if the reload failed, it led the master to exit with EXIT_FAILURE status, killing the workers. This was fixed by restricting the use of _getsocks to some modes. - yet another risk of crash on resolvers was fixed, this time when getting a response error, because some invalid elements could be left in the list. - a possible crash when adding a server on the CLI with a custom ID because the key was not properly initialized. - there was an issue with "show cache" command. Each entry was reported several times because of a bug in the loop walking through the cache. - The http-client was systematically adding a Host header using the provided URL while it is possible to pass it in the header list. Now if it is found in the header list, no other Host header is added. - Client SNI was not saved in case of ClientHello error, making strict-sni related errors hard to debug. This was fixed. - conn_cur was not properly ignored from incoming peer messages, especially when frequency counters or arrays are exchanged after conn_cur. The stream was desynchronized and incorrect values were read. - Some server-side SSL settings could be lost when using more than one default-server directive. - since 2.4 during a soft-stop we're closing all idle frontend connections so that we don't have to wait for clients to time out nor for them to send a new request. But it turns out that doing it as any server would do it disturbs AWS' ALB, which immediately emits a 502 to their clients after failing to upload a new request on such a closed connection. It's well known (and documented) that reused connections have a window of uncertainty and that an agent must retry on them (which is why haproxy usually silently closes with the client when it experiences this so that the client can decide to retry). Thus ALB's behavior is incorrect and prevents from using keep-alive normally with the next hop. What was done here was to add an option "idle-close-on-response" to reintroduce the old behavior and wait for clients to speak first before closing. Credits go to William Dauchy for the report and the work around. - Daniel Jakots fixed the build with libreSSL 3.5 and newer (some macros didn't work anymore). - David Carlier fixed the
Re: [PATCH] BUG/MEDIUM: server: avoid changing healthcheck ctx with set server ssl
Le 1/10/22 à 23:19, Willy Tarreau a écrit : w options were still configurable on the CLI by then. "check-ssl" has been available for a long time, so that's not the reason behind it, but I guess you were referring to something else. I suspect I did a dumb copy/paste from the new_server function and probably never thought was possibly wrong as my previous production never had any check using tls. Maybe but then I don't remember about all the detailed rules in place that indicate when check-ssl *has* to be used. I'll have to read the doc. For a health-check, if no port or address is specified and no transport layer is forced, then the transport layer used by the check is the same than for the production traffic. So, the same must be done for dynamic changes. But it is not so simple because, when the check inherits from the server settings, "srv->check.use_ssl" is also changed. I don't remember why this field is updated. But this may prevent any dynamic change on healtcheck. I must read the code to be sure. -- Christopher Faulet
Re: [PATCH] CI: cleanup default step condition
Hi Ilya, > this is cleanup patch that removes default (non needed) step condition. > behavior is not changed. Now applied, thank you. Willy
Re: [ANNOUNCE] haproxy-2.4.12
Hi! Less than 24 hours between the issue opening and the fix? :-O Great job. Really. .marcoc
[ANNOUNCE] haproxy-2.4.12
Hi, HAProxy 2.4.12 was released on 2022/01/11. It added 2 new commits after version 2.4.11. The 2.4.11 introduced a major regression into the H1 multiplexer. The bug affects all HTTP messages announcing a content length from the time there are some contentions on the output buffer. The result is that an internal error is erroneously reported when HAproxy tries to emit the last block of payload and the message is unexpectedly truncated. In addition, William fixed a bug in the master-worker when the master is executed in wait mode. In this case, the master must never try to to get the listeners FD from the previous process using _getsocks on the stat socket. Otherwise, if a reload fails, the master exists with a EXIT_FAILURE status, killing all the workers. A 2.5.1 will be emitted very soon with these fixes (and many others). However the 2.5.0 is not affected by the first issue. The 2.4.12 was emitted first to avoid any trouble for anyone who would like to update or who have already upgraded. Please find the usual URLs below : Site index : http://www.haproxy.org/ Discourse: http://discourse.haproxy.org/ Slack channel: https://slack.haproxy.org/ Issue tracker: https://github.com/haproxy/haproxy/issues Wiki : https://github.com/haproxy/wiki/wiki Sources : http://www.haproxy.org/download/2.4/src/ Git repository : http://git.haproxy.org/git/haproxy-2.4.git/ Git Web browsing : http://git.haproxy.org/?p=haproxy-2.4.git Changelog: http://www.haproxy.org/download/2.4/src/CHANGELOG Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/ --- Complete changelog : Christopher Faulet (1): BUG/MAJOR: mux-h1: Don't decrement .curr_len for unsent data William Lallemand (1): BUG/MEDIUM: mworker: don't use _getsocks in wait mode -- Christopher Faulet