Re: [PATCH] BUG/MEDIUM: server: avoid changing healthcheck ctx with set server ssl

2022-01-11 Thread William Dauchy
Hello Christopher,

Thanks for your research,

On Tue, Jan 11, 2022 at 6:55 PM Christopher Faulet  wrote:
> Le 1/10/22 à 23:19, Willy Tarreau a écrit :
> > At this point I'm starting to think that we should probably avoid as
> > much as possible to use implicit settings for whatever is dynamic.
> > Originally a lot of settings were implicit because we don't want to
> > have huge config lines to enforce lots of obvious settings. On the CLI
> > it's less of a problem and for example if "ssl" only deals with the
> > traffic without manipulating the checks, I'm personally not shocked,
> > because these are normally sent by bots and we don't care if they
> > have to set a few more settings to avoid multiple implicit changes
> > that may not always be desired. This is where the CLI (or any future
> > API) might differ a bit from the config, and an agent writing some
> > config might have to explicitly state certain things like "no-check-ssl"
> > for example to stay safe and avoid such implicit dependencies.
> >
>
> I agree with Willy on this point. Especially because, depending the order of
> commands, the result can be different because of implicit changes and it is
> pretty hard to predict how it will behave in all cases.

yup totally agree with both of you. I can't really remember why I did
that in the first place.

> For instance, to fix William's bug about "set server ssl" command, in a way or
> another, we must stop to dynamically update the health-check if its port or 
> its
> address is explicitly specified. With this change, the result of following set
> of commands will be different:
>
>$ set server BK/SRV ssl on
>$ set server BK/SRV check-port XXX
>
> ==> SSL is enabled for the server and the health-check

interesting one :))

>$ set server BK/SRV check-port XXX
>$ set server BK/SRV ssl on
>
> ==> because the check's port was updated first, the SSL is only enabled for 
> the
> server.

> Agree. But, if possible, a warning may be added in the documentation to warn
> about implicit changes.

>From the discussion, I would be tempted to say the opposite, as I feel
like keeping implicit things for this command is worse.

> About the fix, I checked the code, and, at first glance, there is no reason to
> change "srv->check.use_ssl" value when the health-check uses the server
> settings. Thus, we may fix "set server ssl" command this way:
>
> diff --git a/src/check.c b/src/check.c
> index f0ae81504..8cf8a1c5b 100644
> --- a/src/check.c
> +++ b/src/check.c
> @@ -1565,10 +1565,8 @@ int init_srv_check(struct server *srv)
>   * default, unless one is specified.
>   */
>  if (!srv->check.port && !is_addr(>check.addr)) {
> -   if (!srv->check.use_ssl && srv->use_ssl != -1) {
> -   srv->check.use_ssl = srv->use_ssl;
> -   srv->check.xprt= srv->xprt;
> -   }
> +   if (!srv->check.use_ssl && srv->use_ssl != -1)
> +   srv->check.xprt = srv->xprt;
>  else if (srv->check.use_ssl == 1)
>  srv->check.xprt = xprt_get(XPRT_SSL);
>  srv->check.send_proxy |= (srv->pp_opts);

indeed this is simplified that way

> diff --git a/src/server.c b/src/server.c
> index 2061153bc..a18836a71 100644
> --- a/src/server.c
> +++ b/src/server.c
> @@ -2113,10 +2113,11 @@ void srv_set_ssl(struct server *s, int use_ssl)
>  return;
>
>  s->use_ssl = use_ssl;
> -   if (s->use_ssl)
> -   s->xprt = xprt_get(XPRT_SSL);
> -   else
> -   s->xprt = s->check.xprt = s->agent.xprt = xprt_get(XPRT_RAW);
> +   s->xprt = (s->use_ssl == 1) ? xprt_get(XPRT_SSL) : xprt_get(XPRT_RAW);
> +
> +   if ((s->check.state & CHK_ST_CONFIGURED) && !s->check.use_ssl &&
> +   !s->check.port && !is_addr(>check.addr))
> +   s->check.xprt = s->xprt;
>   }

as mentioned above I am not sure I am aligned here; I would rather
remove the side effect of changing s->check.

>   #endif /* USE_OPENSSL */
>
> This may be done with the following 3 steps:
>
>* First, stop to change "srv->check.use_ssl" value
>
>* Then, stop to change the agent settings in srv_set_ssl() because there is
> no ssl support for agent-check.

good point, I did not know

>* Finally, fix the bug identified by William, adding the according 
> documentation.
>
> Note I don't know if all this stuff works properly with the server-state 
> file...

I am always scared to look at the server state functionality...

-- 
William



[ANNOUNCE] haproxy-2.3.17

2022-01-11 Thread Willy Tarreau
Hi,

HAProxy 2.3.17 was released on 2022/01/11. It added 51 new commits
after version 2.3.16.

This is essentially an update of 2.3 with the relevant fixes that were
collected from 2.4.9 to 2.4.12. The list below was mostly composed from
the last few 2.4 changelogs:

 - using multiple log-forward sections would crash after parsing the
   config, that's now fixed.

 - possible crash on master CLI when trying to enter an old pid when
   in prompt mode

 - yet another risk of crash on resolvers was fixed, this time when
   getting a response error, because some invalid elements could be
   left in the list.

 - the problem where the SNI could be set again on a reused server
   connection was fixed

 - a workaround for a possibly slow malloc_trim() in modern libcs upon
   reload when using many threads, that could be slow enough to panic
   the old process.

 - there was a risk of frozen stream or spinning loop when combining
   layer-7 retries with some filters because an analyser responsible
   for releasing the filter was dropped. This was fixed.

 - eliminate a rare risk of deadlock when built with DEBUG_UAF. It
   would only affect developers chasing some user-after-free bugs,
   but better fix it anyway.

 - on reload we used to transfer listening sockets by packs of 253 between
   the old and the new process but it looks like for whatever reason on
   musl 253 doesn't work and the limit is 252. It might be caused by a
   slightly different layout for the message. So the limit was lowered by
   one as this will definitely not affect reload time!

 - Daniel Jakots fixed the build with libreSSL 3.5 and newer (some macros
   didn't work anymore).

 - David Carlier fixed the build with FreeBSD 14, which changes the cpuset
   API to better match Linux's.

 - the build warning with clang on i386 was addressed

 - fixed some usual "maybe unused" warnings on old compilers for
   unusual platform (gcc-4.7 on MIPS with threads disabled).

 - William fixed a bug in the master-worker when the master is executed in
   wait mode (only after a reload failure in 2.3). In this case, the master
   must never try to to get the listeners FD from the previous process
   using _getsocks on the stats socket. Otherwise, if a reload fails, the
   master exists with a EXIT_FAILURE status, killing all the workers.

 - the CLI's "show version" was backported to help with diagnostics and
   to uniformize APIs between versions.

 - various minor doc updates and typo fixes

 - some regtest and CI backports to ease stable maintainers' job

Please find the usual URLs below :
   Site index   : http://www.haproxy.org/
   Discourse: http://discourse.haproxy.org/
   Slack channel: https://slack.haproxy.org/
   Issue tracker: https://github.com/haproxy/haproxy/issues
   Wiki : https://github.com/haproxy/wiki/wiki
   Sources  : http://www.haproxy.org/download/2.3/src/
   Git repository   : http://git.haproxy.org/git/haproxy-2.3.git/
   Git Web browsing : http://git.haproxy.org/?p=haproxy-2.3.git
   Changelog: http://www.haproxy.org/download/2.3/src/CHANGELOG
   Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/

Willy
---
Complete changelog :
Amaury Denoyelle (2):
  BUG/MINOR: backend: do not set sni on connection reuse
  BUG/MINOR: backend: restore the SF_SRV_REUSED flag original purpose

Christopher Faulet (5):
  BUG/MEDIUM: cli: Properly set stream analyzers to process one command at 
a time
  BUG/MEDIUM: resolvers: Detach query item on response error
  DOC: spoe: Clarify use of the event directive in spoe-message section
  DOC: config: Specify %Ta is only available in HTTP mode
  BUG/MEDIUM: http-ana: Preserve response's FLT_END analyser on L7 retry

Daniel Jakots (1):
  BUILD: ssl: unbreak the build with newer libressl

David Carlier (1):
  BUILD/MINOR: tools: solaris build fix on dladdr.

Emeric Brun (1):
  BUG/MAJOR: segfault using multiple log forward sections.

Ilya Shipitsin (11):
  CI: Github Actions: enable prometheus exporter
  CI: Github Actions: remove LibreSSL-3.0.2 builds
  CI: Github Actions: enable BoringSSL builds
  CI: Github Action: run "apt-get update" before packages restore
  CI: github actions: update LibreSSL to 3.3.0
  CI: github actions: enable 51degrees feature
  CI: GitHub Actions: enable daily Coverity scan
  CI: github actions: build several popular "contrib" tools
  CI: github actions: switch to stable LibreSSL release
  CI: github actions: update LibreSSL to 3.2.5
  CI: Github Actions: switch to LibreSSL-3.3.3

Lukas Tribus (1):
  DOC: config: retry-on list is space-delimited

Thierry Fournier (1):
  DOC: fix misspelled keyword "resolve_retries" in resolvers

Tim Duesterhus (12):
  CI: Expand use of GitHub Actions for CI
  CI: Stop hijacking the hosts file
  CI: Make the h2spec workflow more consistent with the VTest workflow
  CI: Pass the 

Re: [PATCH] BUG/MEDIUM: server: avoid changing healthcheck ctx with set server ssl

2022-01-11 Thread Christopher Faulet

Le 1/10/22 à 23:19, Willy Tarreau a écrit :

At this point I'm starting to think that we should probably avoid as
much as possible to use implicit settings for whatever is dynamic.
Originally a lot of settings were implicit because we don't want to
have huge config lines to enforce lots of obvious settings. On the CLI
it's less of a problem and for example if "ssl" only deals with the
traffic without manipulating the checks, I'm personally not shocked,
because these are normally sent by bots and we don't care if they
have to set a few more settings to avoid multiple implicit changes
that may not always be desired. This is where the CLI (or any future
API) might differ a bit from the config, and an agent writing some
config might have to explicitly state certain things like "no-check-ssl"
for example to stay safe and avoid such implicit dependencies.



I agree with Willy on this point. Especially because, depending the order of 
commands, the result can be different because of implicit changes and it is 
pretty hard to predict how it will behave in all cases.


For instance, to fix William's bug about "set server ssl" command, in a way or 
another, we must stop to dynamically update the health-check if its port or its 
address is explicitly specified. With this change, the result of following set 
of commands will be different:


  $ set server BK/SRV ssl on
  $ set server BK/SRV check-port XXX

==> SSL is enabled for the server and the health-check

  $ set server BK/SRV check-port XXX
  $ set server BK/SRV ssl on

==> because the check's port was updated first, the SSL is only enabled for the 
server.



Note that such a brainstorming doesn't apply to your fix and should
not hold it from being merged in any way, I'm just speaking in the
general sense, as I don't feel comfortable with keep all these special
cases in a newly introduced API, that are only there for historical
reasons.


Agree. But, if possible, a warning may be added in the documentation to warn 
about implicit changes.


About the fix, I checked the code, and, at first glance, there is no reason to 
change "srv->check.use_ssl" value when the health-check uses the server 
settings. Thus, we may fix "set server ssl" command this way:


diff --git a/src/check.c b/src/check.c
index f0ae81504..8cf8a1c5b 100644
--- a/src/check.c
+++ b/src/check.c
@@ -1565,10 +1565,8 @@ int init_srv_check(struct server *srv)
 * default, unless one is specified.
 */
if (!srv->check.port && !is_addr(>check.addr)) {
-   if (!srv->check.use_ssl && srv->use_ssl != -1) {
-   srv->check.use_ssl = srv->use_ssl;
-   srv->check.xprt= srv->xprt;
-   }
+   if (!srv->check.use_ssl && srv->use_ssl != -1)
+   srv->check.xprt = srv->xprt;
else if (srv->check.use_ssl == 1)
srv->check.xprt = xprt_get(XPRT_SSL);
srv->check.send_proxy |= (srv->pp_opts);

diff --git a/src/server.c b/src/server.c
index 2061153bc..a18836a71 100644
--- a/src/server.c
+++ b/src/server.c
@@ -2113,10 +2113,11 @@ void srv_set_ssl(struct server *s, int use_ssl)
return;

s->use_ssl = use_ssl;
-   if (s->use_ssl)
-   s->xprt = xprt_get(XPRT_SSL);
-   else
-   s->xprt = s->check.xprt = s->agent.xprt = xprt_get(XPRT_RAW);
+   s->xprt = (s->use_ssl == 1) ? xprt_get(XPRT_SSL) : xprt_get(XPRT_RAW);
+
+   if ((s->check.state & CHK_ST_CONFIGURED) && !s->check.use_ssl &&
+   !s->check.port && !is_addr(>check.addr))
+   s->check.xprt = s->xprt;
 }

 #endif /* USE_OPENSSL */

This may be done with the following 3 steps:

  * First, stop to change "srv->check.use_ssl" value

  * Then, stop to change the agent settings in srv_set_ssl() because there is 
no ssl support for agent-check.


  * Finally, fix the bug identified by William, adding the according 
documentation.

Note I don't know if all this stuff works properly with the server-state file...

--
Christopher Faulet



[ANNOUNCE] haproxy-2.5.1

2022-01-11 Thread Christopher Faulet

Hi,

HAProxy 2.5.1 was released on 2022/01/11. It added 58 new commits
after version 2.5.0.

As usual, this release fixes several issues and brings some improvements:

 - there was a possible slow memory leak of struct sockaddr during layer-7
   retries that would end up with a redispatch. We're speaking about ~200
   bytes per retried request, which normally doesn't harm, but can at least
   fool some monitoring and cause some concerns.

 - there was another memory leak in jwt_header_query and jwt_payload_query
   converters. A full buffer was leaking at each invocation (~16k by
   default).

 - there was a risk of frozen stream or spinning loop when combining
   layer-7 retries with some filters because an analyzer responsible
   for releasing the filter was dropped. This was fixed.

 - there was an allocation problem when SSL was configured using a
   "default-server" directive. Some SSL settings like "crt" or
   possibly "ca" as well were causing an SSL_CTX to be allocated too
   early (at the moment the directive was parsed) and replicated for
   each server inheriting it. But that led to problems when these
   fields were updated at runtime for a given server as that could
   affect other servers' as well. And during soft-stop it would cause
   double-free issues as reported in github issue 1488.

 - William found that a number of free() were missing for server SSL
   settings when deleting a server. That's not dramatic but it could
   definitely be noticeable by those adding/removing servers often.

 - splicing of HTTP/1.1 responses would always incorrectly end up in closing
   the client connection at the end of the transfer, and was simply disabled
   for messages of unknown lengths (neither content-length nor
   transfer-encoding). Fixing these issues, an inter-release regression was
   introduced. This was fixed and at the end, no 2.5 release is impacted.
   Only the 2.4.11 is concerned by this regression.

  - there was an issue with the HTTP/1.1 chunk-encoded message parsing, when
the message headers were not received in one time. In this case, a
parsing error could be erroneously reported. This was fixed.

  - using multiple log-forward sections would crash after parsing the
config, that's now fixed.

  - a possible crash on master CLI when trying to enter an old pid when in
prompt mode.

  - there was a FD leak on the eventpoll in the master-worker in wait
mode. This was fixed. In addition, in this mode, on reload, the master
was trying to get the listeners FD from the previous process using
_getsocks on the stat socket. It was not necessary. And if the reload
failed, it led the master to exit with EXIT_FAILURE status, killing the
workers. This was fixed by restricting the use of _getsocks to some
modes.

  - yet another risk of crash on resolvers was fixed, this time when
getting a response error, because some invalid elements could be
left in the list.

  - a possible crash when adding a server on the CLI with a custom ID
because the key was not properly initialized.

  - there was an issue with "show cache" command. Each entry was reported
several times because of a bug in the loop walking through the cache.

  - The http-client was systematically adding a Host header using the
provided URL while it is possible to pass it in the header list. Now if
it is found in the header list, no other Host header is added.

  - Client SNI was not saved in case of ClientHello error, making strict-sni
related errors hard to debug. This was fixed.

  - conn_cur was not properly ignored from incoming peer messages,
especially when frequency counters or arrays are exchanged after
conn_cur. The stream was desynchronized and incorrect values were read.

  - Some server-side SSL settings could be lost when using more than one
default-server directive.

  - since 2.4 during a soft-stop we're closing all idle frontend connections
so that we don't have to wait for clients to time out nor for them to
send a new request. But it turns out that doing it as any server would
do it disturbs AWS' ALB, which immediately emits a 502 to their clients
after failing to upload a new request on such a closed connection. It's
well known (and documented) that reused connections have a window of
uncertainty and that an agent must retry on them (which is why haproxy
usually silently closes with the client when it experiences this so that
the client can decide to retry). Thus ALB's behavior is incorrect and
prevents from using keep-alive normally with the next hop. What was done
here was to add an option "idle-close-on-response" to reintroduce the
old behavior and wait for clients to speak first before closing.
Credits go to William Dauchy for the report and the work around.

  - Daniel Jakots fixed the build with libreSSL 3.5 and newer (some macros
didn't work anymore).

  - David Carlier fixed the 

Re: [PATCH] BUG/MEDIUM: server: avoid changing healthcheck ctx with set server ssl

2022-01-11 Thread Christopher Faulet

Le 1/10/22 à 23:19, Willy Tarreau a écrit :
w options were still configurable on the CLI by then.



"check-ssl" has been available for a long time, so that's not the
reason behind it, but I guess you were referring to something else. I
suspect I did a dumb copy/paste from the new_server function and
probably never thought was possibly wrong as my previous production
never had any check using tls.


Maybe but then I don't remember about all the detailed rules in place
that indicate when check-ssl *has* to be used. I'll have to read the
doc.



For a health-check, if no port or address is specified and no transport layer is 
forced, then the transport layer used by the check is the same than for the 
production traffic.


So, the same must be done for dynamic changes. But it is not so simple because, 
when the check inherits from the server settings, "srv->check.use_ssl" is also 
changed. I don't remember why this field is updated. But this may prevent any 
dynamic change on healtcheck. I must read the code to be sure.


--
Christopher Faulet



Re: [PATCH] CI: cleanup default step condition

2022-01-11 Thread Willy Tarreau
Hi Ilya,

> this is cleanup patch that removes default (non needed) step condition.
> behavior is not changed.

Now applied, thank you.
Willy



Re: [ANNOUNCE] haproxy-2.4.12

2022-01-11 Thread Marco Corte

Hi!

Less than 24 hours between the issue opening and the fix? :-O
Great job. Really.

.marcoc



[ANNOUNCE] haproxy-2.4.12

2022-01-11 Thread Christopher Faulet

Hi,

HAProxy 2.4.12 was released on 2022/01/11. It added 2 new commits
after version 2.4.11.

The 2.4.11 introduced a major regression into the H1 multiplexer. The bug
affects all HTTP messages announcing a content length from the time there
are some contentions on the output buffer. The result is that an internal
error is erroneously reported when HAproxy tries to emit the last block of
payload and the message is unexpectedly truncated.

In addition, William fixed a bug in the master-worker when the master is
executed in wait mode. In this case, the master must never try to to get the
listeners FD from the previous process using _getsocks on the stat
socket. Otherwise, if a reload fails, the master exists with a EXIT_FAILURE
status, killing all the workers.

A 2.5.1 will be emitted very soon with these fixes (and many others).
However the 2.5.0 is not affected by the first issue. The 2.4.12 was emitted
first to avoid any trouble for anyone who would like to update or who have
already upgraded.

Please find the usual URLs below :
   Site index   : http://www.haproxy.org/
   Discourse: http://discourse.haproxy.org/
   Slack channel: https://slack.haproxy.org/
   Issue tracker: https://github.com/haproxy/haproxy/issues
   Wiki : https://github.com/haproxy/wiki/wiki
   Sources  : http://www.haproxy.org/download/2.4/src/
   Git repository   : http://git.haproxy.org/git/haproxy-2.4.git/
   Git Web browsing : http://git.haproxy.org/?p=haproxy-2.4.git
   Changelog: http://www.haproxy.org/download/2.4/src/CHANGELOG
   Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/


---
Complete changelog :
Christopher Faulet (1):
  BUG/MAJOR: mux-h1: Don't decrement .curr_len for unsent data

William Lallemand (1):
  BUG/MEDIUM: mworker: don't use _getsocks in wait mode

--
Christopher Faulet