Re: AW: acl using fc_dst_port not working

2023-03-10 Thread Aurelien DARRAGON
You're welcome!

The same bug was also found in the smp_fetch_dst_is_local() function
(fc_dst_is_local sample fetch).

Attached is the patch that addresses the 2 bugs; it will probably be
included in the next batch of releases.
2.5 and 2.7 are also affected.

> Unfortunately, this option doesn't work in my context, because my logs
> show that it contains a different port (not the one I need)
Oh ok :/

> But I can wait for a release with the bug fixed, it's not urgent.
Thank you for your patience :)

Aurelien
From 69aaeeb8a54e60ddd66481f91354f0d99c75ec05 Mon Sep 17 00:00:00 2001
From: Aurelien DARRAGON 
Date: Fri, 10 Mar 2023 16:53:43 +0100
Subject: [PATCH] BUG/MINOR: tcp_sample: fix a bug in fc_dst_port and
 fc_dst_is_local sample fetches

There is a bug in the function smp_fetch_dport() which affects the 'f' case, also
known as the 'fc_dst_port' sample fetch.

conn_get_src() is used to retrieve the address prior to calling conn_dst().
But this is wrong: conn_get_dst() should be used instead.

Because of that, conn_dst() may return unexpected results since the dst address
is not guaranteed to be set depending on the conn state at the time the sample
fetch is used.

This was reported by Corin Langosch on the ML:

during his tests he noticed that using fc_dst_port in a log-format string
resulted in the correct value being printed in the logs but when he used it
in an ACL, the ACL did not evaluate properly.

This can be easily reproduced with the following test conf:
|frontend test-http
|  bind 127.0.0.1:8080
|  mode http
|
|  acl test fc_dst_port eq 8080
|  http-request return status 200 if test
|  http-request return status 500 if !test

A request on 127.0.0.1:8080 should normally return 200 OK, but here it
will return a 500.

The same bug was also found in smp_fetch_dst_is_local() (fc_dst_is_local sample
fetch) by reading the code, so the same fix was applied there as well.

This needs to be backported up to 2.5.
[both sample fetches were introduced in 2.5 with 888cd70 ("MINOR: tcp-sample:
Add samples to get original info about client connection")]
---
 src/tcp_sample.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/tcp_sample.c b/src/tcp_sample.c
index 925b93291..12eb25c4e 100644
--- a/src/tcp_sample.c
+++ b/src/tcp_sample.c
@@ -176,7 +176,7 @@ int smp_fetch_dst_is_local(const struct arg *args, struct sample *smp, const cha
 	if (kw[0] == 'f') { /* fc_dst_is_local */
 		struct connection *conn = objt_conn(smp->sess->origin);
 
-		if (conn && conn_get_src(conn))
+		if (conn && conn_get_dst(conn))
 			dst = conn_dst(conn);
 	}
 	else /* dst_is_local */
@@ -232,10 +232,10 @@ smp_fetch_dport(const struct arg *args, struct sample *smp, const char *kw, void
 		if (conn && conn_get_dst(conn))
 			dst = conn_dst(conn);
 	}
-	else if (kw[0] == 'f') { /* fc_dst_post */
+	else if (kw[0] == 'f') { /* fc_dst_port */
 		struct connection *conn = objt_conn(smp->sess->origin);
 
-		if (conn && conn_get_src(conn))
+		if (conn && conn_get_dst(conn))
 			dst = conn_dst(conn);
 	}
 else /* dst_port */
-- 
2.34.1



AW: acl using fc_dst_port not working

2023-03-10 Thread Corin Langosch
Hello Aurelien,

thank you very much for your reply!

> Indeed, there is a bug in the function smp_fetch_dport(): conn_get_src()
> is used where conn_get_dst() should be used instead.

Thank you for fixing it!

> Thank you for telling us, I'm working on the patch
> Meanwhile, maybe "dst_port" could work as a workaround depending on your
> needs?

Unfortunately, this option doesn't work in my context, because my logs show 
that it contains a different port (not the one I need). But I can wait for a 
release with the bug fixed, it's not urgent.

Cheers
Corin


Re: acl using fc_dst_port not working

2023-03-10 Thread Aurelien DARRAGON
Hi,

> During my tests I can see in the logs that fc_dst_port is 8080. However,
> the ACL isn't set to true. If I try the same with "acl test
> fc_dst 127.0.0.2" it works as expected. However, this is not what I
> need. I also tried different matchers like "acl test fc_dst_port -m int
> 8080", "acl test fc_dst_port -m str 8080", "acl test fc_dst_port eq
> 8080" but nothing works. What am I doing wrong? Or is it a bug? haproxy
> version is 2.6.7-c55bfdb. Thank you for any help.
> 
> Corin


Indeed, there is a bug in the function smp_fetch_dport(): conn_get_src()
is used where conn_get_dst() should be used instead.

> diff --git a/src/tcp_sample.c b/src/tcp_sample.c
> index 925b93291..45a8e0f38 100644
> --- a/src/tcp_sample.c
> +++ b/src/tcp_sample.c
> @@ -235,7 +235,7 @@ smp_fetch_dport(const struct arg *args, struct sample 
> *smp, const char *kw, void
>   else if (kw[0] == 'f') { /* fc_dst_post */
>   struct connection *conn = objt_conn(smp->sess->origin);
>  
> - if (conn && conn_get_src(conn))
> + if (conn && conn_get_dst(conn))
>   dst = conn_dst(conn);
>   }
>  else /* dst_port */


Thank you for telling us, I'm working on the patch
Meanwhile, maybe "dst_port" could work as a workaround depending on your
needs?
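
For example, a minimal sketch of such a workaround (purely illustrative:
the frontend below is made up, and "dst_port" may not match "fc_dst_port"
if the destination address gets rewritten, e.g. by a set-dst rule):

    frontend test-http
      bind 127.0.0.1:8080
      mode http

      # workaround sketch: match on dst_port instead of fc_dst_port
      acl test dst_port eq 8080
      http-request return status 200 if test
      http-request return status 500 if !test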

Regards,
Aurelien



Re: [PATCH 1/1] DOC/CLEANUP: fix typos

2023-03-10 Thread Willy Tarreau
Hi Michael,

On Fri, Dec 09, 2022 at 12:28:46PM +0100, Michael Prokop wrote:
> s/algorithmm/algorithm/
> s/an other/another/
> s/certicates/certificates/
> s/exemples/examples/
> s/informations/information/
> s/optionnal/optional/
(...)

sorry, this seems to have slipped through the cracks. Now applied,
thank you!

Willy



[ANNOUNCE] haproxy-2.8-dev5

2023-03-10 Thread Willy Tarreau
Hi,

HAProxy 2.8-dev5 was released on 2023/03/10. It added 199 new commits
after version 2.8-dev4.

This version got a bit delayed due to spending a full week on a difficult
concurrency bug that was introduced during the 2.5 development cycle and
that can definitely explain some of the occasional strange reports we've
seen from time to time (connections not dying, crashes or CPU usage peaks).

The issue happened when the file descriptor management API was reworked to
introduce support for thread groups. A tiny race that was not possible
before was introduced and could occasionally permit a file descriptor to
be processed immediately after it was migrated to another thread in case
an I/O happened at the same moment (in fact almost exclusively a persistent
connection to a backend server that was dropped by the server during its
migration). Due to the way FDs are allocated, it could even happen quite
often that the FD was immediately reassigned for another (or the same)
outgoing server or for an incoming request, possibly also leading to
wrong events being reported there (e.g. connection errors being delivered
on the next connection). Interestingly, the support for thread groups in
2.7 that required refcounting offered more possibilities to fix it but
increased the effect of the race so that it's easy to see frozen
connections. However, not having this refcounting prior to 2.7 made it
almost impossible to fix the issue in 2.5 or 2.6, requiring some of these
mechanisms to be backported there in order to close the race (more on that
in the respective announce messages).

In short, I'll ask that those who face request timeouts on servers, or
abnormal CPU peaks from time to time, or even strange crashes whose
backtrace shows fd_update_events() try again on the updated versions.

Other old issues were addressed such as a possible infinite loop when
a listener gets rate-limited or is used at its maxconn at a high rate.
Enough speaking of the bugs, a total of 74 were fixed in this version
anyway.

Among the structural changes heading for 2.8, Christopher's work to
continue improving error reporting between internal layers got another
update. As usual, lots of testing, no regressions expected, etc etc, but
please report anything strange (e.g. change of status flags in logs or
connections staying in CLOSE_WAIT).

Aurélien addressed some structural limitations of how listeners are
suspended and resumed during a failed reload. For example if an abns
listener couldn't be resumed (since they don't support pause and need
to be stopped), this could trigger a crash, which is not exactly what
you want when a new process failed to start given that it may indicate
a faulty config. Some of these might be backported to stable versions
after some observation time.

QUIC's error handling was improved, including at the socket level. There
are now fewer losses thanks to the sender now subscribing to the poller.
There are also a number of small improvements that I'm totally unable to
explain, but which resulted in both the interop and quic tracker tools
reporting even fewer failures. Now we're at a point where haproxy is among
the most successful stacks on both sites, which is great!

The config predicates used with .if or -cc on the command line now have
two new functions: "enabled(name)" to test if a runtime feature is enabled
(e.g. "SPLICE" etc), and "strstr(subject,pattern)", which is convenient to
check for the presence of some patterns in environment variables. By the
way, a new config-time environment variable $HAPROXY_BRANCH now contains
the current branch. This is helpful during migrations to switch certain
options to one version or the other.
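
To give a rough idea, here is a quick sketch of how these could look in a
config (the exact syntax is documented in the configuration manual; the
feature name and branch string below are only examples):

    .if enabled(SPLICE)
        # section only parsed when splicing support is enabled at runtime
    .endif

    .if strstr("$HAPROXY_BRANCH","2.8")
        # settings intended only for the 2.8 branch during a migration
    .endif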

JWT now supports RSA-PSS signatures, which previously triggered an
"Unmanaged algorithm" error.

Also, the "option httpclose" used to cause some trouble for some time,
since for a long time the union of the frontend's and the backend's were
used when deciding how to handle a backend connection. While this used to
make sense before 1.8 where the same stream was reset and recycled for
all subsequent requests, it has become completely counter-intuitive now
to imagine that "option httpclose" in a frontend will result in the
backend connection to be killed after the response. Given that explanations
starting with "well, this is for historical reasons" are generally wrong,
it was about time to address this one and do what users think it does (and
update the doc to reflect this and remove the exception).
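
As a minimal sketch of the case described above (proxy names and server
address are made up): the option is only set on the frontend, and per this
change it no longer implies killing the server-side connection after each
response:

    frontend fe_main
        bind 127.0.0.1:8000
        mode http
        option httpclose      # per the change above, concerns the frontend side
        default_backend be_app

    backend be_app
        mode http
        server srv1 192.0.2.10:8080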

Rémi merged some OCSP update patches as well. There are status counters
and info that are dumped in "show ssl crt-list". Also, the new "show ssl
ocsp-updates" command reports new info. All these automated updates will
now have their own log format.

Finally some new regtests were added and others updated.

Oh, by the way, we noticed a regression that affects 2.7 and 2.8. If you
connect to the CLI over a UNIX socket and the client closes the input
channel, the connection will not be closed. Given that the 

[ANNOUNCE] haproxy-2.6.10

2023-03-10 Thread Willy Tarreau
Hi,

HAProxy 2.6.10 was released on 2023/03/10. It added 78 new commits
after version 2.6.9.

A bit more than half of the commits are HTTP3/QUIC fixes. However, as
indicated in the 2.8-dev5 announce, a concurrency bug introduced in 2.5
was fixed in this version; it could cause freezes and crashes when some
HTTP/1 backend connections are closed by the server exactly at the same
time they're going to be reused by another thread. Another different bug
also affecting idle connections since 2.2 was fixed, possibly causing an
occasional crash. One possible work-around if you've faced such issues
recently is to disable inter-thread connection reuse with this directive
in the global section:

   tune.idle-pool.shared off

But beware that this may increase the total number of connections kept
established with your backend servers depending on the reuse frequency and
the number of threads.

I want to be clear on one point: the issue is structural, and trying to
port these fixes to 2.6 just made the situation much worse! The only
solution we found to address it relies on some facilities that were
integrated in 2.7 and that offer the guarantees we need during certain
critical transitions of a file descriptor state (refcount and owning
thread group). I have not found any workable solution to the problem
without these facilities, so this required that I backported the strict
minimum amount of patches (18) to bring these facilities there. I hate
having to do that but this time there was no other option. And that made
me realize that instead of keeping 2.6 on its own half-way architecture
like it was, it's not a bad thing that it resembles the next versions, to
make backports of fixes more reliable in the future. Despite the large
amount of tests in legacy and master-worker modes, with/without threads,
with reloads, FD passing, saturated listeners etc, it remains possible
that I failed on a corner case. So please watch a little bit more closely
than usual after you update, and do not hesitate to report any issue you
think you might face as a consequence of this update.

Other, less critical, issues are described below.

In master-worker mode, when performing an upgrade from an old version
(before 1.9) to a newer version (>=2.5), the HAPROXY_PROCESSES environment
variable was missing, and this, combined with a missing element in an
internal structure representing old processes, would result in a null-deref
crashing the master process after the reload. It's very unlikely
to hit this one, except during migration attempts where it can make one
think the new version doesn't work, and encourage rolling back to the
older one. The reported uptime for processes was also fixed so that wall
clock time is used instead of the internal timer.

A few issues affecting the Lua mapping of the HTTP client were addressed;
one of them is a small memory leak by which a few bytes could leak per
request, which could become problematic if used heavily. Another one is
a concurrency issue with Lua's garbage collector that didn't sufficiently
lock other threads' items while trying to free them.

It was found that the low-latency scheduling of TLS handshakes can
degenerate during extreme loads, and take a long time to recover. The
problem is that in order to prevent TLS handshakes from causing high
latency spikes to the rest of the traffic, they're placed in a dedicated
scheduling class that executes one of them per polling loop. But if there
are too many pending due to a big burst, the extra latency caused to the
pending ones can make clients give up and try again, reaching the point
where none of the processed tasks yields anything useful since they were
already abandoned. Now the number of handshakes per loop will grow as
the number of pending ones grows, and this addresses the problem without
adding extra latency even under extreme loads.

There were various QUIC fixes aiming at addressing some issues reported
by users and tests.

The cache failed to cache a response for a request that had the "no-cache"
directive (typically a forced reload). This prevented the cache from being
refreshed this way; this is now fixed.

In some rare cases it was possible to freeze a compressing stream if there
was exactly one byte left at the end of the buffer, which was insufficient
to place a new HTX block and prevented any progress from being made. This
has been the case since 2.5 so it doesn't seem easy to trigger!

Layer7 retries did not work anymore on the "empty-response" condition due
to a change that was made in 2.4.

The dump of the supported config language keywords with -dK incorrectly
attributed some of the crt-list specific keywords to "bind ... ssl", which
could cause confusion for those designing config parsers or generators by
regularly checking for new stuff there. Now an explicit "crt-list" sub-
section is dumped and "bind ssl" only dumps keywords really supported on
"bind" lines.

The global directive "no numa-cpu-mapping" that forces haproxy to bind to
multiple CPU 

[ANNOUNCE] haproxy-2.7.4

2023-03-10 Thread Willy Tarreau
Hi,

HAProxy 2.7.4 was released on 2023/03/10. It added 110 new commits
after version 2.7.3.

The vast majority of the commits are HTTP3/QUIC updates. However, as
indicated in the 2.8-dev5 announce, a concurrency bug introduced in 2.5
was fixed in this version; it could cause freezes and crashes when some
HTTP/1 backend connections are closed by the server exactly at the same
time they're going to be reused by another thread. Another different bug
also affecting idle connections since 2.2 was fixed, possibly causing an
occasional crash. One possible work-around if you've faced such issues
recently is to disable inter-thread connection reuse with this directive
in the global section:

   tune.idle-pool.shared off

But beware that this may increase the total number of connections kept
established with your backend servers depending on the reuse frequency and
the number of threads.

In master-worker mode, when performing an upgrade from an old version
(before 1.9) to a newer version (>=2.5), the HAPROXY_PROCESSES environment
variable was missing, and this, combined with a missing element in an
internal structure representing old processes, would result in a null-deref
crashing the master process after the reload. It's very unlikely
to hit this one, except during migration attempts where it can make one
think the new version doesn't work, and encourage rolling back to the
older one. The reported uptime for processes was also fixed so that wall
clock time is used instead of the internal timer.

A few issues affecting the Lua mapping of the HTTP client were addressed;
one of them is a small memory leak by which a few bytes could leak per
request, which could become problematic if used heavily. Another one is
a concurrency issue with Lua's garbage collector that didn't sufficiently
lock other threads' items while trying to free them.

A bug in the watchdog in 2.7 could occasionally cause the wrong thread
to be measured, which could sometimes trigger it on highly asymmetric
loads, such as if frontends are bound to different thread sets and one
saturates the process while the other one remains fully idle.

It was found that the low-latency scheduling of TLS handshakes can
degenerate during extreme loads, and take a long time to recover. The
problem is that in order to prevent TLS handshakes from causing high
latency spikes to the rest of the traffic, they're placed in a dedicated
scheduling class that executes one of them per polling loop. But if there
are too many pending due to a big burst, the extra latency caused to the
pending ones can make clients give up and try again, reaching the point
where none of the processed tasks yields anything useful since they were
already abandoned. Now the number of handshakes per loop will grow as
the number of pending ones grows, and this addresses the problem without
adding extra latency even under extreme loads.

There were as usual a significant number of QUIC updates, aiming at
addressing some issues reported by users and at continuing to improve
reliability and interoperability. Among the visible ones, the client-fin
timeout is now honored and allows closing faster when a last response
was sent but the client disappeared. This could help reduce the number of
apparent concurrent connections. In addition, some improvements were made
on memory usage. Some failures to connect at high rates (such as from
h2load) were finally fixed. The soft-stop is now fully functional when
"tune.quic.socket-owner" is set to "connection". The old process will
then continue to handle its connections and the new one will have its own
connections. Low-level errors are now better handled, with some errors
such as ICMP port unreachable reported by the stack causing an immediate
termination of the connection since it indicates the client has closed
(e.g. clicked stop in browser, or Ctrl-C in Curl).

The cache failed to cache a response for a request that had the "no-cache"
directive (typically a forced reload). This prevented the cache from being
refreshed this way; this is now fixed.

An infinite loop could happen on limited listeners (rate-limited or limited
by their maxconn value) due to the loss of some volatile casts during an
API cleanup in 2.7.

In some rare cases it was possible to freeze a compressing stream if there
was exactly one byte left at the end of the buffer, which was insufficient
to place a new HTX block and prevented any progress from being made. This
has been the case since 2.5 so it doesn't seem easy to trigger!

Layer7 retries did not work anymore on the "empty-response" condition due
to a change that was made in 2.4.

The dump of the supported config language keywords with -dK incorrectly
attributed some of the crt-list specific keywords to "bind ... ssl", which
could cause confusion for those designing config parsers or generators by
regularly checking for new stuff there. Now an explicit "crt-list" sub-
section is dumped and "bind ssl" only dumps keywords really supported on