Re: AW: acl using fc_dst_port not working
You're welcome! The same bug was also found in the smp_fetch_dst_is_local() function (fc_dst_is_local sample fetch). Attached is the patch that addresses the 2 bugs; it will probably be included in the next batch of releases. 2.5 and 2.7 are also affected.

> Unfortunately, this option doesn't work in my context, because my logs
> show that it contains a different port (not the one I need)

Oh ok :/

> But I can wait for a release with the bug fixed, it's not urgent.

Thank you for your patience :)

Aurelien

From 69aaeeb8a54e60ddd66481f91354f0d99c75ec05 Mon Sep 17 00:00:00 2001
From: Aurelien DARRAGON
Date: Fri, 10 Mar 2023 16:53:43 +0100
Subject: [PATCH] BUG/MINOR: tcp_sample: fix a bug in fc_dst_port and
 fc_dst_is_local sample fetches

There is a bug in the function smp_fetch_dport() which affects the 'f'
case, also known as the 'fc_dst_port' sample fetch. conn_get_src() is
used to retrieve the address prior to calling conn_dst(). But this is
wrong: conn_get_dst() should be used instead. Because of that,
conn_dst() may return unexpected results, since the dst address is not
guaranteed to be set depending on the conn state at the time the sample
fetch is used.

This was reported by Corin Langosch on the ML: during his tests he
noticed that using fc_dst_port in a log-format string resulted in the
correct value being printed in the logs, but when he used it in an ACL,
the ACL did not evaluate properly.

This can be easily reproduced with the following test conf:

|frontend test-http
|    bind 127.0.0.1:8080
|    mode http
|
|    acl test fc_dst_port eq 8080
|    http-request return status 200 if test
|    http-request return status 500 if !test

A request on 127.0.0.1:8080 should normally return 200 OK, but here it
will return a 500.

The same bug was also found in smp_fetch_dst_is_local() (fc_dst_is_local
sample fetch) by reading the code: the fix was applied twice.
This needs to be backported up to 2.5 [both sample fetches were
introduced in 2.5 with 888cd70 ("MINOR: tcp-sample: Add samples to get
original info about client connection")]
---
 src/tcp_sample.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/tcp_sample.c b/src/tcp_sample.c
index 925b93291..12eb25c4e 100644
--- a/src/tcp_sample.c
+++ b/src/tcp_sample.c
@@ -176,7 +176,7 @@ int smp_fetch_dst_is_local(const struct arg *args, struct sample *smp, const cha
 	if (kw[0] == 'f') { /* fc_dst_is_local */
 		struct connection *conn = objt_conn(smp->sess->origin);
 
-		if (conn && conn_get_src(conn))
+		if (conn && conn_get_dst(conn))
 			dst = conn_dst(conn);
 	}
 	else /* dst_is_local */
@@ -232,10 +232,10 @@ smp_fetch_dport(const struct arg *args, struct sample *smp, const char *kw, void
 		if (conn && conn_get_dst(conn))
 			dst = conn_dst(conn);
 	}
-	else if (kw[0] == 'f') { /* fc_dst_post */
+	else if (kw[0] == 'f') { /* fc_dst_port */
 		struct connection *conn = objt_conn(smp->sess->origin);
 
-		if (conn && conn_get_src(conn))
+		if (conn && conn_get_dst(conn))
 			dst = conn_dst(conn);
 	}
 	else /* dst_port */
-- 
2.34.1
AW: acl using fc_dst_port not working
Hello Aurelien,

thank you very much for your reply!

> Indeed, there is a bug in the function smp_fetch_dport():
> conn_get_src() is used where conn_get_dst() should be used instead.

Thank you for fixing it!

> Thank you for telling us, I'm working on the patch
>
> Meanwhile, maybe "dst_port" could work as a workaround depending on
> your needs?

Unfortunately, this option doesn't work in my context, because my logs
show that it contains a different port (not the one I need). But I can
wait for a release with the bug fixed, it's not urgent.

Cheers
Corin
Re: acl using fc_dst_port not working
Hi,

> During my tests I can see in the logs that fc_dst_port is 8080. However,
> the ACL isn't set to true. If I try the same with "acl test
> fc_dst 127.0.0.2" it works as expected. However, this is not what I
> need. I also tried different matchers like "acl test fc_dst_port -m int
> 8080", "acl test fc_dst_port -m str 8080", "acl test fc_dst_port eq
> 8080" but nothing works. What am I doing wrong? Or is it a bug? haproxy
> version is 2.6.7-c55bfdb. Thank you for any help.
>
> Corin

Indeed, there is a bug in the function smp_fetch_dport(): conn_get_src()
is used where conn_get_dst() should be used instead.

> diff --git a/src/tcp_sample.c b/src/tcp_sample.c
> index 925b93291..45a8e0f38 100644
> --- a/src/tcp_sample.c
> +++ b/src/tcp_sample.c
> @@ -235,7 +235,7 @@ smp_fetch_dport(const struct arg *args, struct sample *smp, const char *kw, void
>  	else if (kw[0] == 'f') { /* fc_dst_post */
>  		struct connection *conn = objt_conn(smp->sess->origin);
> 
> -		if (conn && conn_get_src(conn))
> +		if (conn && conn_get_dst(conn))
>  			dst = conn_dst(conn);
>  	}
>  	else /* dst_port */

Thank you for telling us, I'm working on the patch.

Meanwhile, maybe "dst_port" could work as a workaround depending on your
needs?

Regards,
Aurelien
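As a sketch, the suggested work-around would swap fc_dst_port for dst_port in the reproducing config (whether the two return the same value depends on the setup, since they are taken at different layers; in this thread it turned out not to match the reporter's needs):

```
frontend test-http
    bind 127.0.0.1:8080
    mode http

    # work-around sketch: dst_port instead of the affected fc_dst_port
    acl test dst_port eq 8080
    http-request return status 200 if test
    http-request return status 500 if !test
```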
Re: [PATCH 1/1] DOC/CLEANUP: fix typos
Hi Michael,

On Fri, Dec 09, 2022 at 12:28:46PM +0100, Michael Prokop wrote:
> s/algorithmm/algorithm/
> s/an other/another/
> s/certicates/certificates/
> s/exemples/examples/
> s/informations/information/
> s/optionnal/optional/
(...)

Sorry, this seems to have slipped through the cracks. Now applied,
thank you!

Willy
[ANNOUNCE] haproxy-2.8-dev5
Hi,

HAProxy 2.8-dev5 was released on 2023/03/10. It added 199 new commits
after version 2.8-dev4.

This version got a bit delayed due to spending a full week on a
difficult concurrency bug that was introduced during the 2.5 development
cycle and that can definitely explain some of the occasional strange
reports we've seen from time to time (connections not dying, crashes or
CPU usage peaks). The issue happened when the file descriptor management
API was reworked to introduce support for thread groups. A tiny race
that was not possible before was introduced and could occasionally
permit a file descriptor to be processed immediately after it was
migrated to another thread in case an I/O happened at the same moment
(in fact almost exclusively a persistent connection to a backend server
that was dropped by the server during its migration). Due to the way FDs
are allocated, it could even happen quite often that the FD was
immediately reassigned for another (or the same) outgoing server or for
an incoming request, possibly also leading to wrong events being
reported there (e.g. connection errors being delivered on the next
connection).

Interestingly, the support for thread groups in 2.7 that required
refcounting offered more possibilities to fix it, but also increased the
effect of the race, so that it's easy to see frozen connections there.
However, not having this refcounting prior to 2.7 made it almost
impossible to fix the issue in 2.5 or 2.6, requiring to backport some of
these mechanisms there in order to close the race (more on that in the
respective announce messages).

In short, I'll ask that those who face request timeouts on servers, or
abnormal CPU peaks from time to time, or even strange crashes whose
backtrace shows fd_update_events(), try again on the updated versions.

Other old issues were addressed, such as a possible infinite loop when a
listener gets rate-limited or is used at its maxconn at a high rate.
Enough speaking of the bugs; a total of 74 were fixed in this version
anyway.

Among the structural changes heading for 2.8, Christopher's work to
continue improving error reporting between internal layers got another
update. As usual, lots of testing, no regressions expected, etc, but
please report anything strange (e.g. change of status flags in logs or
connections staying in CLOSE_WAIT).

Aurélien addressed some structural limitations in how listeners are
suspended and resumed during a failed reload. For example, if an abns
listener couldn't be resumed (since they don't support pause and need to
be stopped), this could trigger a crash, which is not exactly what you
want when a new process failed to start, given that it may indicate a
faulty config. Some of these changes might be backported to stable
versions after some observation time.

QUIC's error handling was improved, including at the socket level. There
are now fewer losses thanks to the sender subscribing to the poller.
There are also a number of small improvements that I'm totally unable to
explain, but which resulted in both the interop and quic tracker tools
reporting even fewer failures. Now we're at a point where haproxy is
among the most successful stacks on both sites, this is great!

The config predicates used with .if or -cc on the command line got two
new functions: "enabled(name)" to test whether a runtime feature is
enabled (e.g. "SPLICE" etc), and "strstr(subject,pattern)" which is
convenient to check for the presence of some patterns in environment
variables. By the way, a new config-time environment variable
$HAPROXY_BRANCH now contains the current branch. This is helpful during
migrations to switch certain options to one version or the other.

JWT now supports RSA-PSS signatures, which could report an "Unmanaged
algorithm" error before.
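Put together, the new predicates and variable could be combined like this (a sketch only; the exact quoting conventions for .if expressions are those of the configuration manual):

```
# hypothetical branch-dependent tuning using the new 2.8 predicates
.if strstr("$HAPROXY_BRANCH","2.8")
    # settings specific to the 2.8 branch would go here
.endif

.if enabled(SPLICE)
    # only parsed when splicing support is enabled at runtime
.endif
```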
Also, "option httpclose" used to cause some trouble for some time,
since for a long time the union of the frontend's and the backend's
options was used when deciding how to handle a backend connection. While
this used to make sense before 1.8, where the same stream was reset and
recycled for all subsequent requests, it has become completely
counter-intuitive now to imagine that "option httpclose" in a frontend
will result in the backend connection being killed after the response.
Given that explanations starting with "well, this is for historical
reasons" are generally wrong, it was about time to address this one and
make it do what users think it does (and update the doc to reflect this
and remove the exception).

Rémi merged some OCSP update patches as well. There are status counters
and info that are dumped in "show ssl crt-list". Also, the new "show ssl
ocsp-updates" command reports new info. All these automated updates will
now have their own log format.

Finally, some new regtests were added and others updated.

Oh, by the way, we noticed a regression that affects 2.7 and 2.8. If you
connect to the CLI over a UNIX socket and the client closes the input
channel, the connection will not be closed. Given that the
[ANNOUNCE] haproxy-2.6.10
Hi,

HAProxy 2.6.10 was released on 2023/03/10. It added 78 new commits after
version 2.6.9.

A bit more than half of the commits are HTTP3/QUIC fixes. However, as
indicated in the 2.8-dev5 announce, a concurrency bug introduced in 2.5
was fixed in this version, one that may cause freezes and crashes when
some HTTP/1 backend connections are closed by the server exactly at the
same time they're going to be reused by another thread. Another,
different bug also affecting idle connections since 2.2 was fixed,
possibly causing an occasional crash. One possible work-around if you've
faced such issues recently is to disable inter-thread connection reuse
with this directive in the global section:

    tune.idle-pool.shared off

But beware that this may increase the total number of connections kept
established with your backend servers, depending on the reuse frequency
and the number of threads.

I want to be clear on one point: the issue is structural, and trying to
port these fixes to 2.6 just made the situation much worse! The only
solution we found to address it relies on some facilities that were
integrated in 2.7 and that offer the guarantees we need during certain
critical transitions of a file descriptor state (refcount and owning
thread group). I have not found any workable solution to the problem
without these facilities, so this required that I backport the strict
minimum amount of patches (18) to bring these facilities there. I hate
having to do that, but this time there was no other option. And it made
me realize that instead of keeping 2.6 on its own half-way architecture
like it was, it's not a bad thing that it resembles the next versions,
to make backports of fixes more reliable in the future.

Despite the large amount of tests in legacy and master-worker modes,
with/without threads, with reloads, FD passing, saturated listeners etc,
it remains possible that I failed on a corner case.
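Spelled out as a minimal global section, the work-around mentioned above would look like this (a sketch; the rest of the configuration is unchanged):

```
global
    # work-around: disable inter-thread idle connection reuse
    # (may increase total connections kept to backend servers)
    tune.idle-pool.shared off
```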
So please watch a little bit more than usual after you update, and do
not hesitate to report any issue you think you might face consecutive to
this. Other, less critical, issues are described below.

In master-worker mode, when performing an upgrade from an old version
(before 1.9) to a newer version (>= 2.5), the HAPROXY_PROCESSES
environment variable was missing, and this, combined with a missing
element in an internal structure representing old processes, would
result in a null-deref which would crash the master process after the
reload. It's very unlikely to hit this one, except during migration
attempts, where it can make one think the new version doesn't work and
encourage a roll back to the older one. The reported uptime for
processes was also fixed so that wall clock time is used instead of the
internal timer.

A few issues affecting the Lua mapping of the HTTP client were
addressed; one of them is a small memory leak by which a few bytes could
leak per request, which could become problematic if used heavily.
Another one is a concurrency issue with Lua's garbage collector, which
didn't sufficiently lock other threads' items while trying to free them.

It was found that the low-latency scheduling of TLS handshakes can
degenerate during extreme loads and take a long time to recover. The
problem is that in order to prevent TLS handshakes from causing high
latency spikes to the rest of the traffic, they're placed in a dedicated
scheduling class that executes one of them per polling loop. But if
there are too many pending due to a big burst, the extra latency caused
to the pending ones can make clients give up and try again, reaching the
point where none of the processed tasks yields anything useful since
they were already abandoned. Now the number of handshakes per loop grows
as the number of pending ones grows, and this addresses the problem
without adding extra latency even under extreme loads.
There were various QUIC fixes aiming at addressing some issues reported
by users and tests.

The cache failed to cache a response for a request that had the
"no-cache" directive (typically a forced reload). This prevented the
cache from being refreshed this way; this is now fixed.

In some rare cases it was possible to freeze a compressing stream if
there was exactly one byte left at the end of the buffer, which was
insufficient to place a new HTX block and prevented any progress from
being made. This has been the case since 2.5, so it doesn't seem easy to
trigger!

Layer7 retries did not work anymore on the "empty-response" condition,
due to a change that was made in 2.4.

The dump of the supported config language keywords with -dK incorrectly
attributed some of the crt-list specific keywords to "bind ... ssl",
which could cause confusion for those designing config parsers or
generators by regularly checking for new stuff there. Now an explicit
"crt-list" sub-section is dumped and "bind ssl" only dumps keywords
really supported on "bind" lines.

The global directive "no numa-cpu-mapping" that forces haproxy to bind
to multiple CPU
[ANNOUNCE] haproxy-2.7.4
Hi,

HAProxy 2.7.4 was released on 2023/03/10. It added 110 new commits after
version 2.7.3.

The vast majority of the commits are HTTP3/QUIC updates. However, as
indicated in the 2.8-dev5 announce, a concurrency bug introduced in 2.5
was fixed in this version, one that may cause freezes and crashes when
some HTTP/1 backend connections are closed by the server exactly at the
same time they're going to be reused by another thread. Another,
different bug also affecting idle connections since 2.2 was fixed,
possibly causing an occasional crash. One possible work-around if you've
faced such issues recently is to disable inter-thread connection reuse
with this directive in the global section:

    tune.idle-pool.shared off

But beware that this may increase the total number of connections kept
established with your backend servers, depending on the reuse frequency
and the number of threads.

In master-worker mode, when performing an upgrade from an old version
(before 1.9) to a newer version (>= 2.5), the HAPROXY_PROCESSES
environment variable was missing, and this, combined with a missing
element in an internal structure representing old processes, would
result in a null-deref which would crash the master process after the
reload. It's very unlikely to hit this one, except during migration
attempts, where it can make one think the new version doesn't work and
encourage a roll back to the older one. The reported uptime for
processes was also fixed so that wall clock time is used instead of the
internal timer.

A few issues affecting the Lua mapping of the HTTP client were
addressed; one of them is a small memory leak by which a few bytes could
leak per request, which could become problematic if used heavily.
Another one is a concurrency issue with Lua's garbage collector, which
didn't sufficiently lock other threads' items while trying to free them.
A bug in the watchdog in 2.7 could occasionally cause the wrong thread
to be measured, which could sometimes trigger it on highly asymmetric
loads, such as when frontends are bound to different thread sets and one
saturates the process while the other one remains fully idle.

It was found that the low-latency scheduling of TLS handshakes can
degenerate during extreme loads and take a long time to recover. The
problem is that in order to prevent TLS handshakes from causing high
latency spikes to the rest of the traffic, they're placed in a dedicated
scheduling class that executes one of them per polling loop. But if
there are too many pending due to a big burst, the extra latency caused
to the pending ones can make clients give up and try again, reaching the
point where none of the processed tasks yields anything useful since
they were already abandoned. Now the number of handshakes per loop grows
as the number of pending ones grows, and this addresses the problem
without adding extra latency even under extreme loads.

There were as usual a significant number of QUIC updates, aiming at
addressing some issues reported by users and at continuing to improve
reliability and interoperability. Among the visible ones, the client-fin
timeout is now honored and allows closing faster when a last response
was sent and the client disappeared. This could help reduce the number
of apparent concurrent connections. In addition, some improvements were
made on memory usage. Some failures to connect at high rates (such as
from h2load) were finally fixed. The soft-stop is now fully functional
when "tune.quic.socket-owner" is set to "connection": the old process
will then continue to handle its connections and the new one will have
its own connections. Low-level errors are now better handled, with some
errors such as ICMP port unreachable reported by the stack causing an
immediate termination of the connection, since it indicates the client
has closed (e.g.
clicked stop in browser, or Ctrl-C in Curl).

The cache failed to cache a response for a request that had the
"no-cache" directive (typically a forced reload). This prevented the
cache from being refreshed this way; this is now fixed.

An infinite loop could happen on limited listeners (rate-limited or
limited by their maxconn value) due to the loss of some volatile casts
during an API cleanup in 2.7.

In some rare cases it was possible to freeze a compressing stream if
there was exactly one byte left at the end of the buffer, which was
insufficient to place a new HTX block and prevented any progress from
being made. This has been the case since 2.5, so it doesn't seem easy to
trigger!

Layer7 retries did not work anymore on the "empty-response" condition,
due to a change that was made in 2.4.

The dump of the supported config language keywords with -dK incorrectly
attributed some of the crt-list specific keywords to "bind ... ssl",
which could cause confusion for those designing config parsers or
generators by regularly checking for new stuff there. Now an explicit
"crt-list" sub-section is dumped and "bind ssl" only dumps keywords
really supported on