Re: [PATCH 0/4] Fix more frees
Hi Tim, On Tue, Jun 16, 2020 at 12:03:01AM +0200, Tim Duesterhus wrote: > Willy, > > this series fixes up a few more frees. This time I have verified the changes > a bit more carefully, running configuration check on a real world > configuration > of mine within valgrind. It still reports a five leaks (but less than without > applying these patches!) and does not report any memory unsafety / bogus frees > / double frees / use-after-frees. > > The first two patches fix leaks within helper functions. I triggered them > during deinit, but they might or might not be accessible during normal > operation (e.g. via the CLI socket). The last two patches are part of the > deinit() function and thus cannot happen in normal operation, but cleaning > them up makes it easier to find actual, important, issues. > > Feel free to apply to 'next' only (or after the release of 2.2). No need to > risk breaking anything shortly before release with something less important > like this. And of course please carefully review them :-) Thank you for this work. I appreciate how painful it is to track this stuff down to the inner code. I'd indeed prefer to postpone its integration for having been bitten many times. One specific case you need to take care of is the situation where there is an error in one of the ACL patterns that prevents the expression from being completed and that can result in some elements not to be resolved. I typically don't care if we observe memleaks in such a case since it dies in an error, but we don't want to see double frees nor dereference of a wrong pointer. I don't remember all the corner cases but they'll basically look like this : http-request deny if { src 12.12.12.12 13.13.13.13/33 } (or anything that manages to cause a parsing error on a pattern after one was properly resolved). This may also happen in other situations like when an argument references a server/backend/stick-table that doesn't exist and as such fails the config check, for example: http-request deny if { nbsrv(blah) gt 2 } or possibly if that's one argument of a converter. I suspect that your patches are OK because very likely release_sample_expr() was created after the code paths you've changed. But it's indeed a bit tricky and deep into the error path. Do not hesitate to ping me after the release if I forget to get back to it. Cheers, Willy
[PATCH 1/4] BUG/MINOR: acl: Fix freeing of expr->smp in prune_acl_expr
Instead of simply calling free() in expr->smp->arg_p in certain cases properly free the sample using release_sample_expr(). Given the following example configuration: frontend foo bind *:8080 mode http http-request set-var(txn.foo) str(bar) acl is_match str(foo),strcmp(txn.hash) -m bool Running a configuration check within valgrind reports: ==31371== 160 (48 direct, 112 indirect) bytes in 1 blocks are definitely lost in loss record 35 of 45 ==31371==at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==31371==by 0x4C3832: sample_parse_expr (sample.c:876) ==31371==by 0x56B3E0: parse_acl_expr (acl.c:319) ==31371==by 0x56BA4F: parse_acl (acl.c:697) ==31371==by 0x48D225: cfg_parse_listen (cfgparse-listen.c:816) ==31371==by 0x4797C3: readcfgfile (cfgparse.c:2167) ==31371==by 0x5293ED: init (haproxy.c:2021) ==31371==by 0x41F382: main (haproxy.c:3126) After this patch this leak is reduced. It will be fully removed in a follow up patch: ==32503== 32 bytes in 1 blocks are definitely lost in loss record 20 of 43 ==32503==at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==32503==by 0x4C39B5: sample_parse_expr (sample.c:982) ==32503==by 0x56B410: parse_acl_expr (acl.c:319) ==32503==by 0x56BA7F: parse_acl (acl.c:697) ==32503==by 0x48D225: cfg_parse_listen (cfgparse-listen.c:816) ==32503==by 0x4797C3: readcfgfile (cfgparse.c:2167) ==32503==by 0x52943D: init (haproxy.c:2021) ==32503==by 0x41F382: main (haproxy.c:3133) This is a fairly minor leak that can only be observed if ACLs need to be freed, which is not something that should occur during normal processing and most likely only during shut down. Thus no backport should be needed. --- src/acl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/acl.c b/src/acl.c index 5fea5dcc2..333da22c9 100644 --- a/src/acl.c +++ b/src/acl.c @@ -117,8 +117,8 @@ static struct acl_expr *prune_acl_expr(struct acl_expr *expr) } } - if (expr->smp->arg_p != empty_arg_list && !unresolved) - free(expr->smp->arg_p); + release_sample_expr(expr->smp); + return expr; } -- 2.27.0
[PATCH 4/4] BUG/MINOR: haproxy: Add missing free of server->(hostname|resolvers_id)
Given the following example configuration: resolvers test nameserver test 127.0.0.1:53 listen foo bind *:8080 server foo example.com resolvers test Running a configuration check within valgrind reports: ==21995== 5 bytes in 1 blocks are definitely lost in loss record 1 of 30 ==21995==at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==21995==by 0x5726489: strdup (strdup.c:42) ==21995==by 0x4B2CFB: parse_server (server.c:2163) ==21995==by 0x4680C1: cfg_parse_listen (cfgparse-listen.c:534) ==21995==by 0x459E33: readcfgfile (cfgparse.c:2167) ==21995==by 0x50778D: init (haproxy.c:2021) ==21995==by 0x418262: main (haproxy.c:3133) ==21995== ==21995== 12 bytes in 1 blocks are definitely lost in loss record 3 of 30 ==21995==at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==21995==by 0x5726489: strdup (strdup.c:42) ==21995==by 0x4AC666: srv_prepare_for_resolution (server.c:1606) ==21995==by 0x4B2EBD: parse_server (server.c:2081) ==21995==by 0x4680C1: cfg_parse_listen (cfgparse-listen.c:534) ==21995==by 0x459E33: readcfgfile (cfgparse.c:2167) ==21995==by 0x50778D: init (haproxy.c:2021) ==21995==by 0x418262: main (haproxy.c:3133) with one more leak unrelated to `struct server`. After applying this patch the leak is gone as expected. This is a very minor leak that can only be observed if deinit() is called, shortly before the OS will free all memory of the process anyway. No backport needed. --- src/haproxy.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/haproxy.c b/src/haproxy.c index 818cf3504..52fca84b5 100644 --- a/src/haproxy.c +++ b/src/haproxy.c @@ -2743,12 +2743,14 @@ void deinit(void) free(s->id); free(s->cookie); + free(s->hostname); free(s->hostname_dn); free((char*)s->conf.file); free(s->idle_conns); free(s->safe_conns); free(s->available_conns); free(s->curr_idle_thr); + free(s->resolvers_id); if (s->use_ssl == 1 || s->check.use_ssl == 1 || (s->proxy->options & PR_O_TCPCHK_SSL)) { if (xprt_get(XPRT_SSL) && xprt_get(XPRT_SSL)->destroy_srv) -- 2.27.0
[PATCH 3/4] BUG/MINOR: haproxy: Free proxy->format_unique_id during deinit
Given the following example configuration: frontend foo mode http bind *:8080 unique-id-format x Running a configuration check with valgrind reports: ==30712== 42 (40 direct, 2 indirect) bytes in 1 blocks are definitely lost in loss record 18 of 39 ==30712==at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==30712==by 0x4ED7E9: add_to_logformat_list (log.c:462) ==30712==by 0x4EEE28: parse_logformat_string (log.c:720) ==30712==by 0x47B09A: check_config_validity (cfgparse.c:3046) ==30712==by 0x52881D: init (haproxy.c:2121) ==30712==by 0x41F382: main (haproxy.c:3126) After this patch is applied the leak is gone as expected. This is a very minor leak that can only be observed if deinit() is called, shortly before the OS will free all memory of the process anyway. No backport needed. --- src/haproxy.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/haproxy.c b/src/haproxy.c index 215c68a30..818cf3504 100644 --- a/src/haproxy.c +++ b/src/haproxy.c @@ -2698,6 +2698,13 @@ void deinit(void) free(lf); } + list_for_each_entry_safe(lf, lfb, >format_unique_id, list) { + LIST_DEL(>list); + release_sample_expr(lf->expr); + free(lf->arg); + free(lf); + } + deinit_act_rules(>tcp_req.inspect_rules); deinit_act_rules(>tcp_rep.inspect_rules); deinit_act_rules(>tcp_req.l4_rules); -- 2.27.0
[PATCH 0/4] Fix more frees
Willy, this series fixes up a few more frees. This time I have verified the changes a bit more carefully, running configuration check on a real world configuration of mine within valgrind. It still reports a five leaks (but less than without applying these patches!) and does not report any memory unsafety / bogus frees / double frees / use-after-frees. The first two patches fix leaks within helper functions. I triggered them during deinit, but they might or might not be accessible during normal operation (e.g. via the CLI socket). The last two patches are part of the deinit() function and thus cannot happen in normal operation, but cleaning them up makes it easier to find actual, important, issues. Feel free to apply to 'next' only (or after the release of 2.2). No need to risk breaking anything shortly before release with something less important like this. And of course please carefully review them :-) Best regards Tim Düsterhus (4): BUG/MINOR: acl: Fix freeing of expr->smp in prune_acl_expr BUG/MINOR: sample: Fix freeing of conv_exprs in release_sample_expr BUG/MINOR: haproxy: Free proxy->format_unique_id during deinit BUG/MINOR: haproxy: Add missing free of server->(hostname|resolvers_id) src/acl.c | 4 ++-- src/haproxy.c | 9 + src/sample.c | 6 +- 3 files changed, 16 insertions(+), 3 deletions(-) -- 2.27.0
[PATCH 2/4] BUG/MINOR: sample: Fix freeing of conv_exprs in release_sample_expr
Instead of just calling release_sample_arg(conv_expr->arg_p) we also must free() the conv_expr itself (after removing it from the list). Given the following example configuration: frontend foo bind *:8080 mode http http-request set-var(txn.foo) str(bar) acl is_match str(foo),strcmp(txn.hash) -m bool Running a configuration check within valgrind reports: ==1431== 32 bytes in 1 blocks are definitely lost in loss record 20 of 43 ==1431==at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==1431==by 0x4C39B5: sample_parse_expr (sample.c:982) ==1431==by 0x56B410: parse_acl_expr (acl.c:319) ==1431==by 0x56BA7F: parse_acl (acl.c:697) ==1431==by 0x48D225: cfg_parse_listen (cfgparse-listen.c:816) ==1431==by 0x4797C3: readcfgfile (cfgparse.c:2167) ==1431==by 0x52943D: init (haproxy.c:2021) ==1431==by 0x41F382: main (haproxy.c:3133) After this patch is applied the leak is gone as expected. This is a fairly minor leak that can only be observed if samples need to be freed, which is not something that should occur during normal processing and most likely only during shut down. Thus no backport should be needed. --- src/sample.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/sample.c b/src/sample.c index faf6471c0..a6633e849 100644 --- a/src/sample.c +++ b/src/sample.c @@ -1424,8 +1424,12 @@ void release_sample_expr(struct sample_expr *expr) if (!expr) return; - list_for_each_entry_safe(conv_expr, conv_exprb, >conv_exprs, list) + list_for_each_entry_safe(conv_expr, conv_exprb, >conv_exprs, list) { + LIST_DEL(_expr->list); release_sample_arg(conv_expr->arg_p); + free(conv_expr); + } + release_sample_arg(expr->arg_p); free(expr); } -- 2.27.0
Re: [PATCH] BUG/MINOR: systemd: Wait for network to be online
I posted this patch to start some discussion here. I'm not the first to notice this problem but I was, until now, hesitant to change the systemd service file until now. The reason for this was that waiting for network-online.target could delay boot time. Please see systemd network target docs here [1]. As stated in the commit message, the common reason I was asked to change this in RHEL/Fedora was due to attempting to bind to a non-existent IP address, but that can be overcome with 'option transparent'. However I recently was notified that DNS resolution will fail when haproxy starts if the network is not fully online. Thus I suggested the patch. I also found this discussion [2], but noticed that the upstream service file had not been modified. Cheers, Ryan [1] https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ [2] https://discourse.haproxy.org/t/haproxy-fails-on-restart/3469/10 On Mon, Jun 15, 2020 at 12:03 PM Ryan O'Hara wrote: > Change systemd service file to wait for network to be completely > online. This solves two problems: > > If haproxy is configured to bind to IP address(es) that are not yet > assigned, haproxy would previously fail. The workaround is to use > "option transparent". > > If haproxy us configured to use a resolver to resolve servers via DNS, > haproxy would previously fail due to the fact that the network is not > fully online yet. This is the most compelling reason for this patch. > > Signed-off-by: Ryan O'Hara > --- > contrib/systemd/haproxy.service.in | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/contrib/systemd/haproxy.service.in b/contrib/systemd/ > haproxy.service.in > index 9b7c3d1bb..05fc59579 100644 > --- a/contrib/systemd/haproxy.service.in > +++ b/contrib/systemd/haproxy.service.in > @@ -1,6 +1,7 @@ > [Unit] > Description=HAProxy Load Balancer > -After=network.target > +After=network-online.target > +Wants=network-online.target > > [Service] > EnvironmentFile=-/etc/default/haproxy > -- > 2.25.1 > > >
Re: VTest does not test deinit
Hi Tim, On Sun, Jun 14, 2020 at 06:24:19PM +0200, Tim Düsterhus wrote: > Hi List, > Willy, > Ilya, > > I noticed that the reg-tests were unable find the issue reported by > William here: > https://www.mail-archive.com/haproxy@formilux.org/msg37637.html > > This is because VTest never performs a "soft" shutdown of HAProxy, > instead it uses SIGINT -> SIGTERM -> SIGKILL. Thus the deinit() will > never trigger. > > Fixing this will require a change to VTest: > https://github.com/vtest/VTest/blob/b9e9e03fdeebd494783ae1dd8e6008f5c1e3a4bc/src/vtc_haproxy.c#L786 > needs to be SIGUSR1 (and the switch below adjusted). > > Making the change in my local repository and running the tests that > don't need any additional USE_XXX falgs causes 3 tests to fail with 2 of > them triggering an assertion within VTest. > > It would have caught the bug reported by William, though. > > Concluding: It probably makes sense to adjust VTest to use SIGUSR1 > instead of SIGINT, the latter is handled like SIGTERM in HAProxy anyway, > so there is no need for this distinction. I did not yet look into the > details of the failing tests, though. It *could* be a solution, I don't know if it may have other impacts. Actually we must always remember that while convenient, VTest's primary goal is to test a proxy by synchronizing the two sides (which is what basically no other testing tool can reliably do). If we want to run deeper tests on other process-oriented behaviors, it can make sense to rely on other tools. For example vtest is not the best suited to testing command line or process stability. It has no process management, can leak background processes and needs to wait for a timeout to detect a crash. My point above is that as long as *proxy* tests are compatible with stopping using SIGUSR1 I think I'd be fine with the change. But if doing this actually results in more pain to test the proxy features, I'd rather stay away from this and switch to a distinct test series for this. That's where I'd draw the line. Cheers, Willy
Re: [PATCH] BUG/MAJOR: Fix bogus free() during deinit() for http-request rules
On Sun, Jun 14, 2020 at 05:27:36PM +0200, Tim Duesterhus wrote: > We cannot simply `release_sample_expr(rule->arg.vars.expr)` for a > `struct act_rule`, because `rule->arg` is a union that might not > contain valid `vars`. This leads to a crash on a configuration using > `http-request redirect` and possibly others: > > frontend http > mode http > bind 127.0.0.1:80 > http-request redirect scheme https > > Instead a `struct act_rule` has a `release_ptr` that must be used > to properly free any additional storage allocated. (...) Thank you guys. I was hesitant on this one because I didn't remember that part at all and had to see how this was used for other actions, but after analysing the other ones, I agree that this is the correct way to release memory. Now applied, thank you! Willy
[PATCH] BUG/MINOR: systemd: Wait for network to be online
Change systemd service file to wait for network to be completely online. This solves two problems: If haproxy is configured to bind to IP address(es) that are not yet assigned, haproxy would previously fail. The workaround is to use "option transparent". If haproxy us configured to use a resolver to resolve servers via DNS, haproxy would previously fail due to the fact that the network is not fully online yet. This is the most compelling reason for this patch. Signed-off-by: Ryan O'Hara --- contrib/systemd/haproxy.service.in | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/contrib/systemd/haproxy.service.in b/contrib/systemd/haproxy.service.in index 9b7c3d1bb..05fc59579 100644 --- a/contrib/systemd/haproxy.service.in +++ b/contrib/systemd/haproxy.service.in @@ -1,6 +1,7 @@ [Unit] Description=HAProxy Load Balancer -After=network.target +After=network-online.target +Wants=network-online.target [Service] EnvironmentFile=-/etc/default/haproxy -- 2.25.1
Re: Broken SNI with crt-list for HAProxy 2.1.x after upgrade from Stretch to Buster
On Mon, Jun 15, 2020 at 03:48:40PM +0200, Tim Düsterhus wrote: > William, > > Am 15.06.20 um 14:56 schrieb William Lallemand: > > I think I found the problem, could you try the attached patch for 2.1? > > > > I'd prefer not, because I don't have a staging system where I could > easily reproduce the issue (and generating SSL certs to test this > properly is annoying). I was encountering the issue on a production box > and I don't like to mess around with manually compiled HAProxy versions > more than necessary. > No problem, I was able to reproduce the problem and confirmed that the crt-list is entirely parsed after a warning now. Thanks for the report, -- William Lallemand
Re: Broken SNI with crt-list for HAProxy 2.1.x after upgrade from Stretch to Buster
William, Am 15.06.20 um 14:56 schrieb William Lallemand: > I think I found the problem, could you try the attached patch for 2.1? > I'd prefer not, because I don't have a staging system where I could easily reproduce the issue (and generating SSL certs to test this properly is annoying). I was encountering the issue on a production box and I don't like to mess around with manually compiled HAProxy versions more than necessary. For me the issue is sufficiently resolved by specifying the DH bit size (and the fact that I plan to upgrade to 2.2 as soon as packages are available). I trust you that the patch fixes *a* bug, even if it might not be *my* bug, thus feel free to apply. Best regards Tim Düsterhus
Invitation to FDC October 2020 Training's
Re: Broken SNI with crt-list for HAProxy 2.1.x after upgrade from Stretch to Buster
On Sat, Jun 13, 2020 at 04:55:53PM +0200, Tim Düsterhus wrote: > William, > > Am 13.06.20 um 16:46 schrieb Tim Düsterhus: > > tune.ssl.default-dh-param 2048 solved the issue for me. > > > > I'd argue that this is a bug in HAProxy nonetheless, because apparently > > the crt-list file is not fully parsed in case of DH parameter warnings > > (not errors). In fact I can remember that some similar issue was > > previously fixed. > > > > Could this be another version of this issue: > https://github.com/haproxy/haproxy/issues/483? Should I file a bug report? > > Best regards > Tim Düsterhus Hello Tim, I think I found the problem, could you try the attached patch for 2.1? Thanks, -- William Lallemand >From 671197ebf116b053169d6a2ec27ded0b2d090f93 Mon Sep 17 00:00:00 2001 From: William Lallemand Date: Mon, 15 Jun 2020 14:37:19 +0200 Subject: [PATCH] BUG/MINOR: ssl: crt-list should continue parsing on ERR_WARN The original crt-list parsing was using any value in the cfgerr variable as an error. This is wrong since the certificate loading could return an ERR_WARN and should be able to be parsed. The parsing must be only stopped on an ERR_CODE. This commit is 2.1 only since it was fixed in 2.2 by commit 2954c47 ("MEDIUM: ssl: allow crt-list caching") and accidently in 2.0 by commit b131c87 ("CLEANUP: ssl: make ssl_sock_load_cert*() return real error codes"). --- src/ssl_sock.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/ssl_sock.c b/src/ssl_sock.c index adf06dd7a..574cd15dd 100644 --- a/src/ssl_sock.c +++ b/src/ssl_sock.c @@ -4364,7 +4364,7 @@ int ssl_sock_load_cert_list_file(char *file, struct bind_conf *bind_conf, struct } line++; } - if (cfgerr) + if (cfgerr & ERR_CODE) break; args[arg++] = line; @@ -4409,7 +4409,7 @@ int ssl_sock_load_cert_list_file(char *file, struct bind_conf *bind_conf, struct } } - if (cfgerr) { + if (cfgerr & ERR_CODE) { ssl_sock_free_ssl_conf(ssl_conf); free(ssl_conf); ssl_conf = NULL; @@ -4428,7 +4428,7 @@ int ssl_sock_load_cert_list_file(char *file, struct bind_conf *bind_conf, struct else cfgerr |= ssl_sock_load_ckchs(crt_path, ckchs, bind_conf, ssl_conf, [cur_arg], arg - cur_arg - 1, err); - if (cfgerr) { + if (cfgerr & ERR_CODE) { memprintf(err, "error processing line %d in file '%s' : %s", linenum, file, *err); break; } -- 2.25.3