Re: [PATCH 0/4] Fix more frees

2020-06-15 Thread Willy Tarreau
Hi Tim,

On Tue, Jun 16, 2020 at 12:03:01AM +0200, Tim Duesterhus wrote:
> Willy,
> 
> this series fixes up a few more frees. This time I have verified the changes
> a bit more carefully, running configuration check on a real world 
> configuration
> of mine within valgrind. It still reports a five leaks (but less than without
> applying these patches!) and does not report any memory unsafety / bogus frees
> / double frees /  use-after-frees.
> 
> The first two patches fix leaks within helper functions. I triggered them
> during deinit, but they might or might not be accessible during normal
> operation (e.g. via the CLI socket). The last two patches are part of the
> deinit() function and thus cannot happen in normal operation, but cleaning
> them up makes it easier to find actual, important, issues.
> 
> Feel free to apply to 'next' only (or after the release of 2.2). No need to
> risk breaking anything shortly before release with something less important
> like this. And of course please carefully review them :-)

Thank you for this work. I appreciate how painful it is to track this
stuff down to the inner code. I'd indeed prefer to postpone its integration
for having been bitten many times. One specific case you need to take care
of is the situation where there is an error in one of the ACL patterns that
prevents the expression from being completed and that can result in some
elements not to be resolved. I typically don't care if we observe memleaks
in such a case since it dies in an error, but we don't want to see double
frees nor dereference of a wrong pointer. I don't remember all the corner
cases but they'll basically look like this :

   http-request deny if { src 12.12.12.12 13.13.13.13/33 }

(or anything that manages to cause a parsing error on a pattern after one
was properly resolved). This may also happen in other situations like when
an argument references a server/backend/stick-table that doesn't exist and
as such fails the config check, for example:

   http-request deny if { nbsrv(blah) gt 2 }

or possibly if that's one argument of a converter. I suspect that your
patches are OK because very likely release_sample_expr() was created
after the code paths you've changed. But it's indeed a bit tricky and
deep into the error path.

Do not hesitate to ping me after the release if I forget to get back to it.

Cheers,
Willy



[PATCH 1/4] BUG/MINOR: acl: Fix freeing of expr->smp in prune_acl_expr

2020-06-15 Thread Tim Duesterhus
Instead of simply calling free() in expr->smp->arg_p in certain cases
properly free the sample using release_sample_expr().

Given the following example configuration:

frontend foo
bind *:8080
mode http
http-request set-var(txn.foo) str(bar)
acl is_match str(foo),strcmp(txn.hash) -m bool

Running a configuration check within valgrind reports:

==31371== 160 (48 direct, 112 indirect) bytes in 1 blocks are definitely 
lost in loss record 35 of 45
==31371==at 0x4C2FB55: calloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==by 0x4C3832: sample_parse_expr (sample.c:876)
==31371==by 0x56B3E0: parse_acl_expr (acl.c:319)
==31371==by 0x56BA4F: parse_acl (acl.c:697)
==31371==by 0x48D225: cfg_parse_listen (cfgparse-listen.c:816)
==31371==by 0x4797C3: readcfgfile (cfgparse.c:2167)
==31371==by 0x5293ED: init (haproxy.c:2021)
==31371==by 0x41F382: main (haproxy.c:3126)

After this patch this leak is reduced. It will be fully removed in a
follow up patch:

==32503== 32 bytes in 1 blocks are definitely lost in loss record 20 of 43
==32503==at 0x4C2FB55: calloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==32503==by 0x4C39B5: sample_parse_expr (sample.c:982)
==32503==by 0x56B410: parse_acl_expr (acl.c:319)
==32503==by 0x56BA7F: parse_acl (acl.c:697)
==32503==by 0x48D225: cfg_parse_listen (cfgparse-listen.c:816)
==32503==by 0x4797C3: readcfgfile (cfgparse.c:2167)
==32503==by 0x52943D: init (haproxy.c:2021)
==32503==by 0x41F382: main (haproxy.c:3133)

This is a fairly minor leak that can only be observed if ACLs need to be
freed, which is not something that should occur during normal processing
and most likely only during shut down. Thus no backport should be needed.
---
 src/acl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/acl.c b/src/acl.c
index 5fea5dcc2..333da22c9 100644
--- a/src/acl.c
+++ b/src/acl.c
@@ -117,8 +117,8 @@ static struct acl_expr *prune_acl_expr(struct acl_expr 
*expr)
}
}
 
-   if (expr->smp->arg_p != empty_arg_list && !unresolved)
-   free(expr->smp->arg_p);
+   release_sample_expr(expr->smp);
+   
return expr;
 }
 
-- 
2.27.0




[PATCH 4/4] BUG/MINOR: haproxy: Add missing free of server->(hostname|resolvers_id)

2020-06-15 Thread Tim Duesterhus
Given the following example configuration:

resolvers test
nameserver test 127.0.0.1:53
listen foo
bind *:8080
server foo example.com resolvers test

Running a configuration check within valgrind reports:

==21995== 5 bytes in 1 blocks are definitely lost in loss record 1 of 30
==21995==at 0x4C2DB8F: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21995==by 0x5726489: strdup (strdup.c:42)
==21995==by 0x4B2CFB: parse_server (server.c:2163)
==21995==by 0x4680C1: cfg_parse_listen (cfgparse-listen.c:534)
==21995==by 0x459E33: readcfgfile (cfgparse.c:2167)
==21995==by 0x50778D: init (haproxy.c:2021)
==21995==by 0x418262: main (haproxy.c:3133)
==21995==
==21995== 12 bytes in 1 blocks are definitely lost in loss record 3 of 30
==21995==at 0x4C2DB8F: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21995==by 0x5726489: strdup (strdup.c:42)
==21995==by 0x4AC666: srv_prepare_for_resolution (server.c:1606)
==21995==by 0x4B2EBD: parse_server (server.c:2081)
==21995==by 0x4680C1: cfg_parse_listen (cfgparse-listen.c:534)
==21995==by 0x459E33: readcfgfile (cfgparse.c:2167)
==21995==by 0x50778D: init (haproxy.c:2021)
==21995==by 0x418262: main (haproxy.c:3133)

with one more leak unrelated to `struct server`. After applying this
patch the leak is gone as expected.

This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
---
 src/haproxy.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/haproxy.c b/src/haproxy.c
index 818cf3504..52fca84b5 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2743,12 +2743,14 @@ void deinit(void)
 
free(s->id);
free(s->cookie);
+   free(s->hostname);
free(s->hostname_dn);
free((char*)s->conf.file);
free(s->idle_conns);
free(s->safe_conns);
free(s->available_conns);
free(s->curr_idle_thr);
+   free(s->resolvers_id);
 
if (s->use_ssl == 1 || s->check.use_ssl == 1 || 
(s->proxy->options & PR_O_TCPCHK_SSL)) {
if (xprt_get(XPRT_SSL) && 
xprt_get(XPRT_SSL)->destroy_srv)
-- 
2.27.0




[PATCH 3/4] BUG/MINOR: haproxy: Free proxy->format_unique_id during deinit

2020-06-15 Thread Tim Duesterhus
Given the following example configuration:

frontend foo
mode http
bind *:8080
unique-id-format x

Running a configuration check with valgrind reports:

==30712== 42 (40 direct, 2 indirect) bytes in 1 blocks are definitely lost 
in loss record 18 of 39
==30712==at 0x4C2FB55: calloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30712==by 0x4ED7E9: add_to_logformat_list (log.c:462)
==30712==by 0x4EEE28: parse_logformat_string (log.c:720)
==30712==by 0x47B09A: check_config_validity (cfgparse.c:3046)
==30712==by 0x52881D: init (haproxy.c:2121)
==30712==by 0x41F382: main (haproxy.c:3126)

After this patch is applied the leak is gone as expected.

This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
---
 src/haproxy.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/haproxy.c b/src/haproxy.c
index 215c68a30..818cf3504 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2698,6 +2698,13 @@ void deinit(void)
free(lf);
}
 
+   list_for_each_entry_safe(lf, lfb, >format_unique_id, list) {
+   LIST_DEL(>list);
+   release_sample_expr(lf->expr);
+   free(lf->arg);
+   free(lf);
+   }
+
deinit_act_rules(>tcp_req.inspect_rules);
deinit_act_rules(>tcp_rep.inspect_rules);
deinit_act_rules(>tcp_req.l4_rules);
-- 
2.27.0




[PATCH 0/4] Fix more frees

2020-06-15 Thread Tim Duesterhus
Willy,

this series fixes up a few more frees. This time I have verified the changes
a bit more carefully, running configuration check on a real world configuration
of mine within valgrind. It still reports a five leaks (but less than without
applying these patches!) and does not report any memory unsafety / bogus frees
/ double frees /  use-after-frees.

The first two patches fix leaks within helper functions. I triggered them
during deinit, but they might or might not be accessible during normal
operation (e.g. via the CLI socket). The last two patches are part of the
deinit() function and thus cannot happen in normal operation, but cleaning
them up makes it easier to find actual, important, issues.

Feel free to apply to 'next' only (or after the release of 2.2). No need to
risk breaking anything shortly before release with something less important
like this. And of course please carefully review them :-)

Best regards

Tim Düsterhus (4):
  BUG/MINOR: acl: Fix freeing of expr->smp in prune_acl_expr
  BUG/MINOR: sample: Fix freeing of conv_exprs in release_sample_expr
  BUG/MINOR: haproxy: Free proxy->format_unique_id during deinit
  BUG/MINOR: haproxy: Add missing free of
server->(hostname|resolvers_id)

 src/acl.c | 4 ++--
 src/haproxy.c | 9 +
 src/sample.c  | 6 +-
 3 files changed, 16 insertions(+), 3 deletions(-)

-- 
2.27.0




[PATCH 2/4] BUG/MINOR: sample: Fix freeing of conv_exprs in release_sample_expr

2020-06-15 Thread Tim Duesterhus
Instead of just calling release_sample_arg(conv_expr->arg_p) we also must
free() the conv_expr itself (after removing it from the list).

Given the following example configuration:

frontend foo
bind *:8080
mode http
http-request set-var(txn.foo) str(bar)
acl is_match str(foo),strcmp(txn.hash) -m bool

Running a configuration check within valgrind reports:

==1431== 32 bytes in 1 blocks are definitely lost in loss record 20 of 43
==1431==at 0x4C2FB55: calloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1431==by 0x4C39B5: sample_parse_expr (sample.c:982)
==1431==by 0x56B410: parse_acl_expr (acl.c:319)
==1431==by 0x56BA7F: parse_acl (acl.c:697)
==1431==by 0x48D225: cfg_parse_listen (cfgparse-listen.c:816)
==1431==by 0x4797C3: readcfgfile (cfgparse.c:2167)
==1431==by 0x52943D: init (haproxy.c:2021)
==1431==by 0x41F382: main (haproxy.c:3133)

After this patch is applied the leak is gone as expected.

This is a fairly minor leak that can only be observed if samples need to be
freed, which is not something that should occur during normal processing and
most likely only during shut down. Thus no backport should be needed.
---
 src/sample.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/sample.c b/src/sample.c
index faf6471c0..a6633e849 100644
--- a/src/sample.c
+++ b/src/sample.c
@@ -1424,8 +1424,12 @@ void release_sample_expr(struct sample_expr *expr)
if (!expr)
return;
 
-   list_for_each_entry_safe(conv_expr, conv_exprb, >conv_exprs, list)
+   list_for_each_entry_safe(conv_expr, conv_exprb, >conv_exprs, 
list) {
+   LIST_DEL(_expr->list);
release_sample_arg(conv_expr->arg_p);
+   free(conv_expr);
+   }
+
release_sample_arg(expr->arg_p);
free(expr);
 }
-- 
2.27.0




Re: [PATCH] BUG/MINOR: systemd: Wait for network to be online

2020-06-15 Thread Ryan O'Hara
I posted this patch to start some discussion here. I'm not the first to
notice this problem but I was, until now, hesitant to change the systemd
service file until now. The reason for this was that waiting for
network-online.target could delay boot time. Please see systemd network
target docs here [1].

As stated in the commit message, the common reason I was asked to change
this in RHEL/Fedora was due to attempting to bind to a non-existent IP
address, but that can be overcome with 'option transparent'. However I
recently was notified that DNS resolution will fail when haproxy starts if
the network is not fully online. Thus I suggested the patch.

I also found this discussion [2], but noticed that the upstream service
file had not been modified.

Cheers,
Ryan

[1] https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
[2] https://discourse.haproxy.org/t/haproxy-fails-on-restart/3469/10

On Mon, Jun 15, 2020 at 12:03 PM Ryan O'Hara  wrote:

> Change systemd service file to wait for network to be completely
> online. This solves two problems:
>
> If haproxy is configured to bind to IP address(es) that are not yet
> assigned, haproxy would previously fail. The workaround is to use
> "option transparent".
>
> If haproxy us configured to use a resolver to resolve servers via DNS,
> haproxy would previously fail due to the fact that the network is not
> fully online yet. This is the most compelling reason for this patch.
>
> Signed-off-by: Ryan O'Hara 
> ---
>  contrib/systemd/haproxy.service.in | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/contrib/systemd/haproxy.service.in b/contrib/systemd/
> haproxy.service.in
> index 9b7c3d1bb..05fc59579 100644
> --- a/contrib/systemd/haproxy.service.in
> +++ b/contrib/systemd/haproxy.service.in
> @@ -1,6 +1,7 @@
>  [Unit]
>  Description=HAProxy Load Balancer
> -After=network.target
> +After=network-online.target
> +Wants=network-online.target
>
>  [Service]
>  EnvironmentFile=-/etc/default/haproxy
> --
> 2.25.1
>
>
>


Re: VTest does not test deinit

2020-06-15 Thread Willy Tarreau
Hi Tim,

On Sun, Jun 14, 2020 at 06:24:19PM +0200, Tim Düsterhus wrote:
> Hi List,
> Willy,
> Ilya,
> 
> I noticed that the reg-tests were unable find the issue reported by
> William here:
> https://www.mail-archive.com/haproxy@formilux.org/msg37637.html
> 
> This is because VTest never performs a "soft" shutdown of HAProxy,
> instead it uses SIGINT -> SIGTERM -> SIGKILL. Thus the deinit() will
> never trigger.
> 
> Fixing this will require a change to VTest:
> https://github.com/vtest/VTest/blob/b9e9e03fdeebd494783ae1dd8e6008f5c1e3a4bc/src/vtc_haproxy.c#L786
> needs to be SIGUSR1 (and the switch below adjusted).
> 
> Making the change in my local repository and running the tests that
> don't need any additional USE_XXX falgs causes 3 tests to fail with 2 of
> them triggering an assertion within VTest.
> 
> It would have caught the bug reported by William, though.
> 
> Concluding: It probably makes sense to adjust VTest to use SIGUSR1
> instead of SIGINT, the latter is handled like SIGTERM in HAProxy anyway,
> so there is no need for this distinction. I did not yet look into the
> details of the failing tests, though.

It *could* be a solution, I don't know if it may have other impacts.

Actually we must always remember that while convenient, VTest's
primary goal is to test a proxy by synchronizing the two sides (which
is what basically no other testing tool can reliably do). If we want
to run deeper tests on other process-oriented behaviors, it can make
sense to rely on other tools. For example vtest is not the best suited
to testing command line or process stability. It has no process
management, can leak background processes and needs to wait for a
timeout to detect a crash.

My point above is that as long as *proxy* tests are compatible with
stopping using SIGUSR1 I think I'd be fine with the change. But if
doing this actually results in more pain to test the proxy features,
I'd rather stay away from this and switch to a distinct test series
for this. That's where I'd draw the line.

Cheers,
Willy



Re: [PATCH] BUG/MAJOR: Fix bogus free() during deinit() for http-request rules

2020-06-15 Thread Willy Tarreau
On Sun, Jun 14, 2020 at 05:27:36PM +0200, Tim Duesterhus wrote:
> We cannot simply `release_sample_expr(rule->arg.vars.expr)` for a
> `struct act_rule`, because `rule->arg` is a union that might not
> contain valid `vars`. This leads to a crash on a configuration using
> `http-request redirect` and possibly others:
> 
> frontend http
>   mode http
>   bind 127.0.0.1:80
>   http-request redirect scheme https
> 
> Instead a `struct act_rule` has a `release_ptr` that must be used
> to properly free any additional storage allocated.
(...)

Thank you guys. I was hesitant on this one because I didn't remember
that part at all and had to see how this was used for other actions,
but after analysing the other ones, I agree that this is the correct
way to release memory.

Now applied, thank you!
Willy



[PATCH] BUG/MINOR: systemd: Wait for network to be online

2020-06-15 Thread Ryan O'Hara
Change systemd service file to wait for network to be completely
online. This solves two problems:

If haproxy is configured to bind to IP address(es) that are not yet
assigned, haproxy would previously fail. The workaround is to use
"option transparent".

If haproxy us configured to use a resolver to resolve servers via DNS,
haproxy would previously fail due to the fact that the network is not
fully online yet. This is the most compelling reason for this patch.

Signed-off-by: Ryan O'Hara 
---
 contrib/systemd/haproxy.service.in | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/systemd/haproxy.service.in 
b/contrib/systemd/haproxy.service.in
index 9b7c3d1bb..05fc59579 100644
--- a/contrib/systemd/haproxy.service.in
+++ b/contrib/systemd/haproxy.service.in
@@ -1,6 +1,7 @@
 [Unit]
 Description=HAProxy Load Balancer
-After=network.target
+After=network-online.target
+Wants=network-online.target
 
 [Service]
 EnvironmentFile=-/etc/default/haproxy
-- 
2.25.1




Re: Broken SNI with crt-list for HAProxy 2.1.x after upgrade from Stretch to Buster

2020-06-15 Thread William Lallemand
On Mon, Jun 15, 2020 at 03:48:40PM +0200, Tim Düsterhus wrote:
> William,
> 
> Am 15.06.20 um 14:56 schrieb William Lallemand:
> > I think I found the problem, could you try the attached patch for 2.1?
> > 
> 
> I'd prefer not, because I don't have a staging system where I could
> easily reproduce the issue (and generating SSL certs to test this
> properly is annoying). I was encountering the issue on a production box
> and I don't like to mess around with manually compiled HAProxy versions
> more than necessary.
> 

No problem, I was able to reproduce the problem and confirmed that the
crt-list is entirely parsed after a warning now.

Thanks for the report,

-- 
William Lallemand



Re: Broken SNI with crt-list for HAProxy 2.1.x after upgrade from Stretch to Buster

2020-06-15 Thread Tim Düsterhus
William,

Am 15.06.20 um 14:56 schrieb William Lallemand:
> I think I found the problem, could you try the attached patch for 2.1?
> 

I'd prefer not, because I don't have a staging system where I could
easily reproduce the issue (and generating SSL certs to test this
properly is annoying). I was encountering the issue on a production box
and I don't like to mess around with manually compiled HAProxy versions
more than necessary.

For me the issue is sufficiently resolved by specifying the DH bit size
(and the fact that I plan to upgrade to 2.2 as soon as packages are
available).

I trust you that the patch fixes *a* bug, even if it might not be *my*
bug, thus feel free to apply.

Best regards
Tim Düsterhus



Invitation to FDC October 2020 Training's

2020-06-15 Thread Foscore Development Center



Re: Broken SNI with crt-list for HAProxy 2.1.x after upgrade from Stretch to Buster

2020-06-15 Thread William Lallemand
On Sat, Jun 13, 2020 at 04:55:53PM +0200, Tim Düsterhus wrote:
> William,
> 
> Am 13.06.20 um 16:46 schrieb Tim Düsterhus:
> > tune.ssl.default-dh-param 2048 solved the issue for me.
> > 
> > I'd argue that this is a bug in HAProxy nonetheless, because apparently
> > the crt-list file is not fully parsed in case of DH parameter warnings
> > (not errors). In fact I can remember that some similar issue was
> > previously fixed.
> > 
> 
> Could this be another version of this issue:
> https://github.com/haproxy/haproxy/issues/483? Should I file a bug report?
> 
> Best regards
> Tim Düsterhus

Hello Tim,

I think I found the problem, could you try the attached patch for 2.1?

Thanks,

-- 
William Lallemand
>From 671197ebf116b053169d6a2ec27ded0b2d090f93 Mon Sep 17 00:00:00 2001
From: William Lallemand 
Date: Mon, 15 Jun 2020 14:37:19 +0200
Subject: [PATCH] BUG/MINOR: ssl: crt-list should continue parsing on ERR_WARN

The original crt-list parsing was using any value in the cfgerr variable
as an error. This is wrong since the certificate loading could return
an ERR_WARN and should be able to be parsed. The parsing must be only
stopped on an ERR_CODE.

This commit is 2.1 only since it was fixed
in 2.2 by commit 2954c47 ("MEDIUM: ssl: allow crt-list caching")
and accidently in 2.0 by commit b131c87 ("CLEANUP: ssl: make
ssl_sock_load_cert*() return real error codes").
---
 src/ssl_sock.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index adf06dd7a..574cd15dd 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -4364,7 +4364,7 @@ int ssl_sock_load_cert_list_file(char *file, struct bind_conf *bind_conf, struct
 			}
 			line++;
 		}
-		if (cfgerr)
+		if (cfgerr & ERR_CODE)
 			break;
 		args[arg++] = line;
 
@@ -4409,7 +4409,7 @@ int ssl_sock_load_cert_list_file(char *file, struct bind_conf *bind_conf, struct
 			}
 		}
 
-		if (cfgerr) {
+		if (cfgerr & ERR_CODE) {
 			ssl_sock_free_ssl_conf(ssl_conf);
 			free(ssl_conf);
 			ssl_conf = NULL;
@@ -4428,7 +4428,7 @@ int ssl_sock_load_cert_list_file(char *file, struct bind_conf *bind_conf, struct
 		else
 			cfgerr |= ssl_sock_load_ckchs(crt_path, ckchs, bind_conf, ssl_conf, [cur_arg], arg - cur_arg - 1, err);
 
-		if (cfgerr) {
+		if (cfgerr & ERR_CODE) {
 			memprintf(err, "error processing line %d in file '%s' : %s", linenum, file, *err);
 			break;
 		}
-- 
2.25.3