Re: [PATCH] BUG/MAJOR: segfault during tcpcheck_main

2019-03-29 Thread Willy Tarreau
Hi Ricardo,

On Thu, Mar 28, 2019 at 10:15:41PM -0300, Ricardo Nabinger Sanchez wrote:
> Hello,
> 
> We have been chasing a segfault for a few weeks and today we were able
> to track it down.  There is a null-pointer dereferencing when using
> tcp-check connect; although we don't know yet as to why the protocol
> family might be unknown during tcpcheck_main().

Many thanks for this fix! I think I'm seeing how this can happen, it's
probably since the introduction of the DNS to set addresses on servers.
In the past a server always had an address, but now a server may also
not have an address at all. It's unclear to me exactly how a check could
be scheduled on an addressless server but I think it could happen if the
check is already scheduled and the server loses its address just before
the check actually runs.

I'm taking it just before issuing 1.9.6, thanks!
Willy



[PATCH] BUG/MAJOR: segfault during tcpcheck_main

2019-03-28 Thread Ricardo Nabinger Sanchez
Hello,

We have been chasing a segfault for a few weeks and today we were able
to track it down.  There is a null-pointer dereferencing when using
tcp-check connect; although we don't know yet as to why the protocol
family might be unknown during tcpcheck_main().

On some of our servers, HAProxy was crashing a few times per day,
without any clear pattern, while there were no crashes on other servers
running the same version/configuration.

We manually checked every use of protocol_by_family() and only the one
in tcpcheck_main() was unprotected.  To comply with the coding
standards, we used the same behavior found in another function in
src/checks.c, which also calls protocol_by_family but that function
tests for a NULL pointer before dereferencing it.

Best regards,

-- 
Ricardo Nabinger Sanchez http://www.taghos.com.br/
>From 29684452b9cf923e321537ffc1f4694c88131f4b Mon Sep 17 00:00:00 2001
From: Ricardo Nabinger Sanchez 
Date: Thu, 28 Mar 2019 21:42:23 -0300
Subject: [PATCH] BUG/MAJOR: segfault during tcpcheck_main

When using TCP health checks (tcp-check connect), it is possible to
crash with a segfault when, for reasons yet to be understood, the
protocol family is unknown.

In the function tcpcheck_main(), proto is dereferenced without a prior
test in case it is NULL, leading to the segfault during proto->connect
dereference.

The line has been unmodified since it was introduced, in commit
69e273f3fcfbfb9cc0fb5a09668faad66cfbd36b.  This was the only use of
proto (or more specifically, the return of  protocol_by_family()) that
was unprotected; all other callsites perform the test for a NULL
pointer.

This patch should be backported to 1.9, 1.8, 1.7, and 1.6.
---
 src/checks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/checks.c b/src/checks.c
index 35744c6b..31004ddf 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -2839,7 +2839,7 @@ static int tcpcheck_main(struct check *check)
 			cs_attach(cs, check, _conn_cb);
 
 			ret = SF_ERR_INTERNAL;
-			if (proto->connect)
+			if (proto && proto->connect)
 ret = proto->connect(conn,
 		 1 /* I/O polling is always needed */,
 		 (next && next->action == TCPCHK_ACT_EXPECT) ? 0 : 2);
-- 
2.21.0