Re: HAProxy 1.8.X crashing

2018-04-13 Thread Willy Tarreau
Hi Praveen,

On Fri, Apr 13, 2018 at 02:03:47PM +, UPPALAPATI, PRAVEEN wrote:
> Hi Oliver,
> 
> The crash got fixed with the patch you provided before. 

Thanks for this useful feedback!

> Do you thing the latest patch will be the right solution?

It should, and if it doesn't, it means we have other bugs at other places
that could still strike on rare situations (eg: memory shortage), so that's
why I really want to ensure that we make the code more robust against this
case. The muxes are a very new thing introduced into haproxy, I'm not that
much surprized that we got a few such inconsistencies, but in general since
they affect everything, any such bug there has a very high risk of being
triggered, which is why I'm confident enough for the long term : if you
don't get it anymore, it's really likely that it's fixed.

Thanks,
Willy



Re: HAProxy 1.8.X crashing

2018-04-13 Thread Olivier Houchard
Hi Praveen,

On Fri, Apr 13, 2018 at 02:03:47PM +, UPPALAPATI, PRAVEEN wrote:
> Hi Oliver,
> 
> The crash got fixed with the patch you provided before. 
> 
> Do you thing the latest patch will be the right solution?
> 
> Thanks,
> Praveen.
> 

It should be fine.
Regards,

Olivier



RE: HAProxy 1.8.X crashing

2018-04-13 Thread UPPALAPATI, PRAVEEN
Hi Oliver,

The crash got fixed with the patch you provided before. 

Do you thing the latest patch will be the right solution?

Thanks,
Praveen.

-Original Message-
From: Olivier Houchard [mailto:ohouch...@haproxy.com] 
Sent: Friday, April 13, 2018 8:57 AM
To: Willy Tarreau <w...@1wt.eu>
Cc: UPPALAPATI, PRAVEEN <pu5...@att.com>; haproxy@formilux.org
Subject: Re: HAProxy 1.8.X crashing

Hi,

On Thu, Apr 12, 2018 at 11:33:43PM +0200, Willy Tarreau wrote:
> That may be the problem. But if a mux fails to initialize, then we can't
> destroy them either ? Is cs_destroy() the problem here maybe ? If so, I
> suspect that this approach would be more robust and would better match
> the validity tests on conn->mux that are performed at many places in the
> connection code :
> 
>   if (cs->conn->mux) {
>   cs->conn->mux->detach(cs);
>   } else {
> conn_stop_tracking(cs->conn);
> conn_xprt_close(cs->conn);
> conn_free(cs->conn);
> }
> cs_free(cs);
> 
> > >   - one above the return SF_ERR_INTERNAL above in connect_server() to
> > > handle the case where the connection already exists but not the mux.
> > 
> > I'm not sure how this can happen, and what we should do if that happens.
> 
> At least in the current situation we pre-create a connection just to start
> to fill some information (source/destination address), without knowing the
> mux yet. It's an initialization process, it may look beautiful or ugly
> depending on the perspective, but in any case we must ensure that the
> code overall remains reliable in this situation. And to me it looks less
> ugly than picking a possibly wrong mux just to have a valid pointer
> avoiding a segfault.
> 
> But if it turns out that stabilizing the destroy path (or error path) is
> not sufficient and that it breaks somewhere else, we can well adopt this
> solution for the short term and see how to address it for the long term;
> we already know that the outgoing connection code is severely limited to
> the point of currently preventing us from using H2 on the backend, and
> that's exactly why we're currently working on it.

Ok, here is a patch that does exactly what you suggest. I'm not entirely
happy with it, but it'll do the job, as a stopgap. I want this crash
fixed :)

Olivier



Re: HAProxy 1.8.X crashing

2018-04-13 Thread Willy Tarreau
On Fri, Apr 13, 2018 at 03:56:31PM +0200, Olivier Houchard wrote:
> Ok, here is a patch that does exactly what you suggest. I'm not entirely
> happy with it, but it'll do the job, as a stopgap. I want this crash
> fixed :)

Great, now merged, thank you! If you don't mind, I changed the tag to
BUG/MEDIUM since the bug is causing a crash. And if you mind it's too
late :-)

Willy



Re: HAProxy 1.8.X crashing

2018-04-13 Thread Olivier Houchard
Hi,

On Thu, Apr 12, 2018 at 11:33:43PM +0200, Willy Tarreau wrote:
> That may be the problem. But if a mux fails to initialize, then we can't
> destroy them either ? Is cs_destroy() the problem here maybe ? If so, I
> suspect that this approach would be more robust and would better match
> the validity tests on conn->mux that are performed at many places in the
> connection code :
> 
>   if (cs->conn->mux) {
>   cs->conn->mux->detach(cs);
>   } else {
> conn_stop_tracking(cs->conn);
> conn_xprt_close(cs->conn);
> conn_free(cs->conn);
> }
> cs_free(cs);
> 
> > >   - one above the return SF_ERR_INTERNAL above in connect_server() to
> > > handle the case where the connection already exists but not the mux.
> > 
> > I'm not sure how this can happen, and what we should do if that happens.
> 
> At least in the current situation we pre-create a connection just to start
> to fill some information (source/destination address), without knowing the
> mux yet. It's an initialization process, it may look beautiful or ugly
> depending on the perspective, but in any case we must ensure that the
> code overall remains reliable in this situation. And to me it looks less
> ugly than picking a possibly wrong mux just to have a valid pointer
> avoiding a segfault.
> 
> But if it turns out that stabilizing the destroy path (or error path) is
> not sufficient and that it breaks somewhere else, we can well adopt this
> solution for the short term and see how to address it for the long term;
> we already know that the outgoing connection code is severely limited to
> the point of currently preventing us from using H2 on the backend, and
> that's exactly why we're currently working on it.

Ok, here is a patch that does exactly what you suggest. I'm not entirely
happy with it, but it'll do the job, as a stopgap. I want this crash
fixed :)

Olivier
>From 84f50217a88b2c8bb5152045e912a0e39f12d204 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Fri, 13 Apr 2018 15:50:27 +0200
Subject: [PATCH] BUG/MINOR: connection: Make sure we have a mux before calling
 detach().

In some cases, we call cs_destroy() very early, so early the connection
doesn't yet have a mux, so we can't call mux->detach(). In this case,
just destroy the associated connection.

This should be backported to 1.8.
---
 include/proto/connection.h | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/proto/connection.h b/include/proto/connection.h
index bc8d88484..8566736fd 100644
--- a/include/proto/connection.h
+++ b/include/proto/connection.h
@@ -699,7 +699,20 @@ static inline void conn_free(struct connection *conn)
 /* Release a conn_stream, and kill the connection if it was the last one */
 static inline void cs_destroy(struct conn_stream *cs)
 {
-   cs->conn->mux->detach(cs);
+   if (cs->conn->mux)
+   cs->conn->mux->detach(cs);
+   else {
+   /* It's too early to have a mux, let's just destroy
+* the connection
+*/
+   struct connection *conn = cs->conn;
+
+   conn_stop_tracking(conn);
+   conn_full_close(conn);
+   if (conn->destroy_cb)
+   conn->destroy_cb(conn);
+   conn_free(conn);
+   }
cs_free(cs);
 }
 
-- 
2.14.3



Re: HAProxy 1.8.X crashing

2018-04-12 Thread Willy Tarreau
On Thu, Apr 12, 2018 at 01:14:58PM +0200, Olivier Houchard wrote:
> > > @@ -3718,6 +3719,8 @@ int http_process_request(struct stream *s, struct 
> > > channel *req, int an_bit)
> > >  
> > >   return 0;
> > >   }
> > > + /* XXX: We probably need a better mux */
> > > + conn_install_mux(conn, _pt_ops, objt_cs(s->si[1].end));
> > >  
> > >   path = http_get_path(txn);
> > >   url2sa(req->buf->p + msg->sl.rq.u,
> > 
> > While I can understand how missing this can have caused trouble, I'm not
> > sure it's the best solution, and I'm a bit worried about having various
> > code places choose a mux and install it. I suspect we're "simply" missing
> > a test in the backend code to cover the case where the connection already
> > exists but a mux was not yet installed (a situation that didn't exist
> > prior to muxes which is why we didn't notice this). I think that this
> > happens in connect_server() around line 1177 (return SF_ERR_INTERNAL)
> > when conn_xprt_ready() indicates the transport was not yet initialized.
> > It's likely that the error unrolling fails to consider the case where
> > the mux wasn't initialized, leading to the crash Praveen experienced.
> > 
> 
> I'm not sure about this. My point was to try to install the mux when we can
> choose one, the day we have more relevant muxes, if it ever happens.
> This certainly fixes Praveen's crash.

I totally understand, I just claim that I think we're likely buying a
carpet to put it on the dust instead of removing the dust. It's more of
a conceptual issue : you're forced to select an arbitrary mux at a moment
where you have no idea which one will be needed for this outgoing connection.
In 1.9 it could be H2, later it could be FCGI, etc. And generally when we
have to fill arbitrary information, we pay it the high price later because
it creates unnatural bonds between some unrelated elements.

The single fact that we're forced to pick an arbitrary mux to avoid a crash
somewhere else tells me we have an issue. If one specific code path doesn't
support being called with a muxless connection, this same code path might
also be triggered when the malloc() to create the mux fails and we want to
gracefully abort and try to close this muxless connection after sending a
500 Server Error status.

Also, the only reason why we create the connection this early is that we
need a place where to put the IP addresses, and given that connections
are small (only very slightly larger than the addresses they convey), it
totally makes sense to pre-populate the address there. If we start to add
more complex muxes, we'll even possibly end up allocating more resources
(think h2+hpack contexts), trying to connect and to initiate a handshake
to a target before the rest is finished or whatever funny stuff.

Now don't get me wrong, it would very well be that for now, given the
current state of the code, we're missing many checks and this solution
will be an efficient short-term fix. But for the reasons above it doesn't
sound logical to me. Maybe we're simply missing an "if (conn->mux)" or
"if (cs->mux)" somewhere on the free path before calling something for
the mux because it was assumed that it did obvously exist.

> I don't even know how we could reach line 1177.

I'm uncertain to be honest, I just think that there is a small possibility
if the connection exists and not the transport, but I could be wrong.
> 
> > If this is right, it would mean two fixes :
> >   - one in the error unrolling path to ensure we don't destroy a non
> > allocated mux ;
> 
> And what do we do then ? Without the mux we can't destroy the cs nor the
> connection.

That may be the problem. But if a mux fails to initialize, then we can't
destroy them either ? Is cs_destroy() the problem here maybe ? If so, I
suspect that this approach would be more robust and would better match
the validity tests on conn->mux that are performed at many places in the
connection code :

if (cs->conn->mux) {
cs->conn->mux->detach(cs);
} else {
conn_stop_tracking(cs->conn);
conn_xprt_close(cs->conn);
conn_free(cs->conn);
}
cs_free(cs);

> >   - one above the return SF_ERR_INTERNAL above in connect_server() to
> > handle the case where the connection already exists but not the mux.
> 
> I'm not sure how this can happen, and what we should do if that happens.

At least in the current situation we pre-create a connection just to start
to fill some information (source/destination address), without knowing the
mux yet. It's an initialization process, it may look beautiful or ugly
depending on the perspective, but in any case we must ensure that the
code overall remains reliable in this situation. And to me it looks less
ugly than picking a possibly wrong mux just to have a valid pointer
avoiding a segfault.

But if it turns out that stabilizing the destroy path (or error path) is
not sufficient 

Re: HAProxy 1.8.X crashing

2018-04-12 Thread Olivier Houchard
Hi Willy,

On Thu, Apr 12, 2018 at 08:53:51AM +0200, Willy Tarreau wrote:
> Hi Olivier,
> 
> On Wed, Apr 11, 2018 at 05:29:15PM +0200, Olivier Houchard wrote:
> > From 7c9f06727cf60acf873353ac71283ff9c562aeee Mon Sep 17 00:00:00 2001
> > From: Olivier Houchard 
> > Date: Wed, 11 Apr 2018 17:23:17 +0200
> > Subject: [PATCH] BUG/MINOR: connection: Setup a mux when in proxy mode.
> > 
> > We were allocating a new connection when in proxy mode, but did not provide
> > it a mux, thus crashing later.
> > 
> > This should be backported to 1.8.
> > ---
> >  src/proto_http.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/src/proto_http.c b/src/proto_http.c
> > index 80e001d69..817692c48 100644
> > --- a/src/proto_http.c
> > +++ b/src/proto_http.c
> > @@ -62,6 +62,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -3718,6 +3719,8 @@ int http_process_request(struct stream *s, struct 
> > channel *req, int an_bit)
> >  
> > return 0;
> > }
> > +   /* XXX: We probably need a better mux */
> > +   conn_install_mux(conn, _pt_ops, objt_cs(s->si[1].end));
> >  
> > path = http_get_path(txn);
> > url2sa(req->buf->p + msg->sl.rq.u,
> 
> While I can understand how missing this can have caused trouble, I'm not
> sure it's the best solution, and I'm a bit worried about having various
> code places choose a mux and install it. I suspect we're "simply" missing
> a test in the backend code to cover the case where the connection already
> exists but a mux was not yet installed (a situation that didn't exist
> prior to muxes which is why we didn't notice this). I think that this
> happens in connect_server() around line 1177 (return SF_ERR_INTERNAL)
> when conn_xprt_ready() indicates the transport was not yet initialized.
> It's likely that the error unrolling fails to consider the case where
> the mux wasn't initialized, leading to the crash Praveen experienced.
> 

I'm not sure about this. My point was to try to install the mux when we can
choose one, the day we have more relevant muxes, if it ever happens.
This certainly fixes Praveen's crash.
I don't even know how we could reach line 1177.

> If this is right, it would mean two fixes :
>   - one in the error unrolling path to ensure we don't destroy a non
> allocated mux ;

And what do we do then ? Without the mux we can't destroy the cs nor the
connection.

>   - one above the return SF_ERR_INTERNAL above in connect_server() to
> handle the case where the connection already exists but not the mux.

I'm not sure how this can happen, and what we should do if that happens.

Regards,

Olivier



Re: HAProxy 1.8.X crashing

2018-04-12 Thread Willy Tarreau
Hi Olivier,

On Wed, Apr 11, 2018 at 05:29:15PM +0200, Olivier Houchard wrote:
> From 7c9f06727cf60acf873353ac71283ff9c562aeee Mon Sep 17 00:00:00 2001
> From: Olivier Houchard 
> Date: Wed, 11 Apr 2018 17:23:17 +0200
> Subject: [PATCH] BUG/MINOR: connection: Setup a mux when in proxy mode.
> 
> We were allocating a new connection when in proxy mode, but did not provide
> it a mux, thus crashing later.
> 
> This should be backported to 1.8.
> ---
>  src/proto_http.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/proto_http.c b/src/proto_http.c
> index 80e001d69..817692c48 100644
> --- a/src/proto_http.c
> +++ b/src/proto_http.c
> @@ -62,6 +62,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -3718,6 +3719,8 @@ int http_process_request(struct stream *s, struct 
> channel *req, int an_bit)
>  
>   return 0;
>   }
> + /* XXX: We probably need a better mux */
> + conn_install_mux(conn, _pt_ops, objt_cs(s->si[1].end));
>  
>   path = http_get_path(txn);
>   url2sa(req->buf->p + msg->sl.rq.u,

While I can understand how missing this can have caused trouble, I'm not
sure it's the best solution, and I'm a bit worried about having various
code places choose a mux and install it. I suspect we're "simply" missing
a test in the backend code to cover the case where the connection already
exists but a mux was not yet installed (a situation that didn't exist
prior to muxes which is why we didn't notice this). I think that this
happens in connect_server() around line 1177 (return SF_ERR_INTERNAL)
when conn_xprt_ready() indicates the transport was not yet initialized.
It's likely that the error unrolling fails to consider the case where
the mux wasn't initialized, leading to the crash Praveen experienced.

If this is right, it would mean two fixes :
  - one in the error unrolling path to ensure we don't destroy a non
allocated mux ;
  - one above the return SF_ERR_INTERNAL above in connect_server() to
handle the case where the connection already exists but not the mux.

What do you think ?

Willy



Re: HAProxy 1.8.X crashing

2018-04-11 Thread Olivier Houchard
Hi Praveen,

On Wed, Apr 11, 2018 at 02:16:28PM +, UPPALAPATI, PRAVEEN wrote:
> Hi Haproxy-Team,
> 
> I tried compiling different minor versions of 1.8.x releases and all the 
> minor versions are crashing whe trying to use option http-proxy:
> 
> Configuration that is causing issue:
> 
> listen http_proxy-
> bind *:9876
> mode http
> option httplog
> http-request set-uri 
> http://%[url_param(idnsredirHost)]%[capture.req.uri]
> option http_proxy
> 
> If I don't use option http_proxy things work normally. Following is from the 
> core dump:
> 
> : #0 0x00454839 in cs_destroy (cs=0x207edd0) at 
> include/proto/connection.h:704
> 704 cs->conn->mux->detach(cs);
> Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 
> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 
> libcom_err-1.42.9-10.el7.x 86_64 libselinux-2.5-11.el7.x86_64 
> openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 
> zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0 0x00454839 in cs_destroy (cs=0x207edd0) at 
> include/proto/connection.h:704
> #1 si_release_endpoint (si=0x2083540) at include/proto/stream_interface.h:162
> #2 stream_free (s=0x20832e0) at src/stream.c:398
> #3 process_stream (t=) at src/stream.c:2513
> #4 0x004bc38e in process_runnable_tasks () at src/task.c:229
> #5 0x00408d9c in run_poll_loop () at src/haproxy.c:2399
> #6 run_thread_poll_loop (data=) at src/haproxy.c:2461
> #7 main (argc=, argv=0x7ffe6e2cf2d8) at src/haproxy.c:3065
> (gdb) quit
> 
> 
[...]
> Please let me know what's the root cause this option works fine with 1.7.x 
> version.
> 

It's related to changes we made in the architecture in 1.8.
The attached patch should fix it. It was made for master, but should apply to
1.8 as well.

Thanks for reporting !

Olivier
>From 7c9f06727cf60acf873353ac71283ff9c562aeee Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Wed, 11 Apr 2018 17:23:17 +0200
Subject: [PATCH] BUG/MINOR: connection: Setup a mux when in proxy mode.

We were allocating a new connection when in proxy mode, but did not provide
it a mux, thus crashing later.

This should be backported to 1.8.
---
 src/proto_http.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/proto_http.c b/src/proto_http.c
index 80e001d69..817692c48 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -62,6 +62,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3718,6 +3719,8 @@ int http_process_request(struct stream *s, struct channel 
*req, int an_bit)
 
return 0;
}
+   /* XXX: We probably need a better mux */
+   conn_install_mux(conn, _pt_ops, objt_cs(s->si[1].end));
 
path = http_get_path(txn);
url2sa(req->buf->p + msg->sl.rq.u,
-- 
2.14.3