Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-19 Thread Willy Tarreau
Hi Christopher,

On Mon, Feb 19, 2018 at 03:24:09PM +0100, Christopher Faulet wrote:
> Someone on discourse reports a problem with this patch:
> 
> https://discourse.haproxy.org/t/random-sa-errors-with-haproxy-1-8-3/2116/6
> 
> I asked him to test the attached patch. But It could be cool to have more
> feedback on the fix. The bug can easily be reproduced by closing a
> connection opened with openssl s_client with a control-c.

Hehe so it seems I was not completely dumb when saying that the error
used to compete with read0 in previous version ;-)

It it's not enough, we may decide to fall back to the old method that
was documented till OpenSSL 1.0.2 consisting in checking whether ret=0
was passed to SSL_get_error() or not to decide whether it's an error
or a shutdown. But I think that in the current situation these ones will
be equivalent for our use case anyway.

Now applied, thanks!
Willy



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-19 Thread Christopher Faulet

Le 14/02/2018 à 18:53, Willy Tarreau a écrit :

On Wed, Feb 14, 2018 at 06:20:42PM +0100, Mateusz Malek wrote:

Hi,

On 14.02.2018 17:53, Willy Tarreau wrote:

On Wed, Feb 14, 2018 at 05:29:57PM +0100, Olivier Houchard wrote:

What about what's attached, instead ?

I think it should work. Mateusz, care to give it a try to confirm ?
If OK, I'll merge it.


I confirm, with this patch applied problem is gone.


Excellent, thanks for the quick test. Patch now applied.



Hi,

Someone on discourse reports a problem with this patch:


https://discourse.haproxy.org/t/random-sa-errors-with-haproxy-1-8-3/2116/6

I asked him to test the attached patch. But It could be cool to have 
more feedback on the fix. The bug can easily be reproduced by closing a 
connection opened with openssl s_client with a control-c.


Thanks
--
Christopher Faulet
>From e0ffb9cd3210ccc3eec96350b0405963746960d0 Mon Sep 17 00:00:00 2001
From: Christopher Faulet 
Date: Mon, 19 Feb 2018 14:25:15 +0100
Subject: [PATCH] BUG/MEDIUM: ssl: Shutdown the connection for reading on
 SSL_ERROR_SYSCALL

When SSL_read returns SSL_ERROR_SYSCALL and errno is unset or set to EAGAIN, the
connection must be shut down for reading. Else, the connection loops infinitly,
consuming all the CPU.

The bug was introduced in the commit 7e2e50500 ("BUG/MEDIUM: ssl: Don't always
treat SSL_ERROR_SYSCALL as unrecovarable."). This patch must be backported in
1.8 too.
---
 src/ssl_sock.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 93aec33c4..1cd1547c7 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -5449,10 +5449,9 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int coun
 break;
 			} else if (ret == SSL_ERROR_ZERO_RETURN)
 goto read0;
-			/* For SSL_ERROR_SYSCALL, make sure the error is
-			 * unrecoverable before flagging the connection as
-			 * in error.
-			 */
+			/* For SSL_ERROR_SYSCALL, make sure to clear the error
+			 * stack before shutting down the connection for
+			 * reading. */
 			if (ret == SSL_ERROR_SYSCALL && (!errno || errno == EAGAIN))
 goto clear_ssl_error;
 			/* otherwise it's a real error */
@@ -5465,16 +5464,19 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int coun
 	conn_cond_update_sock_polling(conn);
 	return done;
 
+ clear_ssl_error:
+	/* Clear openssl global errors stack */
+	ssl_sock_dump_errors(conn);
+	ERR_clear_error();
  read0:
 	conn_sock_read0(conn);
 	goto leave;
+
  out_error:
 	conn->flags |= CO_FL_ERROR;
-clear_ssl_error:
 	/* Clear openssl global errors stack */
 	ssl_sock_dump_errors(conn);
 	ERR_clear_error();
-
 	goto leave;
 }
 
-- 
2.14.3



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-14 Thread Mateusz Małek

Hi,

On 14.02.2018 17:53, Willy Tarreau wrote:

On Wed, Feb 14, 2018 at 05:29:57PM +0100, Olivier Houchard wrote:

What about what's attached, instead ?

I think it should work. Mateusz, care to give it a try to confirm ?
If OK, I'll merge it.


I confirm, with this patch applied problem is gone.

Best regards
Mateusz Małek



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-14 Thread Willy Tarreau
On Wed, Feb 14, 2018 at 05:29:57PM +0100, Olivier Houchard wrote:
> Hi Willy,
> 
> On Tue, Feb 13, 2018 at 08:05:44PM +0100, Willy Tarreau wrote:
> > Hi Olivier,
> > Such type of construct tends to scare me (probably because I'm not reading
> > the whole code). It means we're supposed to set an error by default unless
> > we pass by a specific path. I fear that we'll get future issues (possibly
> > as the result of further fixes for unrelated stuff).
> > 
> > Also I'm really not fond of the fact that the out_error label isn't used
> > anymore to set the error, but sometimes to set it, sometimes to clear it.
> > 
> 
> What about what's attached, instead ?

I think it should work. Mateusz, care to give it a try to confirm ?
If OK, I'll merge it.

> > It's just a suggestion, it can be done by storing the result of
> > SSL_get_error() anywhere else, but definitely we need to make a clear
> > distinction between all these cases and for now the distinction between
> > the read0, completed connection and other errors is lost.
> > 
> 
> Because we can't rely on ret == 0 being meaningful. The manpage for
> OpenSSL 1.1.1 states :
> Old documentation indicated a difference between 0 and -1, and that -1 was
> retryable. You should instead call SSL_get_error() to find out if it's
> retryable.

Interesting, indeed the doc and man page has changed between versions.
My man page explicitly mentions the fact that you need to use
SSL_get_error() to detect SSL_ERROR_SYSCALL, and then act depending
on what  value was passed to it. Let's hope they only changed the
doc and not the semantics...

> And the switch would lead to more goto, as we need to break from outside the
> loop :)

Possibly.

> > I'm also seeing that the whole block could be rearranged so that ret <= 0
> > is handled first. That would remove some gotos and would leave the main
> > part after all error handling.
> > 
> 
> I'm not sure I get that part. I don't mind one way or another, but I don't
> understand how it would remove gotos.

Note that I have not made the exercise.

Willy



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-14 Thread Olivier Houchard
Hi Willy,

On Tue, Feb 13, 2018 at 08:05:44PM +0100, Willy Tarreau wrote:
> Hi Olivier,
> Such type of construct tends to scare me (probably because I'm not reading
> the whole code). It means we're supposed to set an error by default unless
> we pass by a specific path. I fear that we'll get future issues (possibly
> as the result of further fixes for unrelated stuff).
> 
> Also I'm really not fond of the fact that the out_error label isn't used
> anymore to set the error, but sometimes to set it, sometimes to clear it.
> 

What about what's attached, instead ?

> Why instead not handle it almost similarly to the way it was before the
> patch that removed the code ? I'm seeing that in the past we used to only
> differenciate between read0 and error. But if we're failing on SYSCALL
> with ret previously set to zero, an EOF was observed, so that's akin to
> a read0 (this information is lost in the recent code since ret is
> overwritten). May I suggest that we change all this block to a switch/case
> instead ?  It would more or less give something like this :
> 
>   if (ret > 0) {
>   blah;
>   }
>   else switch (SSL_get_error(ctx, ret)) {
>   case SSL_ERROR_WANT_WRITE: blah; break;
>   case SSL_ERROR_WANT_READ: blah; break;
>   case SSL_ERROR_ZERO_RETURN: goto read0;
>   case SSL_ERROR_SYSCALL:
>   if (!ret)
>   goto read0;
>   else if (errno && errno != EAGAIN)
>   goto out_error;
>   else /* valid cases, connection establishment, etc */
>   goto leave;
>   default:
>   goto out_error;
>   }
> 
> It's just a suggestion, it can be done by storing the result of
> SSL_get_error() anywhere else, but definitely we need to make a clear
> distinction between all these cases and for now the distinction between
> the read0, completed connection and other errors is lost.
> 

Because we can't rely on ret == 0 being meaningful. The manpage for
OpenSSL 1.1.1 states :
Old documentation indicated a difference between 0 and -1, and that -1 was 
retryable. You should instead call SSL_get_error() to find out if it's 
retryable.
And the switch would lead to more goto, as we need to break from outside the
loop :)

> I'm also seeing that the whole block could be rearranged so that ret <= 0
> is handled first. That would remove some gotos and would leave the main
> part after all error handling.
> 

I'm not sure I get that part. I don't mind one way or another, but I don't
understand how it would remove gotos.

> BTW this makes me realize that your inverted condition above seems wrong
> (|| instead of &&).
> 

Oops, that is true, those things are too complicated.

Regards,

Olivier
>From fca75937f4f2b1927f5c82c137cba292c82dac53 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 13 Feb 2018 15:17:23 +0100
Subject: [PATCH] MINOR/BUG: ssl: Don't always treat SSL_ERROR_SYSCALL as
 unrecovarable.

SSL_read() might return <= 0, and SSL_get_erro() return SSL_ERROR_SYSCALL,
without meaning the connection is gone. Before flagging the conection
as in error, check the errno value.

This should be backported to 1.8.
---
 src/ssl_sock.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index aee3cd965..b24db079d 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -5434,6 +5434,13 @@ static int ssl_sock_to_buf(struct connection *conn, 
struct buffer *buf, int coun
break;
} else if (ret == SSL_ERROR_ZERO_RETURN)
goto read0;
+   /* For SSL_ERROR_SYSCALL, make sure the error is
+* unrecoverable before flagging the connection as
+* in error.
+*/
+   if (ret == SSL_ERROR_SYSCALL && (!errno || errno ==
+   EAGAIN))
+   goto clear_ssl_error;
/* otherwise it's a real error */
goto out_error;
}
@@ -5448,11 +5455,12 @@ static int ssl_sock_to_buf(struct connection *conn, 
struct buffer *buf, int coun
conn_sock_read0(conn);
goto leave;
  out_error:
+   conn->flags |= CO_FL_ERROR;
+clear_ssl_error:
/* Clear openssl global errors stack */
ssl_sock_dump_errors(conn);
ERR_clear_error();
 
-   conn->flags |= CO_FL_ERROR;
goto leave;
 }
 
-- 
2.14.3



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-13 Thread Willy Tarreau
Hi Olivier,

On Tue, Feb 13, 2018 at 06:07:36PM +0100, Olivier Houchard wrote:
> Hi Emmanuel,
> 
> On Tue, Feb 13, 2018 at 05:40:00PM +0100, Emmanuel Hocdet wrote:
> > Hi Olivier
> > 
> > > Le 13 févr. 2018 à 15:27, Olivier Houchard  a 
> > > écrit :
> > > 
> > > Thanks a lot for the detailed analyze, and sorry for the late answer.
> > > You're probably right, SSL_ERROR_SYSCALL shouldn't be treated as an
> > > unrecoverable error.
> > > So, what you basically did was something equivalent to the patch attached 
> > > ?
> > > 
> > > Thanks a lot !
> > > 
> > > Olivier
> > > <0001-MINOR-BUG-ssl-Don-t-always-treat-SSL_ERROR_SYSCALL-a.patch>
> > 
> > 
> > 'ret' can be unassigned
> > errno must be check earlier.
> 
> I (mostly) disagree.
> There are 3 goto out_error, and on the only one that mattered, ret and
> errno should be fine.
> However, I rewrote the patch to make it clearer what the intent is.
> Do you like it better ?
> 
> Cheers,
> 
> Olivier

> >From 8de2963753a33ca948beebcdd70c5f46bd01e4cf Mon Sep 17 00:00:00 2001
> From: Olivier Houchard 
> Date: Tue, 13 Feb 2018 15:17:23 +0100
> Subject: [PATCH] MINOR/BUG: ssl: Don't always treat SSL_ERROR_SYSCALL as
>  unrecovarable.
> 
> SSL_Raad() might return <= 0, and SSL_get_erro() return SSL_ERROR_SYSCALL,
> without meaning the connection is gone. Before flagging the conection
> as in error, check the errno value.
> 
> This should be backported to 1.8.
> ---
>  src/ssl_sock.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/src/ssl_sock.c b/src/ssl_sock.c
> index aee3cd965..83e9b86e4 100644
> --- a/src/ssl_sock.c
> +++ b/src/ssl_sock.c
> @@ -5313,6 +5313,7 @@ reneg_ok:
>  static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int 
> count)
>  {
>   int ret, done = 0;
> + int set_error_flag = 1;
>   int try;
>  
>   conn_refresh_polling_flags(conn);
> @@ -5434,6 +5435,13 @@ static int ssl_sock_to_buf(struct connection *conn, 
> struct buffer *buf, int coun
>   break;
>   } else if (ret == SSL_ERROR_ZERO_RETURN)
>   goto read0;
> + /* For SSL_ERROR_SYSCALL, make sure the error is
> +  * unrecoverable before flagging the connection as
> +  * in error.
> +  */
> + if (ret == SSL_ERROR_SYSCALL || (!errno || errno ==
> + EAGAIN))
> + set_error_flag = 0;
>   /* otherwise it's a real error */
>   goto out_error;
>   }
> @@ -5452,7 +5460,8 @@ static int ssl_sock_to_buf(struct connection *conn, 
> struct buffer *buf, int coun
>   ssl_sock_dump_errors(conn);
>   ERR_clear_error();
>  
> - conn->flags |= CO_FL_ERROR;
> + if (set_error_flag)
> + conn->flags |= CO_FL_ERROR;
>   goto leave;
>  }

Such type of construct tends to scare me (probably because I'm not reading
the whole code). It means we're supposed to set an error by default unless
we pass by a specific path. I fear that we'll get future issues (possibly
as the result of further fixes for unrelated stuff).

Also I'm really not fond of the fact that the out_error label isn't used
anymore to set the error, but sometimes to set it, sometimes to clear it.

Why instead not handle it almost similarly to the way it was before the
patch that removed the code ? I'm seeing that in the past we used to only
differenciate between read0 and error. But if we're failing on SYSCALL
with ret previously set to zero, an EOF was observed, so that's akin to
a read0 (this information is lost in the recent code since ret is
overwritten). May I suggest that we change all this block to a switch/case
instead ?  It would more or less give something like this :

if (ret > 0) {
blah;
}
else switch (SSL_get_error(ctx, ret)) {
case SSL_ERROR_WANT_WRITE: blah; break;
case SSL_ERROR_WANT_READ: blah; break;
case SSL_ERROR_ZERO_RETURN: goto read0;
case SSL_ERROR_SYSCALL:
if (!ret)
goto read0;
else if (errno && errno != EAGAIN)
goto out_error;
else /* valid cases, connection establishment, etc */
goto leave;
default:
goto out_error;
}

It's just a suggestion, it can be done by storing the result of
SSL_get_error() anywhere else, but definitely we need to make a clear
distinction between all these cases and for now the distinction between
the read0, completed connection and other errors is lost.

I'm also seeing that the whole block could be rearranged so that ret <= 0
is handled first. That would remove some 

Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-13 Thread Olivier Houchard
Hi Emmanuel,

On Tue, Feb 13, 2018 at 05:40:00PM +0100, Emmanuel Hocdet wrote:
> Hi Olivier
> 
> > Le 13 févr. 2018 à 15:27, Olivier Houchard  a écrit :
> > 
> > Thanks a lot for the detailed analyze, and sorry for the late answer.
> > You're probably right, SSL_ERROR_SYSCALL shouldn't be treated as an
> > unrecoverable error.
> > So, what you basically did was something equivalent to the patch attached ?
> > 
> > Thanks a lot !
> > 
> > Olivier
> > <0001-MINOR-BUG-ssl-Don-t-always-treat-SSL_ERROR_SYSCALL-a.patch>
> 
> 
> 'ret' can be unassigned
> errno must be check earlier.

I (mostly) disagree.
There are 3 goto out_error, and on the only one that mattered, ret and
errno should be fine.
However, I rewrote the patch to make it clearer what the intent is.
Do you like it better ?

Cheers,

Olivier
>From 8de2963753a33ca948beebcdd70c5f46bd01e4cf Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 13 Feb 2018 15:17:23 +0100
Subject: [PATCH] MINOR/BUG: ssl: Don't always treat SSL_ERROR_SYSCALL as
 unrecovarable.

SSL_Raad() might return <= 0, and SSL_get_erro() return SSL_ERROR_SYSCALL,
without meaning the connection is gone. Before flagging the conection
as in error, check the errno value.

This should be backported to 1.8.
---
 src/ssl_sock.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index aee3cd965..83e9b86e4 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -5313,6 +5313,7 @@ reneg_ok:
 static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int 
count)
 {
int ret, done = 0;
+   int set_error_flag = 1;
int try;
 
conn_refresh_polling_flags(conn);
@@ -5434,6 +5435,13 @@ static int ssl_sock_to_buf(struct connection *conn, 
struct buffer *buf, int coun
break;
} else if (ret == SSL_ERROR_ZERO_RETURN)
goto read0;
+   /* For SSL_ERROR_SYSCALL, make sure the error is
+* unrecoverable before flagging the connection as
+* in error.
+*/
+   if (ret == SSL_ERROR_SYSCALL || (!errno || errno ==
+   EAGAIN))
+   set_error_flag = 0;
/* otherwise it's a real error */
goto out_error;
}
@@ -5452,7 +5460,8 @@ static int ssl_sock_to_buf(struct connection *conn, 
struct buffer *buf, int coun
ssl_sock_dump_errors(conn);
ERR_clear_error();
 
-   conn->flags |= CO_FL_ERROR;
+   if (set_error_flag)
+   conn->flags |= CO_FL_ERROR;
goto leave;
 }
 
-- 
2.14.3



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-13 Thread Emmanuel Hocdet
Hi Olivier

> Le 13 févr. 2018 à 15:27, Olivier Houchard  a écrit :
> 
> Thanks a lot for the detailed analyze, and sorry for the late answer.
> You're probably right, SSL_ERROR_SYSCALL shouldn't be treated as an
> unrecoverable error.
> So, what you basically did was something equivalent to the patch attached ?
> 
> Thanks a lot !
> 
> Olivier
> <0001-MINOR-BUG-ssl-Don-t-always-treat-SSL_ERROR_SYSCALL-a.patch>


'ret' can be unassigned
errno must be check earlier.

++
Manu



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-13 Thread Willy Tarreau
On Tue, Feb 13, 2018 at 05:29:21PM +0100, Mateusz Malek wrote:
> Hi,
> 
> On 13.02.2018 15:27, Olivier Houchard wrote:
> > Thanks a lot for the detailed analyze, and sorry for the late answer.
> > You're probably right, SSL_ERROR_SYSCALL shouldn't be treated as an
> > unrecoverable error.
> > So, what you basically did was something equivalent to the patch attached ?
> 
> Yeah, it looked exactly like this ;) I'm glad I could help.

So does this mean I should take the patch as is ? Please just let me know.

Thanks,
Willy



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-13 Thread Mateusz Małek
Hi,

On 13.02.2018 15:27, Olivier Houchard wrote:
> Thanks a lot for the detailed analyze, and sorry for the late answer.
> You're probably right, SSL_ERROR_SYSCALL shouldn't be treated as an
> unrecoverable error.
> So, what you basically did was something equivalent to the patch attached ?

Yeah, it looked exactly like this ;) I'm glad I could help.

Best regards
Mateusz Małek



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-13 Thread Olivier Houchard
Hi guys,

On Sat, Feb 10, 2018 at 06:26:42PM +0100, Mateusz Małek wrote:
> Hi everyone,
> 
> I've narrowed down my problem down to the same commit as Tomek Gacek -
> c2aae74f010f97a3415542fe649198a5d3be1ea8 (MEDIUM: ssl: Handle early data
> with OpenSSL 1.1.1), so I guess it may be related. In my case, since upgrade
> to 1.8, some responses from some backends (not sure what exactly triggers
> the bug) do not have their headers modified (despite http-response
> add-header and http-response del-header being set).
> 
> Applying patch part-by-part, I got to a point where it seems that that was
> caused by changes to ssl_sock_to_buf function in src/ssl_sock.c (lines
> 396-431):
> https://gist.github.com/mkwm/13dd32fe2b5ec21182f8a06a304228df#file-break-patch-L396-L431
> 
> Code at out_error label behave a bit differently from part removed in this
> commit - namely, it sets conn->flags |= CO_FL_ERROR unconditionally, while
> previously there was an additional check (skipping error flag setting if
> errno was set to EAGAIN). My problems went straight away when I've changed
> out_error to match old code.
> 

Thanks a lot for the detailed analyze, and sorry for the late answer.
You're probably right, SSL_ERROR_SYSCALL shouldn't be treated as an
unrecoverable error.
So, what you basically did was something equivalent to the patch attached ?

Thanks a lot !

Olivier
>From b423f94273be2c7040ce0861bd4a21617b4c5c2b Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 13 Feb 2018 15:17:23 +0100
Subject: [PATCH] MINOR/BUG: ssl: Don't always treat SSL_ERROR_SYSCALL as
 unrecovarable.

SSL_Raad() might return <= 0, and SSL_get_erro() return SSL_ERROR_SYSCALL,
without meaning the connection is gone. Before flagging the conection
as in error, check the errno value.

This should be backported to 1.8.
---
 src/ssl_sock.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index aee3cd965..687133b0d 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -5452,7 +5452,9 @@ static int ssl_sock_to_buf(struct connection *conn, 
struct buffer *buf, int coun
ssl_sock_dump_errors(conn);
ERR_clear_error();
 
-   conn->flags |= CO_FL_ERROR;
+   if ((ret != SSL_ERROR_SYSCALL) ||
+   (errno && errno != EAGAIN))
+   conn->flags |= CO_FL_ERROR;
goto leave;
 }
 
-- 
2.14.3



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-10 Thread Willy Tarreau
Hi Mateusz,

I'm CCing Emeric (SSL maintainer) and Olivier (who added early-data support),
and responding to some points below.

On Sat, Feb 10, 2018 at 06:26:42PM +0100, Mateusz Malek wrote:
> Hi everyone,
> 
> I've narrowed down my problem down to the same commit as Tomek Gacek -
> c2aae74f010f97a3415542fe649198a5d3be1ea8 (MEDIUM: ssl: Handle early data
> with OpenSSL 1.1.1), so I guess it may be related. In my case, since upgrade
> to 1.8, some responses from some backends (not sure what exactly triggers
> the bug) do not have their headers modified (despite http-response
> add-header and http-response del-header being set).
> 
> Applying patch part-by-part, I got to a point where it seems that that was
> caused by changes to ssl_sock_to_buf function in src/ssl_sock.c (lines
> 396-431):
> https://gist.github.com/mkwm/13dd32fe2b5ec21182f8a06a304228df#file-break-patch-L396-L431
> 
> Code at out_error label behave a bit differently from part removed in this
> commit - namely, it sets conn->flags |= CO_FL_ERROR unconditionally, while
> previously there was an additional check (skipping error flag setting if
> errno was set to EAGAIN). My problems went straight away when I've changed
> out_error to match old code.

Interesting, it would be possible that sometimes an SSL error remains on
the stack, and that it randomly gets trapped here. It used to happen in
the past and we went through a great pain trying to always clean all of
them.

I have no idea why this specific condition was removed. It's not mentionned
at all in Olivier's commit message and suspiciously looks like an accident.
Olivier, do you have an idea on this one ? It's commit c2aae74f. It may be
related to an issue Emeric was chasing recently where some server-side
connections would occasionally fail if my memory serves me right.

> There is also another issue with this commit - it seems that one "1" got
> lost in OPENSSL_VERSION_NUMBER comparison (line 267):
> https://gist.github.com/mkwm/13dd32fe2b5ec21182f8a06a304228df#file-break-patch-L267
>
> Throughout this commit all additions of similar ifdefs use 0x10101000L,
> which translates to OpenSSL 1.1.1 - and this one oddly translates to version
> 0.1.1.

Yes but this one was fixed in 1.8-rc3 by commit cfdef2e3 so it cannot be
this one. But the removed part above definitely is an interesting one.

> Hope this helps!

Sure it does!

Thanks!
Willy



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-10 Thread Mateusz Małek

Hi everyone,

I've narrowed down my problem down to the same commit as Tomek Gacek - 
c2aae74f010f97a3415542fe649198a5d3be1ea8 (MEDIUM: ssl: Handle early data 
with OpenSSL 1.1.1), so I guess it may be related. In my case, since 
upgrade to 1.8, some responses from some backends (not sure what exactly 
triggers the bug) do not have their headers modified (despite 
http-response add-header and http-response del-header being set).


Applying patch part-by-part, I got to a point where it seems that that 
was caused by changes to ssl_sock_to_buf function in src/ssl_sock.c 
(lines 396-431):

https://gist.github.com/mkwm/13dd32fe2b5ec21182f8a06a304228df#file-break-patch-L396-L431

Code at out_error label behave a bit differently from part removed in 
this commit - namely, it sets conn->flags |= CO_FL_ERROR 
unconditionally, while previously there was an additional check 
(skipping error flag setting if errno was set to EAGAIN). My problems 
went straight away when I've changed out_error to match old code.



There is also another issue with this commit - it seems that one "1" got 
lost in OPENSSL_VERSION_NUMBER comparison (line 267):

https://gist.github.com/mkwm/13dd32fe2b5ec21182f8a06a304228df#file-break-patch-L267

Throughout this commit all additions of similar ifdefs use 0x10101000L, 
which translates to OpenSSL 1.1.1 - and this one oddly translates to 
version 0.1.1.



Hope this helps!

Best regards
Mateusz Malek



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-08 Thread Tomek Gacek

Hi Willy

On 2018-02-03 10:05, Willy Tarreau wrote:

Hi Tomek,

On Sat, Feb 03, 2018 at 08:47:35AM +0100, Tomek Gacek wrote:

I have same issue. It's pretty random as I would say about 60-70% requests
are OK, but rest is failing. I compiled all 1.8 versions and was able to
isolate this a little bit. It's fine up to 1.8.0-dev3 branch and it's
failing since 1.8.0-rc1.
The problem is for SSL connections and after digging in my apps logs it
looks like it's related to reusing keep alived connections between client
and haproxy.
By default I have in config "option http-server-close" present and when it's
there I can see the problem. When this option is removed - problem is
solved.

This is extremely valuable information. I suspect we might have damaged
something in the backend code when inserting the mux layer. Given that
the connection and stream are a bit more independant from each other now
due to the mux, it might be possible that depending the sequence of some
events (eg: connection close), it changes how the close event is
interpreted in the stream. This will definitely help us narrow down the
cause of the issue.



My tests show the problem is caused by this commit
http://git.haproxy.org/?p=haproxy-1.8.git;a=commitdiff;h=c2aae74f010f97a3415542fe649198a5d3be1ea8
This is first snapshot were I was able to recreate problem.

Unable to recreate in previous commit - 
http://git.haproxy.org/?p=haproxy-1.8.git;a=commit;h=253c62b257c137e7da5c273f42bc5d6eacd31d2c



Regards,
Tomek



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-03 Thread Willy Tarreau
Hi Tomek,

On Sat, Feb 03, 2018 at 08:47:35AM +0100, Tomek Gacek wrote:
> I have same issue. It's pretty random as I would say about 60-70% requests
> are OK, but rest is failing. I compiled all 1.8 versions and was able to
> isolate this a little bit. It's fine up to 1.8.0-dev3 branch and it's
> failing since 1.8.0-rc1.
> The problem is for SSL connections and after digging in my apps logs it
> looks like it's related to reusing keep alived connections between client
> and haproxy.
> By default I have in config "option http-server-close" present and when it's
> there I can see the problem. When this option is removed - problem is
> solved.

This is extremely valuable information. I suspect we might have damaged
something in the backend code when inserting the mux layer. Given that
the connection and stream are a bit more independant from each other now
due to the mux, it might be possible that depending the sequence of some
events (eg: connection close), it changes how the close event is
interpreted in the stream. This will definitely help us narrow down the
cause of the issue.

Thanks!
Willy



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-02-02 Thread Tomek Gacek

Hi

On 2018-01-17 11:37, Bart Geesink wrote:

Hi,

On 01/17/2018 10:16 AM, Christopher Faulet wrote:

Le 16/01/2018 à 16:18, Lukas Tribus a écrit :

Hello Christopher,


On 16 January 2018 at 15:01, Bart Geesink 
wrote:

Hi,

We have an issue in haproxy > 1.8 on CentOS when using SSL in the server
configuration. Haproxy sometimes logs a http status code "-1" followed
by the termination_state SDxx. This happens every few requests. When
using one backend, the clients don't notice it. When using multiple
backends, this can result in redirecting traffic to the wrong backend
(the proxy inserts a cookie to track which backend is used).

Removing the SSL configuration and using plain http solves the issue, as
does downgrading to version 1.7.

Also see:
https://discourse.haproxy.org/t/session-state-sdnn-http-status-code-200-when-upgrading-to-1-7/1409/10



Hi Lukas,

Thanks, I will check these 2 issues. First of all, I need to reproduce
them to be sure. This one seems to be a bit different because the status
code is "-1".


It also different since it occurs in both 1.8.1 and 1.8.3


Bart, when you said your backend does not log any problems, it means
that for a request logged with SD termination state on haproxy, you have
a 200 on Apache side ? And what does the response look like from the
client side ? (truncated / good / error ...)


Apache logs a 200. The reponse looks fine from the client side. The only
thing that seems to be missing is the cookie inserted by the proxy for
some requests. Other requests seem not to experience this behaviour.
There is no real pattern: I can happen when a valid proxy cookie is
present, but also when a new browser session is started and a proxy
cookie is not yet present. Downgrading to 1.7 helps, as does using a
http backend without ssl.

Regards,

Bart

I have same issue. It's pretty random as I would say about 60-70% 
requests are OK, but rest is failing. I compiled all 1.8 versions and 
was able to isolate this a little bit. It's fine up to 1.8.0-dev3 branch 
and it's failing since 1.8.0-rc1.
The problem is for SSL connections and after digging in my apps logs it 
looks like it's related to reusing keep alived connections between 
client and haproxy.
By default I have in config "option http-server-close" present and when 
it's there I can see the problem. When this option is removed - problem 
is solved.


Regards,
Tomek



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-01-17 Thread Bart Geesink
Hi,

On 01/17/2018 10:16 AM, Christopher Faulet wrote:
> Le 16/01/2018 à 16:18, Lukas Tribus a écrit :
>> Hello Christopher,
>>
>>
>> On 16 January 2018 at 15:01, Bart Geesink 
>> wrote:
>>> Hi,
>>>
>>> We have an issue in haproxy > 1.8 on CentOS when using SSL in the server
>>> configuration. Haproxy sometimes logs a http status code "-1" followed
>>> by the termination_state SDxx. This happens every few requests. When
>>> using one backend, the clients don't notice it. When using multiple
>>> backends, this can result in redirecting traffic to the wrong backend
>>> (the proxy inserts a cookie to track which backend is used).
>>>
>>> Removing the SSL configuration and using plain http solves the issue, as
>>> does downgrading to version 1.7.
>>
>> Also see:
>> https://discourse.haproxy.org/t/session-state-sdnn-http-status-code-200-when-upgrading-to-1-7/1409/10
>>
>>
> 
> Hi Lukas,
> 
> Thanks, I will check these 2 issues. First of all, I need to reproduce
> them to be sure. This one seems to be a bit different because the status
> code is "-1".
> 
It also different since it occurs in both 1.8.1 and 1.8.3

> Bart, when you said your backend does not log any problems, it means
> that for a request logged with SD termination state on haproxy, you have
> a 200 on Apache side ? And what does the response look like from the
> client side ? (truncated / good / error ...)
> 
Apache logs a 200. The reponse looks fine from the client side. The only
thing that seems to be missing is the cookie inserted by the proxy for
some requests. Other requests seem not to experience this behaviour.
There is no real pattern: I can happen when a valid proxy cookie is
present, but also when a new browser session is started and a proxy
cookie is not yet present. Downgrading to 1.7 helps, as does using a
http backend without ssl.

Regards,

Bart



signature.asc
Description: OpenPGP digital signature


Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-01-17 Thread Christopher Faulet

Le 16/01/2018 à 16:18, Lukas Tribus a écrit :

Hello Christopher,


On 16 January 2018 at 15:01, Bart Geesink  wrote:

Hi,

We have an issue in haproxy > 1.8 on CentOS when using SSL in the server
configuration. Haproxy sometimes logs a http status code "-1" followed
by the termination_state SDxx. This happens every few requests. When
using one backend, the clients don't notice it. When using multiple
backends, this can result in redirecting traffic to the wrong backend
(the proxy inserts a cookie to track which backend is used).

Removing the SSL configuration and using plain http solves the issue, as
does downgrading to version 1.7.


Also see:
https://discourse.haproxy.org/t/session-state-sdnn-http-status-code-200-when-upgrading-to-1-7/1409/10



Hi Lukas,

Thanks, I will check these 2 issues. First of all, I need to reproduce 
them to be sure. This one seems to be a bit different because the status 
code is "-1".


Bart, when you said your backend does not log any problems, it means 
that for a request logged with SD termination state on haproxy, you have 
a 200 on Apache side ? And what does the response look like from the 
client side ? (truncated / good / error ...)


--
Christopher Faulet



Re: haproxy 1.8 ssl backend server leads to server session aborts

2018-01-16 Thread Lukas Tribus
Hello Christopher,


On 16 January 2018 at 15:01, Bart Geesink  wrote:
> Hi,
>
> We have an issue in haproxy > 1.8 on CentOS when using SSL in the server
> configuration. Haproxy sometimes logs a http status code "-1" followed
> by the termination_state SDxx. This happens every few requests. When
> using one backend, the clients don't notice it. When using multiple
> backends, this can result in redirecting traffic to the wrong backend
> (the proxy inserts a cookie to track which backend is used).
>
> Removing the SSL configuration and using plain http solves the issue, as
> does downgrading to version 1.7.

Also see:
https://discourse.haproxy.org/t/session-state-sdnn-http-status-code-200-when-upgrading-to-1-7/1409/10



cheers,
lukas