Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi Christopher, On Mon, Feb 19, 2018 at 03:24:09PM +0100, Christopher Faulet wrote: > Someone on discourse reports a problem with this patch: > > https://discourse.haproxy.org/t/random-sa-errors-with-haproxy-1-8-3/2116/6 > > I asked him to test the attached patch. But It could be cool to have more > feedback on the fix. The bug can easily be reproduced by closing a > connection opened with openssl s_client with a control-c. Hehe so it seems I was not completely dumb when saying that the error used to compete with read0 in previous version ;-) It it's not enough, we may decide to fall back to the old method that was documented till OpenSSL 1.0.2 consisting in checking whether ret=0 was passed to SSL_get_error() or not to decide whether it's an error or a shutdown. But I think that in the current situation these ones will be equivalent for our use case anyway. Now applied, thanks! Willy
Re: haproxy 1.8 ssl backend server leads to server session aborts
Le 14/02/2018 à 18:53, Willy Tarreau a écrit : On Wed, Feb 14, 2018 at 06:20:42PM +0100, Mateusz Malek wrote: Hi, On 14.02.2018 17:53, Willy Tarreau wrote: On Wed, Feb 14, 2018 at 05:29:57PM +0100, Olivier Houchard wrote: What about what's attached, instead ? I think it should work. Mateusz, care to give it a try to confirm ? If OK, I'll merge it. I confirm, with this patch applied problem is gone. Excellent, thanks for the quick test. Patch now applied. Hi, Someone on discourse reports a problem with this patch: https://discourse.haproxy.org/t/random-sa-errors-with-haproxy-1-8-3/2116/6 I asked him to test the attached patch. But It could be cool to have more feedback on the fix. The bug can easily be reproduced by closing a connection opened with openssl s_client with a control-c. Thanks -- Christopher Faulet >From e0ffb9cd3210ccc3eec96350b0405963746960d0 Mon Sep 17 00:00:00 2001 From: Christopher FauletDate: Mon, 19 Feb 2018 14:25:15 +0100 Subject: [PATCH] BUG/MEDIUM: ssl: Shutdown the connection for reading on SSL_ERROR_SYSCALL When SSL_read returns SSL_ERROR_SYSCALL and errno is unset or set to EAGAIN, the connection must be shut down for reading. Else, the connection loops infinitly, consuming all the CPU. The bug was introduced in the commit 7e2e50500 ("BUG/MEDIUM: ssl: Don't always treat SSL_ERROR_SYSCALL as unrecovarable."). This patch must be backported in 1.8 too. --- src/ssl_sock.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/src/ssl_sock.c b/src/ssl_sock.c index 93aec33c4..1cd1547c7 100644 --- a/src/ssl_sock.c +++ b/src/ssl_sock.c @@ -5449,10 +5449,9 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int coun break; } else if (ret == SSL_ERROR_ZERO_RETURN) goto read0; - /* For SSL_ERROR_SYSCALL, make sure the error is - * unrecoverable before flagging the connection as - * in error. - */ + /* For SSL_ERROR_SYSCALL, make sure to clear the error + * stack before shutting down the connection for + * reading. */ if (ret == SSL_ERROR_SYSCALL && (!errno || errno == EAGAIN)) goto clear_ssl_error; /* otherwise it's a real error */ @@ -5465,16 +5464,19 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int coun conn_cond_update_sock_polling(conn); return done; + clear_ssl_error: + /* Clear openssl global errors stack */ + ssl_sock_dump_errors(conn); + ERR_clear_error(); read0: conn_sock_read0(conn); goto leave; + out_error: conn->flags |= CO_FL_ERROR; -clear_ssl_error: /* Clear openssl global errors stack */ ssl_sock_dump_errors(conn); ERR_clear_error(); - goto leave; } -- 2.14.3
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi, On 14.02.2018 17:53, Willy Tarreau wrote: On Wed, Feb 14, 2018 at 05:29:57PM +0100, Olivier Houchard wrote: What about what's attached, instead ? I think it should work. Mateusz, care to give it a try to confirm ? If OK, I'll merge it. I confirm, with this patch applied problem is gone. Best regards Mateusz Małek
Re: haproxy 1.8 ssl backend server leads to server session aborts
On Wed, Feb 14, 2018 at 05:29:57PM +0100, Olivier Houchard wrote: > Hi Willy, > > On Tue, Feb 13, 2018 at 08:05:44PM +0100, Willy Tarreau wrote: > > Hi Olivier, > > Such type of construct tends to scare me (probably because I'm not reading > > the whole code). It means we're supposed to set an error by default unless > > we pass by a specific path. I fear that we'll get future issues (possibly > > as the result of further fixes for unrelated stuff). > > > > Also I'm really not fond of the fact that the out_error label isn't used > > anymore to set the error, but sometimes to set it, sometimes to clear it. > > > > What about what's attached, instead ? I think it should work. Mateusz, care to give it a try to confirm ? If OK, I'll merge it. > > It's just a suggestion, it can be done by storing the result of > > SSL_get_error() anywhere else, but definitely we need to make a clear > > distinction between all these cases and for now the distinction between > > the read0, completed connection and other errors is lost. > > > > Because we can't rely on ret == 0 being meaningful. The manpage for > OpenSSL 1.1.1 states : > Old documentation indicated a difference between 0 and -1, and that -1 was > retryable. You should instead call SSL_get_error() to find out if it's > retryable. Interesting, indeed the doc and man page has changed between versions. My man page explicitly mentions the fact that you need to use SSL_get_error() to detect SSL_ERROR_SYSCALL, and then act depending on what value was passed to it. Let's hope they only changed the doc and not the semantics... > And the switch would lead to more goto, as we need to break from outside the > loop :) Possibly. > > I'm also seeing that the whole block could be rearranged so that ret <= 0 > > is handled first. That would remove some gotos and would leave the main > > part after all error handling. > > > > I'm not sure I get that part. I don't mind one way or another, but I don't > understand how it would remove gotos. Note that I have not made the exercise. Willy
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi Willy, On Tue, Feb 13, 2018 at 08:05:44PM +0100, Willy Tarreau wrote: > Hi Olivier, > Such type of construct tends to scare me (probably because I'm not reading > the whole code). It means we're supposed to set an error by default unless > we pass by a specific path. I fear that we'll get future issues (possibly > as the result of further fixes for unrelated stuff). > > Also I'm really not fond of the fact that the out_error label isn't used > anymore to set the error, but sometimes to set it, sometimes to clear it. > What about what's attached, instead ? > Why instead not handle it almost similarly to the way it was before the > patch that removed the code ? I'm seeing that in the past we used to only > differenciate between read0 and error. But if we're failing on SYSCALL > with ret previously set to zero, an EOF was observed, so that's akin to > a read0 (this information is lost in the recent code since ret is > overwritten). May I suggest that we change all this block to a switch/case > instead ? It would more or less give something like this : > > if (ret > 0) { > blah; > } > else switch (SSL_get_error(ctx, ret)) { > case SSL_ERROR_WANT_WRITE: blah; break; > case SSL_ERROR_WANT_READ: blah; break; > case SSL_ERROR_ZERO_RETURN: goto read0; > case SSL_ERROR_SYSCALL: > if (!ret) > goto read0; > else if (errno && errno != EAGAIN) > goto out_error; > else /* valid cases, connection establishment, etc */ > goto leave; > default: > goto out_error; > } > > It's just a suggestion, it can be done by storing the result of > SSL_get_error() anywhere else, but definitely we need to make a clear > distinction between all these cases and for now the distinction between > the read0, completed connection and other errors is lost. > Because we can't rely on ret == 0 being meaningful. The manpage for OpenSSL 1.1.1 states : Old documentation indicated a difference between 0 and -1, and that -1 was retryable. You should instead call SSL_get_error() to find out if it's retryable. And the switch would lead to more goto, as we need to break from outside the loop :) > I'm also seeing that the whole block could be rearranged so that ret <= 0 > is handled first. That would remove some gotos and would leave the main > part after all error handling. > I'm not sure I get that part. I don't mind one way or another, but I don't understand how it would remove gotos. > BTW this makes me realize that your inverted condition above seems wrong > (|| instead of &&). > Oops, that is true, those things are too complicated. Regards, Olivier >From fca75937f4f2b1927f5c82c137cba292c82dac53 Mon Sep 17 00:00:00 2001 From: Olivier HouchardDate: Tue, 13 Feb 2018 15:17:23 +0100 Subject: [PATCH] MINOR/BUG: ssl: Don't always treat SSL_ERROR_SYSCALL as unrecovarable. SSL_read() might return <= 0, and SSL_get_erro() return SSL_ERROR_SYSCALL, without meaning the connection is gone. Before flagging the conection as in error, check the errno value. This should be backported to 1.8. --- src/ssl_sock.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/src/ssl_sock.c b/src/ssl_sock.c index aee3cd965..b24db079d 100644 --- a/src/ssl_sock.c +++ b/src/ssl_sock.c @@ -5434,6 +5434,13 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int coun break; } else if (ret == SSL_ERROR_ZERO_RETURN) goto read0; + /* For SSL_ERROR_SYSCALL, make sure the error is +* unrecoverable before flagging the connection as +* in error. +*/ + if (ret == SSL_ERROR_SYSCALL && (!errno || errno == + EAGAIN)) + goto clear_ssl_error; /* otherwise it's a real error */ goto out_error; } @@ -5448,11 +5455,12 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int coun conn_sock_read0(conn); goto leave; out_error: + conn->flags |= CO_FL_ERROR; +clear_ssl_error: /* Clear openssl global errors stack */ ssl_sock_dump_errors(conn); ERR_clear_error(); - conn->flags |= CO_FL_ERROR; goto leave; } -- 2.14.3
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi Olivier, On Tue, Feb 13, 2018 at 06:07:36PM +0100, Olivier Houchard wrote: > Hi Emmanuel, > > On Tue, Feb 13, 2018 at 05:40:00PM +0100, Emmanuel Hocdet wrote: > > Hi Olivier > > > > > Le 13 févr. 2018 à 15:27, Olivier Houcharda > > > écrit : > > > > > > Thanks a lot for the detailed analyze, and sorry for the late answer. > > > You're probably right, SSL_ERROR_SYSCALL shouldn't be treated as an > > > unrecoverable error. > > > So, what you basically did was something equivalent to the patch attached > > > ? > > > > > > Thanks a lot ! > > > > > > Olivier > > > <0001-MINOR-BUG-ssl-Don-t-always-treat-SSL_ERROR_SYSCALL-a.patch> > > > > > > 'ret' can be unassigned > > errno must be check earlier. > > I (mostly) disagree. > There are 3 goto out_error, and on the only one that mattered, ret and > errno should be fine. > However, I rewrote the patch to make it clearer what the intent is. > Do you like it better ? > > Cheers, > > Olivier > >From 8de2963753a33ca948beebcdd70c5f46bd01e4cf Mon Sep 17 00:00:00 2001 > From: Olivier Houchard > Date: Tue, 13 Feb 2018 15:17:23 +0100 > Subject: [PATCH] MINOR/BUG: ssl: Don't always treat SSL_ERROR_SYSCALL as > unrecovarable. > > SSL_Raad() might return <= 0, and SSL_get_erro() return SSL_ERROR_SYSCALL, > without meaning the connection is gone. Before flagging the conection > as in error, check the errno value. > > This should be backported to 1.8. > --- > src/ssl_sock.c | 11 ++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/src/ssl_sock.c b/src/ssl_sock.c > index aee3cd965..83e9b86e4 100644 > --- a/src/ssl_sock.c > +++ b/src/ssl_sock.c > @@ -5313,6 +5313,7 @@ reneg_ok: > static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int > count) > { > int ret, done = 0; > + int set_error_flag = 1; > int try; > > conn_refresh_polling_flags(conn); > @@ -5434,6 +5435,13 @@ static int ssl_sock_to_buf(struct connection *conn, > struct buffer *buf, int coun > break; > } else if (ret == SSL_ERROR_ZERO_RETURN) > goto read0; > + /* For SSL_ERROR_SYSCALL, make sure the error is > + * unrecoverable before flagging the connection as > + * in error. > + */ > + if (ret == SSL_ERROR_SYSCALL || (!errno || errno == > + EAGAIN)) > + set_error_flag = 0; > /* otherwise it's a real error */ > goto out_error; > } > @@ -5452,7 +5460,8 @@ static int ssl_sock_to_buf(struct connection *conn, > struct buffer *buf, int coun > ssl_sock_dump_errors(conn); > ERR_clear_error(); > > - conn->flags |= CO_FL_ERROR; > + if (set_error_flag) > + conn->flags |= CO_FL_ERROR; > goto leave; > } Such type of construct tends to scare me (probably because I'm not reading the whole code). It means we're supposed to set an error by default unless we pass by a specific path. I fear that we'll get future issues (possibly as the result of further fixes for unrelated stuff). Also I'm really not fond of the fact that the out_error label isn't used anymore to set the error, but sometimes to set it, sometimes to clear it. Why instead not handle it almost similarly to the way it was before the patch that removed the code ? I'm seeing that in the past we used to only differenciate between read0 and error. But if we're failing on SYSCALL with ret previously set to zero, an EOF was observed, so that's akin to a read0 (this information is lost in the recent code since ret is overwritten). May I suggest that we change all this block to a switch/case instead ? It would more or less give something like this : if (ret > 0) { blah; } else switch (SSL_get_error(ctx, ret)) { case SSL_ERROR_WANT_WRITE: blah; break; case SSL_ERROR_WANT_READ: blah; break; case SSL_ERROR_ZERO_RETURN: goto read0; case SSL_ERROR_SYSCALL: if (!ret) goto read0; else if (errno && errno != EAGAIN) goto out_error; else /* valid cases, connection establishment, etc */ goto leave; default: goto out_error; } It's just a suggestion, it can be done by storing the result of SSL_get_error() anywhere else, but definitely we need to make a clear distinction between all these cases and for now the distinction between the read0, completed connection and other errors is lost. I'm also seeing that the whole block could be rearranged so that ret <= 0 is handled first. That would remove some
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi Emmanuel, On Tue, Feb 13, 2018 at 05:40:00PM +0100, Emmanuel Hocdet wrote: > Hi Olivier > > > Le 13 févr. 2018 à 15:27, Olivier Houcharda écrit : > > > > Thanks a lot for the detailed analyze, and sorry for the late answer. > > You're probably right, SSL_ERROR_SYSCALL shouldn't be treated as an > > unrecoverable error. > > So, what you basically did was something equivalent to the patch attached ? > > > > Thanks a lot ! > > > > Olivier > > <0001-MINOR-BUG-ssl-Don-t-always-treat-SSL_ERROR_SYSCALL-a.patch> > > > 'ret' can be unassigned > errno must be check earlier. I (mostly) disagree. There are 3 goto out_error, and on the only one that mattered, ret and errno should be fine. However, I rewrote the patch to make it clearer what the intent is. Do you like it better ? Cheers, Olivier >From 8de2963753a33ca948beebcdd70c5f46bd01e4cf Mon Sep 17 00:00:00 2001 From: Olivier Houchard Date: Tue, 13 Feb 2018 15:17:23 +0100 Subject: [PATCH] MINOR/BUG: ssl: Don't always treat SSL_ERROR_SYSCALL as unrecovarable. SSL_Raad() might return <= 0, and SSL_get_erro() return SSL_ERROR_SYSCALL, without meaning the connection is gone. Before flagging the conection as in error, check the errno value. This should be backported to 1.8. --- src/ssl_sock.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/src/ssl_sock.c b/src/ssl_sock.c index aee3cd965..83e9b86e4 100644 --- a/src/ssl_sock.c +++ b/src/ssl_sock.c @@ -5313,6 +5313,7 @@ reneg_ok: static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int count) { int ret, done = 0; + int set_error_flag = 1; int try; conn_refresh_polling_flags(conn); @@ -5434,6 +5435,13 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int coun break; } else if (ret == SSL_ERROR_ZERO_RETURN) goto read0; + /* For SSL_ERROR_SYSCALL, make sure the error is +* unrecoverable before flagging the connection as +* in error. +*/ + if (ret == SSL_ERROR_SYSCALL || (!errno || errno == + EAGAIN)) + set_error_flag = 0; /* otherwise it's a real error */ goto out_error; } @@ -5452,7 +5460,8 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int coun ssl_sock_dump_errors(conn); ERR_clear_error(); - conn->flags |= CO_FL_ERROR; + if (set_error_flag) + conn->flags |= CO_FL_ERROR; goto leave; } -- 2.14.3
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi Olivier > Le 13 févr. 2018 à 15:27, Olivier Houcharda écrit : > > Thanks a lot for the detailed analyze, and sorry for the late answer. > You're probably right, SSL_ERROR_SYSCALL shouldn't be treated as an > unrecoverable error. > So, what you basically did was something equivalent to the patch attached ? > > Thanks a lot ! > > Olivier > <0001-MINOR-BUG-ssl-Don-t-always-treat-SSL_ERROR_SYSCALL-a.patch> 'ret' can be unassigned errno must be check earlier. ++ Manu
Re: haproxy 1.8 ssl backend server leads to server session aborts
On Tue, Feb 13, 2018 at 05:29:21PM +0100, Mateusz Malek wrote: > Hi, > > On 13.02.2018 15:27, Olivier Houchard wrote: > > Thanks a lot for the detailed analyze, and sorry for the late answer. > > You're probably right, SSL_ERROR_SYSCALL shouldn't be treated as an > > unrecoverable error. > > So, what you basically did was something equivalent to the patch attached ? > > Yeah, it looked exactly like this ;) I'm glad I could help. So does this mean I should take the patch as is ? Please just let me know. Thanks, Willy
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi, On 13.02.2018 15:27, Olivier Houchard wrote: > Thanks a lot for the detailed analyze, and sorry for the late answer. > You're probably right, SSL_ERROR_SYSCALL shouldn't be treated as an > unrecoverable error. > So, what you basically did was something equivalent to the patch attached ? Yeah, it looked exactly like this ;) I'm glad I could help. Best regards Mateusz Małek
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi guys, On Sat, Feb 10, 2018 at 06:26:42PM +0100, Mateusz Małek wrote: > Hi everyone, > > I've narrowed down my problem down to the same commit as Tomek Gacek - > c2aae74f010f97a3415542fe649198a5d3be1ea8 (MEDIUM: ssl: Handle early data > with OpenSSL 1.1.1), so I guess it may be related. In my case, since upgrade > to 1.8, some responses from some backends (not sure what exactly triggers > the bug) do not have their headers modified (despite http-response > add-header and http-response del-header being set). > > Applying patch part-by-part, I got to a point where it seems that that was > caused by changes to ssl_sock_to_buf function in src/ssl_sock.c (lines > 396-431): > https://gist.github.com/mkwm/13dd32fe2b5ec21182f8a06a304228df#file-break-patch-L396-L431 > > Code at out_error label behave a bit differently from part removed in this > commit - namely, it sets conn->flags |= CO_FL_ERROR unconditionally, while > previously there was an additional check (skipping error flag setting if > errno was set to EAGAIN). My problems went straight away when I've changed > out_error to match old code. > Thanks a lot for the detailed analyze, and sorry for the late answer. You're probably right, SSL_ERROR_SYSCALL shouldn't be treated as an unrecoverable error. So, what you basically did was something equivalent to the patch attached ? Thanks a lot ! Olivier >From b423f94273be2c7040ce0861bd4a21617b4c5c2b Mon Sep 17 00:00:00 2001 From: Olivier HouchardDate: Tue, 13 Feb 2018 15:17:23 +0100 Subject: [PATCH] MINOR/BUG: ssl: Don't always treat SSL_ERROR_SYSCALL as unrecovarable. SSL_Raad() might return <= 0, and SSL_get_erro() return SSL_ERROR_SYSCALL, without meaning the connection is gone. Before flagging the conection as in error, check the errno value. This should be backported to 1.8. --- src/ssl_sock.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/ssl_sock.c b/src/ssl_sock.c index aee3cd965..687133b0d 100644 --- a/src/ssl_sock.c +++ b/src/ssl_sock.c @@ -5452,7 +5452,9 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int coun ssl_sock_dump_errors(conn); ERR_clear_error(); - conn->flags |= CO_FL_ERROR; + if ((ret != SSL_ERROR_SYSCALL) || + (errno && errno != EAGAIN)) + conn->flags |= CO_FL_ERROR; goto leave; } -- 2.14.3
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi Mateusz, I'm CCing Emeric (SSL maintainer) and Olivier (who added early-data support), and responding to some points below. On Sat, Feb 10, 2018 at 06:26:42PM +0100, Mateusz Malek wrote: > Hi everyone, > > I've narrowed down my problem down to the same commit as Tomek Gacek - > c2aae74f010f97a3415542fe649198a5d3be1ea8 (MEDIUM: ssl: Handle early data > with OpenSSL 1.1.1), so I guess it may be related. In my case, since upgrade > to 1.8, some responses from some backends (not sure what exactly triggers > the bug) do not have their headers modified (despite http-response > add-header and http-response del-header being set). > > Applying patch part-by-part, I got to a point where it seems that that was > caused by changes to ssl_sock_to_buf function in src/ssl_sock.c (lines > 396-431): > https://gist.github.com/mkwm/13dd32fe2b5ec21182f8a06a304228df#file-break-patch-L396-L431 > > Code at out_error label behave a bit differently from part removed in this > commit - namely, it sets conn->flags |= CO_FL_ERROR unconditionally, while > previously there was an additional check (skipping error flag setting if > errno was set to EAGAIN). My problems went straight away when I've changed > out_error to match old code. Interesting, it would be possible that sometimes an SSL error remains on the stack, and that it randomly gets trapped here. It used to happen in the past and we went through a great pain trying to always clean all of them. I have no idea why this specific condition was removed. It's not mentionned at all in Olivier's commit message and suspiciously looks like an accident. Olivier, do you have an idea on this one ? It's commit c2aae74f. It may be related to an issue Emeric was chasing recently where some server-side connections would occasionally fail if my memory serves me right. > There is also another issue with this commit - it seems that one "1" got > lost in OPENSSL_VERSION_NUMBER comparison (line 267): > https://gist.github.com/mkwm/13dd32fe2b5ec21182f8a06a304228df#file-break-patch-L267 > > Throughout this commit all additions of similar ifdefs use 0x10101000L, > which translates to OpenSSL 1.1.1 - and this one oddly translates to version > 0.1.1. Yes but this one was fixed in 1.8-rc3 by commit cfdef2e3 so it cannot be this one. But the removed part above definitely is an interesting one. > Hope this helps! Sure it does! Thanks! Willy
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi everyone, I've narrowed down my problem down to the same commit as Tomek Gacek - c2aae74f010f97a3415542fe649198a5d3be1ea8 (MEDIUM: ssl: Handle early data with OpenSSL 1.1.1), so I guess it may be related. In my case, since upgrade to 1.8, some responses from some backends (not sure what exactly triggers the bug) do not have their headers modified (despite http-response add-header and http-response del-header being set). Applying patch part-by-part, I got to a point where it seems that that was caused by changes to ssl_sock_to_buf function in src/ssl_sock.c (lines 396-431): https://gist.github.com/mkwm/13dd32fe2b5ec21182f8a06a304228df#file-break-patch-L396-L431 Code at out_error label behave a bit differently from part removed in this commit - namely, it sets conn->flags |= CO_FL_ERROR unconditionally, while previously there was an additional check (skipping error flag setting if errno was set to EAGAIN). My problems went straight away when I've changed out_error to match old code. There is also another issue with this commit - it seems that one "1" got lost in OPENSSL_VERSION_NUMBER comparison (line 267): https://gist.github.com/mkwm/13dd32fe2b5ec21182f8a06a304228df#file-break-patch-L267 Throughout this commit all additions of similar ifdefs use 0x10101000L, which translates to OpenSSL 1.1.1 - and this one oddly translates to version 0.1.1. Hope this helps! Best regards Mateusz Malek
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi Willy On 2018-02-03 10:05, Willy Tarreau wrote: Hi Tomek, On Sat, Feb 03, 2018 at 08:47:35AM +0100, Tomek Gacek wrote: I have same issue. It's pretty random as I would say about 60-70% requests are OK, but rest is failing. I compiled all 1.8 versions and was able to isolate this a little bit. It's fine up to 1.8.0-dev3 branch and it's failing since 1.8.0-rc1. The problem is for SSL connections and after digging in my apps logs it looks like it's related to reusing keep alived connections between client and haproxy. By default I have in config "option http-server-close" present and when it's there I can see the problem. When this option is removed - problem is solved. This is extremely valuable information. I suspect we might have damaged something in the backend code when inserting the mux layer. Given that the connection and stream are a bit more independant from each other now due to the mux, it might be possible that depending the sequence of some events (eg: connection close), it changes how the close event is interpreted in the stream. This will definitely help us narrow down the cause of the issue. My tests show the problem is caused by this commit http://git.haproxy.org/?p=haproxy-1.8.git;a=commitdiff;h=c2aae74f010f97a3415542fe649198a5d3be1ea8 This is first snapshot were I was able to recreate problem. Unable to recreate in previous commit - http://git.haproxy.org/?p=haproxy-1.8.git;a=commit;h=253c62b257c137e7da5c273f42bc5d6eacd31d2c Regards, Tomek
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi Tomek, On Sat, Feb 03, 2018 at 08:47:35AM +0100, Tomek Gacek wrote: > I have same issue. It's pretty random as I would say about 60-70% requests > are OK, but rest is failing. I compiled all 1.8 versions and was able to > isolate this a little bit. It's fine up to 1.8.0-dev3 branch and it's > failing since 1.8.0-rc1. > The problem is for SSL connections and after digging in my apps logs it > looks like it's related to reusing keep alived connections between client > and haproxy. > By default I have in config "option http-server-close" present and when it's > there I can see the problem. When this option is removed - problem is > solved. This is extremely valuable information. I suspect we might have damaged something in the backend code when inserting the mux layer. Given that the connection and stream are a bit more independant from each other now due to the mux, it might be possible that depending the sequence of some events (eg: connection close), it changes how the close event is interpreted in the stream. This will definitely help us narrow down the cause of the issue. Thanks! Willy
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi On 2018-01-17 11:37, Bart Geesink wrote: Hi, On 01/17/2018 10:16 AM, Christopher Faulet wrote: Le 16/01/2018 à 16:18, Lukas Tribus a écrit : Hello Christopher, On 16 January 2018 at 15:01, Bart Geesinkwrote: Hi, We have an issue in haproxy > 1.8 on CentOS when using SSL in the server configuration. Haproxy sometimes logs a http status code "-1" followed by the termination_state SDxx. This happens every few requests. When using one backend, the clients don't notice it. When using multiple backends, this can result in redirecting traffic to the wrong backend (the proxy inserts a cookie to track which backend is used). Removing the SSL configuration and using plain http solves the issue, as does downgrading to version 1.7. Also see: https://discourse.haproxy.org/t/session-state-sdnn-http-status-code-200-when-upgrading-to-1-7/1409/10 Hi Lukas, Thanks, I will check these 2 issues. First of all, I need to reproduce them to be sure. This one seems to be a bit different because the status code is "-1". It also different since it occurs in both 1.8.1 and 1.8.3 Bart, when you said your backend does not log any problems, it means that for a request logged with SD termination state on haproxy, you have a 200 on Apache side ? And what does the response look like from the client side ? (truncated / good / error ...) Apache logs a 200. The reponse looks fine from the client side. The only thing that seems to be missing is the cookie inserted by the proxy for some requests. Other requests seem not to experience this behaviour. There is no real pattern: I can happen when a valid proxy cookie is present, but also when a new browser session is started and a proxy cookie is not yet present. Downgrading to 1.7 helps, as does using a http backend without ssl. Regards, Bart I have same issue. It's pretty random as I would say about 60-70% requests are OK, but rest is failing. I compiled all 1.8 versions and was able to isolate this a little bit. It's fine up to 1.8.0-dev3 branch and it's failing since 1.8.0-rc1. The problem is for SSL connections and after digging in my apps logs it looks like it's related to reusing keep alived connections between client and haproxy. By default I have in config "option http-server-close" present and when it's there I can see the problem. When this option is removed - problem is solved. Regards, Tomek
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hi, On 01/17/2018 10:16 AM, Christopher Faulet wrote: > Le 16/01/2018 à 16:18, Lukas Tribus a écrit : >> Hello Christopher, >> >> >> On 16 January 2018 at 15:01, Bart Geesink>> wrote: >>> Hi, >>> >>> We have an issue in haproxy > 1.8 on CentOS when using SSL in the server >>> configuration. Haproxy sometimes logs a http status code "-1" followed >>> by the termination_state SDxx. This happens every few requests. When >>> using one backend, the clients don't notice it. When using multiple >>> backends, this can result in redirecting traffic to the wrong backend >>> (the proxy inserts a cookie to track which backend is used). >>> >>> Removing the SSL configuration and using plain http solves the issue, as >>> does downgrading to version 1.7. >> >> Also see: >> https://discourse.haproxy.org/t/session-state-sdnn-http-status-code-200-when-upgrading-to-1-7/1409/10 >> >> > > Hi Lukas, > > Thanks, I will check these 2 issues. First of all, I need to reproduce > them to be sure. This one seems to be a bit different because the status > code is "-1". > It also different since it occurs in both 1.8.1 and 1.8.3 > Bart, when you said your backend does not log any problems, it means > that for a request logged with SD termination state on haproxy, you have > a 200 on Apache side ? And what does the response look like from the > client side ? (truncated / good / error ...) > Apache logs a 200. The reponse looks fine from the client side. The only thing that seems to be missing is the cookie inserted by the proxy for some requests. Other requests seem not to experience this behaviour. There is no real pattern: I can happen when a valid proxy cookie is present, but also when a new browser session is started and a proxy cookie is not yet present. Downgrading to 1.7 helps, as does using a http backend without ssl. Regards, Bart signature.asc Description: OpenPGP digital signature
Re: haproxy 1.8 ssl backend server leads to server session aborts
Le 16/01/2018 à 16:18, Lukas Tribus a écrit : Hello Christopher, On 16 January 2018 at 15:01, Bart Geesinkwrote: Hi, We have an issue in haproxy > 1.8 on CentOS when using SSL in the server configuration. Haproxy sometimes logs a http status code "-1" followed by the termination_state SDxx. This happens every few requests. When using one backend, the clients don't notice it. When using multiple backends, this can result in redirecting traffic to the wrong backend (the proxy inserts a cookie to track which backend is used). Removing the SSL configuration and using plain http solves the issue, as does downgrading to version 1.7. Also see: https://discourse.haproxy.org/t/session-state-sdnn-http-status-code-200-when-upgrading-to-1-7/1409/10 Hi Lukas, Thanks, I will check these 2 issues. First of all, I need to reproduce them to be sure. This one seems to be a bit different because the status code is "-1". Bart, when you said your backend does not log any problems, it means that for a request logged with SD termination state on haproxy, you have a 200 on Apache side ? And what does the response look like from the client side ? (truncated / good / error ...) -- Christopher Faulet
Re: haproxy 1.8 ssl backend server leads to server session aborts
Hello Christopher, On 16 January 2018 at 15:01, Bart Geesinkwrote: > Hi, > > We have an issue in haproxy > 1.8 on CentOS when using SSL in the server > configuration. Haproxy sometimes logs a http status code "-1" followed > by the termination_state SDxx. This happens every few requests. When > using one backend, the clients don't notice it. When using multiple > backends, this can result in redirecting traffic to the wrong backend > (the proxy inserts a cookie to track which backend is used). > > Removing the SSL configuration and using plain http solves the issue, as > does downgrading to version 1.7. Also see: https://discourse.haproxy.org/t/session-state-sdnn-http-status-code-200-when-upgrading-to-1-7/1409/10 cheers, lukas