Re: relayd stops processing traffic intermittently
On Wed, Mar 14, 2018 at 04:27:58PM +0100, Mischa wrote: > Hi Claudio, > > > On 25 Dec 2017, at 15:54, Mischa wrote: > > > >> On 24 Dec 2017, at 19:07, Claudio Jeker wrote: > >> On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote: > On 23 Dec 2017, at 13:08, Claudio Jeker wrote: > > On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote: > > Hi All, > > > > Since OpenBSD 6.2, just confirmed this in the latest snapshot > > (GENERIC.MP#305) as well, for some reason relayd stops processing > > traffic and starts flooding the log file with the following message: > > > > Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout > > Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout > > Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout > > Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout > > [snip] > > Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout > > Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout > > Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout > > Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout > > ...etc... > > > > Restarting the daemon "fixes" the problem. > > Not sure how to trouble shoot this but I am able to reproduce this > > consistently by pointing SSLLabs towards relayd. > > Would be great to get some pointers. > > > > I have seen this as well on our production systems. This is a problem in > the privsep part of the TLS code. I could not do more testing yet but my > assumption is that a new option / feature is freaking this code out. > >>> > >>> Anything I can do or collect to give you more information? > >> > >> So, I think I found the problem. The ca process did not handle errors from > >> RSA_private_encrypt correctly. So once you got a bad signature in the > >> system chocked and stopped. This diff seems to work for me (against > >> SSLlabs). > > > > Awesome! Can confirm that it continues processing traffic when hitting it > > with sslabs. > > Will also move it to a more bussier machine to see how that handles. > > > > I am seeing the following messages now: > > Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa > > routines:CRYPTO_internal:block type is not 02 > > Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa > > routines:CRYPTO_internal:null before block missing > > Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa > > routines:CRYPTO_internal:padding check failed > > Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa > > routines:CRYPTO_internal:null before block missing > > Not sure if this is supposed to be taken care of, but I am still seeing the > following messages in 6.3-beta. > $ uname -a > OpenBSD lb2l 6.3 GENERIC.MP#58 amd64 > > Mar 13 23:43:38 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF06B:rsa > routines:CRYPTO_internal:block type is not 02 > Mar 13 23:43:39 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF072:rsa > routines:CRYPTO_internal:padding check failed > Mar 13 23:43:48 lb2 relayd[14775]: ca_dispatch_relay: error:04FFF06B:rsa > routines:CRYPTO_internal:block type is not 02 > Mar 13 23:44:03 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF071:rsa > routines:CRYPTO_internal:null before block missing > > Any knobs that need to be turned? Unsure. The errors are from OpenSSL and explain why the key exchange blew up. Currently this resets the connection. I wanted to log them so they are more visible. Somebody with deep knowledge of TLS and libcrypto may be able to answer if those are harmless or if somebody is probing. -- :wq Claudio
Re: relayd stops processing traffic intermittently
Hi Tom! Before, relayd would just "crash" and a reload was "solving" the problem. In this instance it's luckily just extra logging. ;) Mischa > On 14 Mar 2018, at 16:47, Tom Smyth wrote: > > Hi all > I have had that issue in 6.0,6.1 and 6.2 I haven't tried current yet.. > I haven't had enough time to diagnose it to provide an adequate > bug report myself. > just restarting relayd seems to resolve it > Im just confiming that I have seen this issue also ... > > On 14 March 2018 at 15:27, Mischa wrote: >> Hi Claudio, >> >>> On 25 Dec 2017, at 15:54, Mischa wrote: >>> On 24 Dec 2017, at 19:07, Claudio Jeker wrote: On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote: >> On 23 Dec 2017, at 13:08, Claudio Jeker wrote: >>> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote: >>> Hi All, >>> >>> Since OpenBSD 6.2, just confirmed this in the latest snapshot >>> (GENERIC.MP#305) as well, for some reason relayd stops processing >>> traffic and starts flooding the log file with the following message: >>> >>> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout >>> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout >>> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout >>> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout >>> [snip] >>> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout >>> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout >>> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout >>> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout >>> ...etc... >>> >>> Restarting the daemon "fixes" the problem. >>> Not sure how to trouble shoot this but I am able to reproduce this >>> consistently by pointing SSLLabs towards relayd. >>> Would be great to get some pointers. >>> >> >> I have seen this as well on our production systems. This is a problem in >> the privsep part of the TLS code. I could not do more testing yet but my >> assumption is that a new option / feature is freaking this code out. > > Anything I can do or collect to give you more information? So, I think I found the problem. The ca process did not handle errors from RSA_private_encrypt correctly. So once you got a bad signature in the system chocked and stopped. This diff seems to work for me (against SSLlabs). >>> >>> Awesome! Can confirm that it continues processing traffic when hitting it >>> with sslabs. >>> Will also move it to a more bussier machine to see how that handles. >>> >>> I am seeing the following messages now: >>> Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa >>> routines:CRYPTO_internal:block type is not 02 >>> Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa >>> routines:CRYPTO_internal:null before block missing >>> Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa >>> routines:CRYPTO_internal:padding check failed >>> Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa >>> routines:CRYPTO_internal:null before block missing >> >> Not sure if this is supposed to be taken care of, but I am still seeing the >> following messages in 6.3-beta. >> $ uname -a >> OpenBSD lb2l 6.3 GENERIC.MP#58 amd64 >> >> Mar 13 23:43:38 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF06B:rsa >> routines:CRYPTO_internal:block type is not 02 >> Mar 13 23:43:39 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF072:rsa >> routines:CRYPTO_internal:padding check failed >> Mar 13 23:43:48 lb2 relayd[14775]: ca_dispatch_relay: error:04FFF06B:rsa >> routines:CRYPTO_internal:block type is not 02 >> Mar 13 23:44:03 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF071:rsa >> routines:CRYPTO_internal:null before block missing >> >> Any knobs that need to be turned? >> >> Mischa >> > > > > -- > Kindest regards, > Tom Smyth > > Mobile: +353 87 6193172 > The information contained in this E-mail is intended only for the > confidential use of the named recipient. If the reader of this message > is not the intended recipient or the person responsible for > delivering it to the recipient, you are hereby notified that you have > received this communication in error and that any review, > dissemination or copying of this communication is strictly prohibited. > If you have received this in error, please notify the sender > immediately by telephone at the number above and erase the message > You are requested to carry out your own virus check before > opening any attachment.
Re: relayd stops processing traffic intermittently
Hi all I have had that issue in 6.0,6.1 and 6.2 I haven't tried current yet.. I haven't had enough time to diagnose it to provide an adequate bug report myself. just restarting relayd seems to resolve it Im just confiming that I have seen this issue also ... On 14 March 2018 at 15:27, Mischa wrote: > Hi Claudio, > >> On 25 Dec 2017, at 15:54, Mischa wrote: >> >>> On 24 Dec 2017, at 19:07, Claudio Jeker wrote: >>> On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote: > On 23 Dec 2017, at 13:08, Claudio Jeker wrote: >> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote: >> Hi All, >> >> Since OpenBSD 6.2, just confirmed this in the latest snapshot >> (GENERIC.MP#305) as well, for some reason relayd stops processing >> traffic and starts flooding the log file with the following message: >> >> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout >> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout >> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout >> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout >> [snip] >> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout >> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout >> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout >> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout >> ...etc... >> >> Restarting the daemon "fixes" the problem. >> Not sure how to trouble shoot this but I am able to reproduce this >> consistently by pointing SSLLabs towards relayd. >> Would be great to get some pointers. >> > > I have seen this as well on our production systems. This is a problem in > the privsep part of the TLS code. I could not do more testing yet but my > assumption is that a new option / feature is freaking this code out. Anything I can do or collect to give you more information? >>> >>> So, I think I found the problem. The ca process did not handle errors from >>> RSA_private_encrypt correctly. So once you got a bad signature in the >>> system chocked and stopped. This diff seems to work for me (against >>> SSLlabs). >> >> Awesome! Can confirm that it continues processing traffic when hitting it >> with sslabs. >> Will also move it to a more bussier machine to see how that handles. >> >> I am seeing the following messages now: >> Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa >> routines:CRYPTO_internal:block type is not 02 >> Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa >> routines:CRYPTO_internal:null before block missing >> Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa >> routines:CRYPTO_internal:padding check failed >> Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa >> routines:CRYPTO_internal:null before block missing > > Not sure if this is supposed to be taken care of, but I am still seeing the > following messages in 6.3-beta. > $ uname -a > OpenBSD lb2l 6.3 GENERIC.MP#58 amd64 > > Mar 13 23:43:38 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF06B:rsa > routines:CRYPTO_internal:block type is not 02 > Mar 13 23:43:39 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF072:rsa > routines:CRYPTO_internal:padding check failed > Mar 13 23:43:48 lb2 relayd[14775]: ca_dispatch_relay: error:04FFF06B:rsa > routines:CRYPTO_internal:block type is not 02 > Mar 13 23:44:03 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF071:rsa > routines:CRYPTO_internal:null before block missing > > Any knobs that need to be turned? > > Mischa > -- Kindest regards, Tom Smyth Mobile: +353 87 6193172 The information contained in this E-mail is intended only for the confidential use of the named recipient. If the reader of this message is not the intended recipient or the person responsible for delivering it to the recipient, you are hereby notified that you have received this communication in error and that any review, dissemination or copying of this communication is strictly prohibited. If you have received this in error, please notify the sender immediately by telephone at the number above and erase the message You are requested to carry out your own virus check before opening any attachment.
Re: relayd stops processing traffic intermittently
Hi Claudio, > On 25 Dec 2017, at 15:54, Mischa wrote: > >> On 24 Dec 2017, at 19:07, Claudio Jeker wrote: >> On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote: On 23 Dec 2017, at 13:08, Claudio Jeker wrote: > On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote: > Hi All, > > Since OpenBSD 6.2, just confirmed this in the latest snapshot > (GENERIC.MP#305) as well, for some reason relayd stops processing traffic > and starts flooding the log file with the following message: > > Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout > Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout > Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout > Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout > [snip] > Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout > Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout > Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout > Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout > ...etc... > > Restarting the daemon "fixes" the problem. > Not sure how to trouble shoot this but I am able to reproduce this > consistently by pointing SSLLabs towards relayd. > Would be great to get some pointers. > I have seen this as well on our production systems. This is a problem in the privsep part of the TLS code. I could not do more testing yet but my assumption is that a new option / feature is freaking this code out. >>> >>> Anything I can do or collect to give you more information? >> >> So, I think I found the problem. The ca process did not handle errors from >> RSA_private_encrypt correctly. So once you got a bad signature in the >> system chocked and stopped. This diff seems to work for me (against >> SSLlabs). > > Awesome! Can confirm that it continues processing traffic when hitting it > with sslabs. > Will also move it to a more bussier machine to see how that handles. > > I am seeing the following messages now: > Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa > routines:CRYPTO_internal:block type is not 02 > Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa > routines:CRYPTO_internal:null before block missing > Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa > routines:CRYPTO_internal:padding check failed > Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa > routines:CRYPTO_internal:null before block missing Not sure if this is supposed to be taken care of, but I am still seeing the following messages in 6.3-beta. $ uname -a OpenBSD lb2l 6.3 GENERIC.MP#58 amd64 Mar 13 23:43:38 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02 Mar 13 23:43:39 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF072:rsa routines:CRYPTO_internal:padding check failed Mar 13 23:43:48 lb2 relayd[14775]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02 Mar 13 23:44:03 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing Any knobs that need to be turned? Mischa
Re: relayd stops processing traffic intermittently
> On 24 Dec 2017, at 19:07, Claudio Jeker wrote: > On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote: >>> On 23 Dec 2017, at 13:08, Claudio Jeker wrote: On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote: Hi All, Since OpenBSD 6.2, just confirmed this in the latest snapshot (GENERIC.MP#305) as well, for some reason relayd stops processing traffic and starts flooding the log file with the following message: Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout [snip] Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout ...etc... Restarting the daemon "fixes" the problem. Not sure how to trouble shoot this but I am able to reproduce this consistently by pointing SSLLabs towards relayd. Would be great to get some pointers. >>> >>> I have seen this as well on our production systems. This is a problem in >>> the privsep part of the TLS code. I could not do more testing yet but my >>> assumption is that a new option / feature is freaking this code out. >> >> Anything I can do or collect to give you more information? > > So, I think I found the problem. The ca process did not handle errors from > RSA_private_encrypt correctly. So once you got a bad signature in the > system chocked and stopped. This diff seems to work for me (against > SSLlabs). Awesome! Can confirm that it continues processing traffic when hitting it with sslabs. Will also move it to a more bussier machine to see how that handles. I am seeing the following messages now: Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02 Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa routines:CRYPTO_internal:padding check failed Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing Mischa > > Cheers > -- > :wq Claudio > > Index: ca.c > === > RCS file: /cvs/src/usr.sbin/relayd/ca.c,v > retrieving revision 1.31 > diff -u -p -r1.31 ca.c > --- ca.c 28 Nov 2017 00:20:23 - 1.31 > +++ ca.c 24 Dec 2017 18:01:20 - > @@ -266,9 +266,15 @@ ca_dispatch_relay(int fd, struct privsep > break; > } > > + if (cko.cko_tlen == -1) { > + char buf[256]; > + log_warnx("%s: %s", __func__, > + ERR_error_string(ERR_get_error(), buf)); > + } > + > iov[c].iov_base = &cko; > iov[c++].iov_len = sizeof(cko); > - if (cko.cko_tlen) { > + if (cko.cko_tlen > 0) { > iov[c].iov_base = to; > iov[c++].iov_len = cko.cko_tlen; > } > @@ -381,12 +387,12 @@ rsae_send_imsg(int flen, const u_char *f > > IMSG_SIZE_CHECK(&imsg, (&cko)); > memcpy(&cko, imsg.data, sizeof(cko)); > - if (IMSG_DATA_SIZE(&imsg) != > - (sizeof(cko) + cko.cko_tlen)) > - fatalx("data size"); > > ret = cko.cko_tlen; > - if (ret) { > + if (ret > 0) { > + if (IMSG_DATA_SIZE(&imsg) != > + (sizeof(cko) + ret)) > + fatalx("data size"); > toptr = (u_char *)imsg.data + sizeof(cko); > memcpy(to, toptr, ret); > } >
Re: relayd stops processing traffic intermittently
On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote: > > > On 23 Dec 2017, at 13:08, Claudio Jeker wrote: > > > >> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote: > >> Hi All, > >> > >> Since OpenBSD 6.2, just confirmed this in the latest snapshot > >> (GENERIC.MP#305) as well, for some reason relayd stops processing traffic > >> and starts flooding the log file with the following message: > >> > >> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout > >> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout > >> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout > >> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout > >> [snip] > >> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout > >> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout > >> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout > >> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout > >> ...etc... > >> > >> Restarting the daemon "fixes" the problem. > >> Not sure how to trouble shoot this but I am able to reproduce this > >> consistently by pointing SSLLabs towards relayd. > >> Would be great to get some pointers. > >> > > > > I have seen this as well on our production systems. This is a problem in > > the privsep part of the TLS code. I could not do more testing yet but my > > assumption is that a new option / feature is freaking this code out. > > Anything I can do or collect to give you more information? So, I think I found the problem. The ca process did not handle errors from RSA_private_encrypt correctly. So once you got a bad signature in the system chocked and stopped. This diff seems to work for me (against SSLlabs). Cheers -- :wq Claudio Index: ca.c === RCS file: /cvs/src/usr.sbin/relayd/ca.c,v retrieving revision 1.31 diff -u -p -r1.31 ca.c --- ca.c28 Nov 2017 00:20:23 - 1.31 +++ ca.c24 Dec 2017 18:01:20 - @@ -266,9 +266,15 @@ ca_dispatch_relay(int fd, struct privsep break; } + if (cko.cko_tlen == -1) { + char buf[256]; + log_warnx("%s: %s", __func__, + ERR_error_string(ERR_get_error(), buf)); + } + iov[c].iov_base = &cko; iov[c++].iov_len = sizeof(cko); - if (cko.cko_tlen) { + if (cko.cko_tlen > 0) { iov[c].iov_base = to; iov[c++].iov_len = cko.cko_tlen; } @@ -381,12 +387,12 @@ rsae_send_imsg(int flen, const u_char *f IMSG_SIZE_CHECK(&imsg, (&cko)); memcpy(&cko, imsg.data, sizeof(cko)); - if (IMSG_DATA_SIZE(&imsg) != - (sizeof(cko) + cko.cko_tlen)) - fatalx("data size"); ret = cko.cko_tlen; - if (ret) { + if (ret > 0) { + if (IMSG_DATA_SIZE(&imsg) != + (sizeof(cko) + ret)) + fatalx("data size"); toptr = (u_char *)imsg.data + sizeof(cko); memcpy(to, toptr, ret); }
Re: relayd stops processing traffic intermittently
On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote: > > > On 23 Dec 2017, at 13:08, Claudio Jeker wrote: > > > >> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote: > >> Hi All, > >> > >> Since OpenBSD 6.2, just confirmed this in the latest snapshot > >> (GENERIC.MP#305) as well, for some reason relayd stops processing traffic > >> and starts flooding the log file with the following message: > >> > >> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout > >> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout > >> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout > >> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout > >> [snip] > >> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout > >> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout > >> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout > >> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout > >> ...etc... > >> > >> Restarting the daemon "fixes" the problem. > >> Not sure how to trouble shoot this but I am able to reproduce this > >> consistently by pointing SSLLabs towards relayd. > >> Would be great to get some pointers. > >> > > > > I have seen this as well on our production systems. This is a problem in > > the privsep part of the TLS code. I could not do more testing yet but my > > assumption is that a new option / feature is freaking this code out. > > Anything I can do or collect to give you more information? Your tip with SSLLabs is hopefully good enough to produce it at will. I will try to fix this in the next days. Keep you posted. -- :wq Claudio
Re: relayd stops processing traffic intermittently
> On 23 Dec 2017, at 13:08, Claudio Jeker wrote: > >> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote: >> Hi All, >> >> Since OpenBSD 6.2, just confirmed this in the latest snapshot >> (GENERIC.MP#305) as well, for some reason relayd stops processing traffic >> and starts flooding the log file with the following message: >> >> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout >> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout >> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout >> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout >> [snip] >> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout >> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout >> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout >> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout >> ...etc... >> >> Restarting the daemon "fixes" the problem. >> Not sure how to trouble shoot this but I am able to reproduce this >> consistently by pointing SSLLabs towards relayd. >> Would be great to get some pointers. >> > > I have seen this as well on our production systems. This is a problem in > the privsep part of the TLS code. I could not do more testing yet but my > assumption is that a new option / feature is freaking this code out. Anything I can do or collect to give you more information? Mischa
Re: relayd stops processing traffic intermittently
On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote: > Hi All, > > Since OpenBSD 6.2, just confirmed this in the latest snapshot > (GENERIC.MP#305) as well, for some reason relayd stops processing traffic and > starts flooding the log file with the following message: > > Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout > Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout > Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout > Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout > [snip] > Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout > Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout > Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout > Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout > ...etc... > > Restarting the daemon "fixes" the problem. > Not sure how to trouble shoot this but I am able to reproduce this > consistently by pointing SSLLabs towards relayd. > Would be great to get some pointers. > I have seen this as well on our production systems. This is a problem in the privsep part of the TLS code. I could not do more testing yet but my assumption is that a new option / feature is freaking this code out. -- :wq Claudio > Anonymised config below: > # relayd.conf > local_v4 = "xxx" > local_v6 = "xxx" > table { 127.0.0.1 } > > www1_addr_v4 = "xxx" > www1_addr_v6 = "xxx" > table { xxx } > > www3_addr_v4 = "xxx" > www3_addr_v6 = "xxx" > table { xxx } > > cust1_addr_v4 = "xxx" > cust1_addr_v6 = "xxx" > > cust2_addr_v4 = "xxx" > cust3_addr_v4 = "xxx" > cust4_addr_v4 = "xxx" > table { xxx } > table { xxx } > table { xxx } > > cust5_addr_v4 = "xxx" > table { xxx } > > http protocol httpfilter_default { > match request header remove "Proxy" > match request header append "X-Forwarded-For" value "$REMOTE_ADDR" > match request header append "X-Forwarded-By" value > "$SERVER_ADDR:$SERVER_PORT" > match response header set "Server" value "Sever" > match response header set "X-Powered-By" value "Power" > tcp { no splice } > } > http protocol httpsfilter_default { > match request header remove "Proxy" > match request header set "X-ClientIP" value "$REMOTE_ADDR" > match request header append "X-Forwarded-For" value "$REMOTE_ADDR" > match request header append "X-Forwarded-By" value > "$SERVER_ADDR:$SERVER_PORT" > match response header set "Strict-Transport-Security" value > "max-age=31536000" > match response header set "Server" value "Sever" > match response header set "X-Powered-By" value "Power" > match request quick header "Host" value "images.webcam.nl" forward to > > tcp { no splice } > tls { no client-renegotiation } > } > > http protocol httpfilter { > match request header remove "Proxy" > match request header append "X-Forwarded-For" value "$REMOTE_ADDR" > match request header append "X-Forwarded-By" value > "$SERVER_ADDR:$SERVER_PORT" > match response header set "Content-Security-Policy" value > "default-src high5.nl; script-src https://high5.nl http://www.w3.org/; > style-src 'self' 'unsafe-inline'; img-src 'self'" > match response header set "Server" value "Sever" > match response header set "X-Powered-By" value "Power" > match response header set "X-Frame-Options" value "SAMEORIGIN" > match response header set "X-Xss-Protection" value "1; mode=block" > match response header set "X-Content-Type-Options" value "nosniff" > match request quick header "Host" value "*xxx*" forward to > match request quick header "Host" value "*xxx*" forward to > tcp { no splice } > } > http protocol httpsfilter { > return error > match request header remove "Proxy" > match request header set "X-ClientIP" value "$REMOTE_ADDR" > match request header append "X-Forwarded-For" value "$REMOTE_ADDR" > match request header append "X-Forwarded-By" value > "$SERVER_ADDR:$SERVER_PORT" > match response header set "Strict-Transport-Security" value > "max-age=31536000" > match response header set "Content-Security-Policy" value > "default-src high5.nl; script-src https://high5.nl http://www.w3.org/; > style-src 'self' 'unsafe-inline'; img-src 'self'" > match response header set "Server" value "Sever" > match response header set "X-Powered-By" value "Power" > match response header set "X-Frame-Options" value "SAMEORIGIN" > match response header set "X-Xss-Protection" value "1; mode=block" > match response header set "X-Content-Type-Options" value "nosniff" > tcp { no splice } > tls { no client-renegotiation } > } > relay default { > listen on $local_v4 port 80 > listen on $local_v6 port 80 > protocol httpfilter_default > forward
relayd stops processing traffic intermittently
Hi All, Since OpenBSD 6.2, just confirmed this in the latest snapshot (GENERIC.MP#305) as well, for some reason relayd stops processing traffic and starts flooding the log file with the following message: Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout [snip] Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout ...etc... Restarting the daemon "fixes" the problem. Not sure how to trouble shoot this but I am able to reproduce this consistently by pointing SSLLabs towards relayd. Would be great to get some pointers. Anonymised config below: # relayd.conf local_v4 = "xxx" local_v6 = "xxx" table { 127.0.0.1 } www1_addr_v4 = "xxx" www1_addr_v6 = "xxx" table { xxx } www3_addr_v4 = "xxx" www3_addr_v6 = "xxx" table { xxx } cust1_addr_v4 = "xxx" cust1_addr_v6 = "xxx" cust2_addr_v4 = "xxx" cust3_addr_v4 = "xxx" cust4_addr_v4 = "xxx" table { xxx } table { xxx } table { xxx } cust5_addr_v4 = "xxx" table { xxx } http protocol httpfilter_default { match request header remove "Proxy" match request header append "X-Forwarded-For" value "$REMOTE_ADDR" match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT" match response header set "Server" value "Sever" match response header set "X-Powered-By" value "Power" tcp { no splice } } http protocol httpsfilter_default { match request header remove "Proxy" match request header set "X-ClientIP" value "$REMOTE_ADDR" match request header append "X-Forwarded-For" value "$REMOTE_ADDR" match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT" match response header set "Strict-Transport-Security" value "max-age=31536000" match response header set "Server" value "Sever" match response header set "X-Powered-By" value "Power" match request quick header "Host" value "images.webcam.nl" forward to tcp { no splice } tls { no client-renegotiation } } http protocol httpfilter { match request header remove "Proxy" match request header append "X-Forwarded-For" value "$REMOTE_ADDR" match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT" match response header set "Content-Security-Policy" value "default-src high5.nl; script-src https://high5.nl http://www.w3.org/; style-src 'self' 'unsafe-inline'; img-src 'self'" match response header set "Server" value "Sever" match response header set "X-Powered-By" value "Power" match response header set "X-Frame-Options" value "SAMEORIGIN" match response header set "X-Xss-Protection" value "1; mode=block" match response header set "X-Content-Type-Options" value "nosniff" match request quick header "Host" value "*xxx*" forward to match request quick header "Host" value "*xxx*" forward to tcp { no splice } } http protocol httpsfilter { return error match request header remove "Proxy" match request header set "X-ClientIP" value "$REMOTE_ADDR" match request header append "X-Forwarded-For" value "$REMOTE_ADDR" match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT" match response header set "Strict-Transport-Security" value "max-age=31536000" match response header set "Content-Security-Policy" value "default-src high5.nl; script-src https://high5.nl http://www.w3.org/; style-src 'self' 'unsafe-inline'; img-src 'self'" match response header set "Server" value "Sever" match response header set "X-Powered-By" value "Power" match response header set "X-Frame-Options" value "SAMEORIGIN" match response header set "X-Xss-Protection" value "1; mode=block" match response header set "X-Content-Type-Options" value "nosniff" tcp { no splice } tls { no client-renegotiation } } relay default { listen on $local_v4 port 80 listen on $local_v6 port 80 protocol httpfilter_default forward to port 8080 } relay default_redirect { listen on $cust2_addr_v4 port 80 listen on $cust3_addr_v4 port 80 listen on $cust4_addr_v4 port 80 listen on $cust5_addr_v4 port 80 listen on $cust1_addr_v4 port 80 listen on $cust1_addr_v6 port 80 protocol httpfilter_default forward to port 8081 } relay default_redirect_tls { listen on $cust4_addr_v4 port 443 tls protocol httpsfilter_default forward to port 8081 } relay www1 { li