Re: relayd stops processing traffic intermittently

2018-03-14 Thread Claudio Jeker
On Wed, Mar 14, 2018 at 04:27:58PM +0100, Mischa wrote:
> Hi Claudio,
> 
> > On 25 Dec 2017, at 15:54, Mischa  wrote:
> > 
> >> On 24 Dec 2017, at 19:07, Claudio Jeker  wrote:
> >> On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote:
>  On 23 Dec 2017, at 13:08, Claudio Jeker  wrote:
> > On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote:
> > Hi All,
> > 
> > Since OpenBSD 6.2, just confirmed this in the latest snapshot 
> > (GENERIC.MP#305) as well, for some reason relayd stops processing 
> > traffic and starts flooding the log file with the following message:
> > 
> > Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> > Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> > Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> > Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> > [snip]
> > Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> > Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> > Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> > Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> > ...etc...
> > 
> > Restarting the daemon "fixes" the problem.
> > Not sure how to trouble shoot this but I am able to reproduce this 
> > consistently by pointing SSLLabs towards relayd.
> > Would be great to get some pointers.
> > 
>  
>  I have seen this as well on our production systems. This is a problem in
>  the privsep part of the TLS code. I could not do more testing yet but my
>  assumption is that a new option / feature is freaking this code out.
> >>> 
> >>> Anything I can do or collect to give you more information? 
> >> 
> >> So, I think I found the problem. The ca process did not handle errors from
> >> RSA_private_encrypt correctly. So once you got a bad signature in the
> >> system chocked and stopped. This diff seems to work for me (against
> >> SSLlabs).
> > 
> > Awesome! Can confirm that it continues processing traffic when hitting it 
> > with sslabs.
> > Will also move it to a more bussier machine to see how that handles.
> > 
> > I am seeing the following messages now:
> > Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa 
> > routines:CRYPTO_internal:block type is not 02
> > Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa 
> > routines:CRYPTO_internal:null before block missing
> > Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa 
> > routines:CRYPTO_internal:padding check failed
> > Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa 
> > routines:CRYPTO_internal:null before block missing
> 
> Not sure if this is supposed to be taken care of, but I am still seeing the 
> following messages in 6.3-beta.
> $ uname -a
> OpenBSD lb2l 6.3 GENERIC.MP#58 amd64
> 
> Mar 13 23:43:38 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF06B:rsa 
> routines:CRYPTO_internal:block type is not 02
> Mar 13 23:43:39 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF072:rsa 
> routines:CRYPTO_internal:padding check failed
> Mar 13 23:43:48 lb2 relayd[14775]: ca_dispatch_relay: error:04FFF06B:rsa 
> routines:CRYPTO_internal:block type is not 02
> Mar 13 23:44:03 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF071:rsa 
> routines:CRYPTO_internal:null before block missing
> 
> Any knobs that need to be turned?

Unsure. The errors are from OpenSSL and explain why the key exchange blew
up. Currently this resets the connection.
I wanted to log them so they are more visible. Somebody with deep
knowledge of TLS and libcrypto may be able to answer if those are harmless
or if somebody is probing.

-- 
:wq Claudio



Re: relayd stops processing traffic intermittently

2018-03-14 Thread Mischa
Hi Tom!

Before, relayd would just "crash" and a reload was "solving" the problem.
In this instance it's luckily just extra logging. ;)

Mischa

> On 14 Mar 2018, at 16:47, Tom Smyth  wrote:
> 
> Hi all
> I have had that issue  in 6.0,6.1 and 6.2 I haven't tried current yet..
> I haven't had enough time to diagnose it to provide an adequate
> bug report myself.
> just restarting relayd seems to resolve it
> Im just confiming that I have seen this issue also ...
> 
> On 14 March 2018 at 15:27, Mischa  wrote:
>> Hi Claudio,
>> 
>>> On 25 Dec 2017, at 15:54, Mischa  wrote:
>>> 
 On 24 Dec 2017, at 19:07, Claudio Jeker  wrote:
 On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote:
>> On 23 Dec 2017, at 13:08, Claudio Jeker  wrote:
>>> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote:
>>> Hi All,
>>> 
>>> Since OpenBSD 6.2, just confirmed this in the latest snapshot 
>>> (GENERIC.MP#305) as well, for some reason relayd stops processing 
>>> traffic and starts flooding the log file with the following message:
>>> 
>>> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
>>> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
>>> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
>>> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
>>> [snip]
>>> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
>>> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
>>> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
>>> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
>>> ...etc...
>>> 
>>> Restarting the daemon "fixes" the problem.
>>> Not sure how to trouble shoot this but I am able to reproduce this 
>>> consistently by pointing SSLLabs towards relayd.
>>> Would be great to get some pointers.
>>> 
>> 
>> I have seen this as well on our production systems. This is a problem in
>> the privsep part of the TLS code. I could not do more testing yet but my
>> assumption is that a new option / feature is freaking this code out.
> 
> Anything I can do or collect to give you more information?
 
 So, I think I found the problem. The ca process did not handle errors from
 RSA_private_encrypt correctly. So once you got a bad signature in the
 system chocked and stopped. This diff seems to work for me (against
 SSLlabs).
>>> 
>>> Awesome! Can confirm that it continues processing traffic when hitting it 
>>> with sslabs.
>>> Will also move it to a more bussier machine to see how that handles.
>>> 
>>> I am seeing the following messages now:
>>> Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa 
>>> routines:CRYPTO_internal:block type is not 02
>>> Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa 
>>> routines:CRYPTO_internal:null before block missing
>>> Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa 
>>> routines:CRYPTO_internal:padding check failed
>>> Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa 
>>> routines:CRYPTO_internal:null before block missing
>> 
>> Not sure if this is supposed to be taken care of, but I am still seeing the 
>> following messages in 6.3-beta.
>> $ uname -a
>> OpenBSD lb2l 6.3 GENERIC.MP#58 amd64
>> 
>> Mar 13 23:43:38 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF06B:rsa 
>> routines:CRYPTO_internal:block type is not 02
>> Mar 13 23:43:39 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF072:rsa 
>> routines:CRYPTO_internal:padding check failed
>> Mar 13 23:43:48 lb2 relayd[14775]: ca_dispatch_relay: error:04FFF06B:rsa 
>> routines:CRYPTO_internal:block type is not 02
>> Mar 13 23:44:03 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF071:rsa 
>> routines:CRYPTO_internal:null before block missing
>> 
>> Any knobs that need to be turned?
>> 
>> Mischa
>> 
> 
> 
> 
> -- 
> Kindest regards,
> Tom Smyth
> 
> Mobile: +353 87 6193172
> The information contained in this E-mail is intended only for the
> confidential use of the named recipient. If the reader of this message
> is not the intended recipient or the person responsible for
> delivering it to the recipient, you are hereby notified that you have
> received this communication in error and that any review,
> dissemination or copying of this communication is strictly prohibited.
> If you have received this in error, please notify the sender
> immediately by telephone at the number above and erase the message
> You are requested to carry out your own virus check before
> opening any attachment.



Re: relayd stops processing traffic intermittently

2018-03-14 Thread Tom Smyth
Hi all
I have had that issue  in 6.0,6.1 and 6.2 I haven't tried current yet..
I haven't had enough time to diagnose it to provide an adequate
bug report myself.
just restarting relayd seems to resolve it
Im just confiming that I have seen this issue also ...

On 14 March 2018 at 15:27, Mischa  wrote:
> Hi Claudio,
>
>> On 25 Dec 2017, at 15:54, Mischa  wrote:
>>
>>> On 24 Dec 2017, at 19:07, Claudio Jeker  wrote:
>>> On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote:
> On 23 Dec 2017, at 13:08, Claudio Jeker  wrote:
>> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote:
>> Hi All,
>>
>> Since OpenBSD 6.2, just confirmed this in the latest snapshot 
>> (GENERIC.MP#305) as well, for some reason relayd stops processing 
>> traffic and starts flooding the log file with the following message:
>>
>> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
>> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
>> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
>> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
>> [snip]
>> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
>> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
>> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
>> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
>> ...etc...
>>
>> Restarting the daemon "fixes" the problem.
>> Not sure how to trouble shoot this but I am able to reproduce this 
>> consistently by pointing SSLLabs towards relayd.
>> Would be great to get some pointers.
>>
>
> I have seen this as well on our production systems. This is a problem in
> the privsep part of the TLS code. I could not do more testing yet but my
> assumption is that a new option / feature is freaking this code out.

 Anything I can do or collect to give you more information?
>>>
>>> So, I think I found the problem. The ca process did not handle errors from
>>> RSA_private_encrypt correctly. So once you got a bad signature in the
>>> system chocked and stopped. This diff seems to work for me (against
>>> SSLlabs).
>>
>> Awesome! Can confirm that it continues processing traffic when hitting it 
>> with sslabs.
>> Will also move it to a more bussier machine to see how that handles.
>>
>> I am seeing the following messages now:
>> Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa 
>> routines:CRYPTO_internal:block type is not 02
>> Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa 
>> routines:CRYPTO_internal:null before block missing
>> Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa 
>> routines:CRYPTO_internal:padding check failed
>> Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa 
>> routines:CRYPTO_internal:null before block missing
>
> Not sure if this is supposed to be taken care of, but I am still seeing the 
> following messages in 6.3-beta.
> $ uname -a
> OpenBSD lb2l 6.3 GENERIC.MP#58 amd64
>
> Mar 13 23:43:38 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF06B:rsa 
> routines:CRYPTO_internal:block type is not 02
> Mar 13 23:43:39 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF072:rsa 
> routines:CRYPTO_internal:padding check failed
> Mar 13 23:43:48 lb2 relayd[14775]: ca_dispatch_relay: error:04FFF06B:rsa 
> routines:CRYPTO_internal:block type is not 02
> Mar 13 23:44:03 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF071:rsa 
> routines:CRYPTO_internal:null before block missing
>
> Any knobs that need to be turned?
>
> Mischa
>



-- 
Kindest regards,
Tom Smyth

Mobile: +353 87 6193172
The information contained in this E-mail is intended only for the
confidential use of the named recipient. If the reader of this message
is not the intended recipient or the person responsible for
delivering it to the recipient, you are hereby notified that you have
received this communication in error and that any review,
dissemination or copying of this communication is strictly prohibited.
If you have received this in error, please notify the sender
immediately by telephone at the number above and erase the message
You are requested to carry out your own virus check before
opening any attachment.



Re: relayd stops processing traffic intermittently

2018-03-14 Thread Mischa
Hi Claudio,

> On 25 Dec 2017, at 15:54, Mischa  wrote:
> 
>> On 24 Dec 2017, at 19:07, Claudio Jeker  wrote:
>> On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote:
 On 23 Dec 2017, at 13:08, Claudio Jeker  wrote:
> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote:
> Hi All,
> 
> Since OpenBSD 6.2, just confirmed this in the latest snapshot 
> (GENERIC.MP#305) as well, for some reason relayd stops processing traffic 
> and starts flooding the log file with the following message:
> 
> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> [snip]
> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> ...etc...
> 
> Restarting the daemon "fixes" the problem.
> Not sure how to trouble shoot this but I am able to reproduce this 
> consistently by pointing SSLLabs towards relayd.
> Would be great to get some pointers.
> 
 
 I have seen this as well on our production systems. This is a problem in
 the privsep part of the TLS code. I could not do more testing yet but my
 assumption is that a new option / feature is freaking this code out.
>>> 
>>> Anything I can do or collect to give you more information? 
>> 
>> So, I think I found the problem. The ca process did not handle errors from
>> RSA_private_encrypt correctly. So once you got a bad signature in the
>> system chocked and stopped. This diff seems to work for me (against
>> SSLlabs).
> 
> Awesome! Can confirm that it continues processing traffic when hitting it 
> with sslabs.
> Will also move it to a more bussier machine to see how that handles.
> 
> I am seeing the following messages now:
> Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa 
> routines:CRYPTO_internal:block type is not 02
> Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa 
> routines:CRYPTO_internal:null before block missing
> Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa 
> routines:CRYPTO_internal:padding check failed
> Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa 
> routines:CRYPTO_internal:null before block missing

Not sure if this is supposed to be taken care of, but I am still seeing the 
following messages in 6.3-beta.
$ uname -a
OpenBSD lb2l 6.3 GENERIC.MP#58 amd64

Mar 13 23:43:38 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF06B:rsa 
routines:CRYPTO_internal:block type is not 02
Mar 13 23:43:39 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF072:rsa 
routines:CRYPTO_internal:padding check failed
Mar 13 23:43:48 lb2 relayd[14775]: ca_dispatch_relay: error:04FFF06B:rsa 
routines:CRYPTO_internal:block type is not 02
Mar 13 23:44:03 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF071:rsa 
routines:CRYPTO_internal:null before block missing

Any knobs that need to be turned?

Mischa



Re: relayd stops processing traffic intermittently

2017-12-25 Thread Mischa
> On 24 Dec 2017, at 19:07, Claudio Jeker  wrote:
> On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote:
>>> On 23 Dec 2017, at 13:08, Claudio Jeker  wrote:
 On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote:
 Hi All,
 
 Since OpenBSD 6.2, just confirmed this in the latest snapshot 
 (GENERIC.MP#305) as well, for some reason relayd stops processing traffic 
 and starts flooding the log file with the following message:
 
 Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
 Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
 Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
 Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
 [snip]
 Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
 Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
 Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
 Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
 ...etc...
 
 Restarting the daemon "fixes" the problem.
 Not sure how to trouble shoot this but I am able to reproduce this 
 consistently by pointing SSLLabs towards relayd.
 Would be great to get some pointers.
 
>>> 
>>> I have seen this as well on our production systems. This is a problem in
>>> the privsep part of the TLS code. I could not do more testing yet but my
>>> assumption is that a new option / feature is freaking this code out.
>> 
>> Anything I can do or collect to give you more information? 
> 
> So, I think I found the problem. The ca process did not handle errors from
> RSA_private_encrypt correctly. So once you got a bad signature in the
> system chocked and stopped. This diff seems to work for me (against
> SSLlabs).

Awesome! Can confirm that it continues processing traffic when hitting it with 
sslabs.
Will also move it to a more bussier machine to see how that handles.

I am seeing the following messages now:
Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa 
routines:CRYPTO_internal:block type is not 02
Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa 
routines:CRYPTO_internal:null before block missing
Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa 
routines:CRYPTO_internal:padding check failed
Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa 
routines:CRYPTO_internal:null before block missing

Mischa

> 
> Cheers
> -- 
> :wq Claudio
> 
> Index: ca.c
> ===
> RCS file: /cvs/src/usr.sbin/relayd/ca.c,v
> retrieving revision 1.31
> diff -u -p -r1.31 ca.c
> --- ca.c  28 Nov 2017 00:20:23 -  1.31
> +++ ca.c  24 Dec 2017 18:01:20 -
> @@ -266,9 +266,15 @@ ca_dispatch_relay(int fd, struct privsep
>   break;
>   }
> 
> + if (cko.cko_tlen == -1) {
> + char buf[256];
> + log_warnx("%s: %s", __func__,
> + ERR_error_string(ERR_get_error(), buf));
> + }
> +
>   iov[c].iov_base = &cko;
>   iov[c++].iov_len = sizeof(cko);
> - if (cko.cko_tlen) {
> + if (cko.cko_tlen > 0) {
>   iov[c].iov_base = to;
>   iov[c++].iov_len = cko.cko_tlen;
>   }
> @@ -381,12 +387,12 @@ rsae_send_imsg(int flen, const u_char *f
> 
>   IMSG_SIZE_CHECK(&imsg, (&cko));
>   memcpy(&cko, imsg.data, sizeof(cko));
> - if (IMSG_DATA_SIZE(&imsg) !=
> - (sizeof(cko) + cko.cko_tlen))
> - fatalx("data size");
> 
>   ret = cko.cko_tlen;
> - if (ret) {
> + if (ret > 0) {
> + if (IMSG_DATA_SIZE(&imsg) !=
> + (sizeof(cko) + ret))
> + fatalx("data size");
>   toptr = (u_char *)imsg.data + sizeof(cko);
>   memcpy(to, toptr, ret);
>   }
> 



Re: relayd stops processing traffic intermittently

2017-12-24 Thread Claudio Jeker
On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote:
> 
> > On 23 Dec 2017, at 13:08, Claudio Jeker  wrote:
> > 
> >> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote:
> >> Hi All,
> >> 
> >> Since OpenBSD 6.2, just confirmed this in the latest snapshot 
> >> (GENERIC.MP#305) as well, for some reason relayd stops processing traffic 
> >> and starts flooding the log file with the following message:
> >> 
> >> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> >> [snip]
> >> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> >> ...etc...
> >> 
> >> Restarting the daemon "fixes" the problem.
> >> Not sure how to trouble shoot this but I am able to reproduce this 
> >> consistently by pointing SSLLabs towards relayd.
> >> Would be great to get some pointers.
> >> 
> > 
> > I have seen this as well on our production systems. This is a problem in
> > the privsep part of the TLS code. I could not do more testing yet but my
> > assumption is that a new option / feature is freaking this code out.
> 
> Anything I can do or collect to give you more information? 

So, I think I found the problem. The ca process did not handle errors from
RSA_private_encrypt correctly. So once you got a bad signature in the
system chocked and stopped. This diff seems to work for me (against
SSLlabs).

Cheers
-- 
:wq Claudio

Index: ca.c
===
RCS file: /cvs/src/usr.sbin/relayd/ca.c,v
retrieving revision 1.31
diff -u -p -r1.31 ca.c
--- ca.c28 Nov 2017 00:20:23 -  1.31
+++ ca.c24 Dec 2017 18:01:20 -
@@ -266,9 +266,15 @@ ca_dispatch_relay(int fd, struct privsep
break;
}
 
+   if (cko.cko_tlen == -1) {
+   char buf[256];
+   log_warnx("%s: %s", __func__,
+   ERR_error_string(ERR_get_error(), buf));
+   }
+
iov[c].iov_base = &cko;
iov[c++].iov_len = sizeof(cko);
-   if (cko.cko_tlen) {
+   if (cko.cko_tlen > 0) {
iov[c].iov_base = to;
iov[c++].iov_len = cko.cko_tlen;
}
@@ -381,12 +387,12 @@ rsae_send_imsg(int flen, const u_char *f
 
IMSG_SIZE_CHECK(&imsg, (&cko));
memcpy(&cko, imsg.data, sizeof(cko));
-   if (IMSG_DATA_SIZE(&imsg) !=
-   (sizeof(cko) + cko.cko_tlen))
-   fatalx("data size");
 
ret = cko.cko_tlen;
-   if (ret) {
+   if (ret > 0) {
+   if (IMSG_DATA_SIZE(&imsg) !=
+   (sizeof(cko) + ret))
+   fatalx("data size");
toptr = (u_char *)imsg.data + sizeof(cko);
memcpy(to, toptr, ret);
}



Re: relayd stops processing traffic intermittently

2017-12-23 Thread Claudio Jeker
On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote:
> 
> > On 23 Dec 2017, at 13:08, Claudio Jeker  wrote:
> > 
> >> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote:
> >> Hi All,
> >> 
> >> Since OpenBSD 6.2, just confirmed this in the latest snapshot 
> >> (GENERIC.MP#305) as well, for some reason relayd stops processing traffic 
> >> and starts flooding the log file with the following message:
> >> 
> >> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> >> [snip]
> >> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> >> ...etc...
> >> 
> >> Restarting the daemon "fixes" the problem.
> >> Not sure how to trouble shoot this but I am able to reproduce this 
> >> consistently by pointing SSLLabs towards relayd.
> >> Would be great to get some pointers.
> >> 
> > 
> > I have seen this as well on our production systems. This is a problem in
> > the privsep part of the TLS code. I could not do more testing yet but my
> > assumption is that a new option / feature is freaking this code out.
> 
> Anything I can do or collect to give you more information? 

Your tip with SSLLabs is hopefully good enough to produce it at will.
I will try to fix this in the next days. Keep you posted.

-- 
:wq Claudio



Re: relayd stops processing traffic intermittently

2017-12-23 Thread Mischa Peters

> On 23 Dec 2017, at 13:08, Claudio Jeker  wrote:
> 
>> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote:
>> Hi All,
>> 
>> Since OpenBSD 6.2, just confirmed this in the latest snapshot 
>> (GENERIC.MP#305) as well, for some reason relayd stops processing traffic 
>> and starts flooding the log file with the following message:
>> 
>> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
>> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
>> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
>> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
>> [snip]
>> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
>> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
>> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
>> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
>> ...etc...
>> 
>> Restarting the daemon "fixes" the problem.
>> Not sure how to trouble shoot this but I am able to reproduce this 
>> consistently by pointing SSLLabs towards relayd.
>> Would be great to get some pointers.
>> 
> 
> I have seen this as well on our production systems. This is a problem in
> the privsep part of the TLS code. I could not do more testing yet but my
> assumption is that a new option / feature is freaking this code out.

Anything I can do or collect to give you more information? 

Mischa


Re: relayd stops processing traffic intermittently

2017-12-23 Thread Claudio Jeker
On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote:
> Hi All,
> 
> Since OpenBSD 6.2, just confirmed this in the latest snapshot 
> (GENERIC.MP#305) as well, for some reason relayd stops processing traffic and 
> starts flooding the log file with the following message:
> 
> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> [snip]
> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> ...etc...
> 
> Restarting the daemon "fixes" the problem.
> Not sure how to trouble shoot this but I am able to reproduce this 
> consistently by pointing SSLLabs towards relayd.
> Would be great to get some pointers.
> 

I have seen this as well on our production systems. This is a problem in
the privsep part of the TLS code. I could not do more testing yet but my
assumption is that a new option / feature is freaking this code out.

-- 
:wq Claudio


> Anonymised config below:
> # relayd.conf
> local_v4 = "xxx"
> local_v6 = "xxx"
> table  { 127.0.0.1 }
> 
> www1_addr_v4 = "xxx"
> www1_addr_v6 = "xxx"
> table  { xxx }
> 
> www3_addr_v4 = "xxx"
> www3_addr_v6 = "xxx"
> table  { xxx }
> 
> cust1_addr_v4 = "xxx"
> cust1_addr_v6 = "xxx"
> 
> cust2_addr_v4 = "xxx"
> cust3_addr_v4 = "xxx"
> cust4_addr_v4 = "xxx"
> table  { xxx }
> table  { xxx }
> table  { xxx }
> 
> cust5_addr_v4 = "xxx"
> table  { xxx }
> 
> http protocol httpfilter_default {
> match request header remove "Proxy"
> match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
> match request header append "X-Forwarded-By" value 
> "$SERVER_ADDR:$SERVER_PORT"
> match response header set "Server" value "Sever"
> match response header set "X-Powered-By" value "Power"
> tcp { no splice }
> }
> http protocol httpsfilter_default {
> match request header remove "Proxy"
> match request header set "X-ClientIP" value "$REMOTE_ADDR"
> match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
> match request header append "X-Forwarded-By" value 
> "$SERVER_ADDR:$SERVER_PORT"
> match response header set "Strict-Transport-Security" value 
> "max-age=31536000"
> match response header set "Server" value "Sever"
> match response header set "X-Powered-By" value "Power"
> match request quick header "Host" value "images.webcam.nl" forward to 
> 
> tcp { no splice }
> tls { no client-renegotiation }
> }
> 
> http protocol httpfilter {
> match request header remove "Proxy"
> match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
> match request header append "X-Forwarded-By" value 
> "$SERVER_ADDR:$SERVER_PORT"
> match response header set "Content-Security-Policy" value 
> "default-src high5.nl; script-src https://high5.nl http://www.w3.org/; 
> style-src 'self' 'unsafe-inline'; img-src 'self'"
> match response header set "Server" value "Sever"
> match response header set "X-Powered-By" value "Power"
> match response header set "X-Frame-Options" value "SAMEORIGIN"
> match response header set "X-Xss-Protection" value "1; mode=block"
> match response header set "X-Content-Type-Options" value "nosniff"
> match request quick header "Host" value "*xxx*" forward to 
> match request quick header "Host" value "*xxx*" forward to 
> tcp { no splice }
> }
> http protocol httpsfilter {
> return error
> match request header remove "Proxy"
> match request header set "X-ClientIP" value "$REMOTE_ADDR"
> match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
> match request header append "X-Forwarded-By" value 
> "$SERVER_ADDR:$SERVER_PORT"
> match response header set "Strict-Transport-Security" value 
> "max-age=31536000"
> match response header set "Content-Security-Policy" value 
> "default-src high5.nl; script-src https://high5.nl http://www.w3.org/; 
> style-src 'self' 'unsafe-inline'; img-src 'self'"
> match response header set "Server" value "Sever"
> match response header set "X-Powered-By" value "Power"
> match response header set "X-Frame-Options" value "SAMEORIGIN"
> match response header set "X-Xss-Protection" value "1; mode=block"
> match response header set "X-Content-Type-Options" value "nosniff"
> tcp { no splice }
> tls { no client-renegotiation }
> }
> relay default {
> listen on $local_v4 port 80
> listen on $local_v6 port 80
> protocol httpfilter_default
> forward

relayd stops processing traffic intermittently

2017-12-23 Thread Mischa
Hi All,

Since OpenBSD 6.2, just confirmed this in the latest snapshot (GENERIC.MP#305) 
as well, for some reason relayd stops processing traffic and starts flooding 
the log file with the following message:

Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
[snip]
Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
...etc...

Restarting the daemon "fixes" the problem.
Not sure how to trouble shoot this but I am able to reproduce this consistently 
by pointing SSLLabs towards relayd.
Would be great to get some pointers.

Anonymised config below:
# relayd.conf
local_v4 = "xxx"
local_v6 = "xxx"
table  { 127.0.0.1 }

www1_addr_v4 = "xxx"
www1_addr_v6 = "xxx"
table  { xxx }

www3_addr_v4 = "xxx"
www3_addr_v6 = "xxx"
table  { xxx }

cust1_addr_v4 = "xxx"
cust1_addr_v6 = "xxx"

cust2_addr_v4 = "xxx"
cust3_addr_v4 = "xxx"
cust4_addr_v4 = "xxx"
table  { xxx }
table  { xxx }
table  { xxx }

cust5_addr_v4 = "xxx"
table  { xxx }

http protocol httpfilter_default {
match request header remove "Proxy"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value 
"$SERVER_ADDR:$SERVER_PORT"
match response header set "Server" value "Sever"
match response header set "X-Powered-By" value "Power"
tcp { no splice }
}
http protocol httpsfilter_default {
match request header remove "Proxy"
match request header set "X-ClientIP" value "$REMOTE_ADDR"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value 
"$SERVER_ADDR:$SERVER_PORT"
match response header set "Strict-Transport-Security" value 
"max-age=31536000"
match response header set "Server" value "Sever"
match response header set "X-Powered-By" value "Power"
match request quick header "Host" value "images.webcam.nl" forward to 

tcp { no splice }
tls { no client-renegotiation }
}

http protocol httpfilter {
match request header remove "Proxy"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value 
"$SERVER_ADDR:$SERVER_PORT"
match response header set "Content-Security-Policy" value "default-src 
high5.nl; script-src https://high5.nl http://www.w3.org/; style-src 'self' 
'unsafe-inline'; img-src 'self'"
match response header set "Server" value "Sever"
match response header set "X-Powered-By" value "Power"
match response header set "X-Frame-Options" value "SAMEORIGIN"
match response header set "X-Xss-Protection" value "1; mode=block"
match response header set "X-Content-Type-Options" value "nosniff"
match request quick header "Host" value "*xxx*" forward to 
match request quick header "Host" value "*xxx*" forward to 
tcp { no splice }
}
http protocol httpsfilter {
return error
match request header remove "Proxy"
match request header set "X-ClientIP" value "$REMOTE_ADDR"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value 
"$SERVER_ADDR:$SERVER_PORT"
match response header set "Strict-Transport-Security" value 
"max-age=31536000"
match response header set "Content-Security-Policy" value "default-src 
high5.nl; script-src https://high5.nl http://www.w3.org/; style-src 'self' 
'unsafe-inline'; img-src 'self'"
match response header set "Server" value "Sever"
match response header set "X-Powered-By" value "Power"
match response header set "X-Frame-Options" value "SAMEORIGIN"
match response header set "X-Xss-Protection" value "1; mode=block"
match response header set "X-Content-Type-Options" value "nosniff"
tcp { no splice }
tls { no client-renegotiation }
}
relay default {
listen on $local_v4 port 80
listen on $local_v6 port 80
protocol httpfilter_default
forward to  port 8080
}
relay default_redirect {
listen on $cust2_addr_v4 port 80
listen on $cust3_addr_v4 port 80
listen on $cust4_addr_v4 port 80
listen on $cust5_addr_v4 port 80
listen on $cust1_addr_v4 port 80
listen on $cust1_addr_v6 port 80
protocol httpfilter_default
forward to  port 8081
}
relay default_redirect_tls {
listen on $cust4_addr_v4 port 443 tls
protocol httpsfilter_default
forward to  port 8081
}
relay www1 {
li