[jira] [Updated] (MESOS-5340) libevent builds may prevent new connections

2016-05-10 Thread Benjamin Mahler (JIRA)

 [ https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-5340:
---
Shepherd: Joris Van Remoortere
Assignee: Benjamin Mahler

[~jvanremoortere] I took a look and have proposed a fix here: 
https://reviews.apache.org/r/47192/

> libevent builds may prevent new connections
> ---
>
> Key: MESOS-5340
> URL: https://issues.apache.org/jira/browse/MESOS-5340
> Project: Mesos
> Issue Type: Bug
> Components: security
> Affects Versions: 0.29.0, 0.28.1
> Reporter: Till Toenshoff
> Assignee: Benjamin Mahler
> Priority: Blocker
> Labels: mesosphere, security, ssl
>
> When using an SSL-enabled build of Mesos in combination with SSL-downgrading 
> support, any connection that does not actually transmit data will hang the 
> runnable (e.g. master).
> To reproduce the issue (on any platform):
> Spin up a master with SSL downgrading enabled:
> {noformat}
> $ export SSL_ENABLED=true
> $ export SSL_SUPPORT_DOWNGRADE=true
> $ export SSL_KEY_FILE=/path/to/your/foo.key
> $ export SSL_CERT_FILE=/path/to/your/foo.crt
> $ export SSL_CA_FILE=/path/to/your/ca.crt
> $ ./bin/mesos-master.sh --work_dir=/tmp/foo
> {noformat}
> Create some artificial HTTP request load to quickly spot the problem in 
> both the master logs and the output of curl itself:
> {noformat}
> $ while true; do sleep 0.1; echo $( date +">%H:%M:%S.%3N"; curl -s -k -A "SSL Debug" http://localhost:5050/master/slaves; echo ;date +"<%H:%M:%S.%3N"; echo); done
> {noformat}
> Now create a connection to the master that does not transmit any data:
> {noformat}
> $ telnet localhost 5050
> {noformat}
> You should now see the curl requests hang: the master stops responding to 
> new connections. This persists until either some data is transmitted via 
> the above telnet connection or that connection is closed.
> This problem was initially observed when running Mesos on an AWS cluster 
> with a load balancer (which uses an idle, persistent connection) enabled for 
> the master node. Such a connection naturally does not transmit any data as 
> long as no external requests are routed via the load balancer. AWS allows 
> setting a timeout for those connections. In our test environment this 
> timeout was set to 60 seconds, so we saw our master repeatedly become 
> unresponsive for 60 seconds, then get "unstuck" for a brief period until it 
> got stuck again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5340) libevent builds may prevent new connections

2016-05-09 Thread Till Toenshoff (JIRA)

 [ https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Toenshoff updated MESOS-5340:
--
Description: 
When using an SSL-enabled build of Mesos in combination with SSL-downgrading 
support, any connection that does not actually transmit data will hang the 
runnable (e.g. master).

To reproduce the issue (on any platform):

Spin up a master with SSL downgrading enabled:
{noformat}
$ export SSL_ENABLED=true
$ export SSL_SUPPORT_DOWNGRADE=true
$ export SSL_KEY_FILE=/path/to/your/foo.key
$ export SSL_CERT_FILE=/path/to/your/foo.crt
$ export SSL_CA_FILE=/path/to/your/ca.crt
$ ./bin/mesos-master.sh --work_dir=/tmp/foo
{noformat}

Create some artificial HTTP request load to quickly spot the problem in 
both the master logs and the output of curl itself:
{noformat}
$ while true; do sleep 0.1; echo $( date +">%H:%M:%S.%3N"; curl -s -k -A "SSL Debug" http://localhost:5050/master/slaves; echo ;date +"<%H:%M:%S.%3N"; echo); done
{noformat}

Now create a connection to the master that does not transmit any data:
{noformat}
$ telnet localhost 5050
{noformat}

You should now see the curl requests hang: the master stops responding to 
new connections. This persists until either some data is transmitted via 
the above telnet connection or that connection is closed.
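
The hang can also be detected without comparing timestamps by bounding each request with a timeout. The following is a minimal sketch and not part of the original report: the function names classify_probe, probe_master and run_probes and the variables MASTER_URL and PROBE_TIMEOUT are hypothetical. It assumes a curl that supports --max-time, which exits with status 28 on a timeout and 7 when the connection cannot be established.

```shell
# Assumed defaults; adjust to your master's address.
MASTER_URL="${MASTER_URL:-http://localhost:5050/master/slaves}"
PROBE_TIMEOUT="${PROBE_TIMEOUT:-2}"

# Map a curl exit status to a short label.
# 0 = success, 7 = could not connect, 28 = operation timed out.
classify_probe() {
  case "$1" in
    0)  echo "ok" ;;
    7)  echo "refused" ;;
    28) echo "hung" ;;
    *)  echo "error($1)" ;;
  esac
}

# Issue one request and give up after PROBE_TIMEOUT seconds.
probe_master() {
  status=0
  curl -s -k -o /dev/null --max-time "$PROBE_TIMEOUT" "$MASTER_URL" || status=$?
  classify_probe "$status"
}

# Run $1 probes, printing a timestamp and the label for each.
run_probes() {
  i=0
  while [ "$i" -lt "$1" ]; do
    printf '%s %s\n' "$(date +%H:%M:%S)" "$(probe_master)"
    sleep 0.1
    i=$((i + 1))
  done
}
```

With the idle telnet connection open, running run_probes 50 against an affected master would be expected to print "hung" for each request, and "ok" again once the telnet connection transmits data or is closed.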

This problem was initially observed when running Mesos on an AWS cluster 
with a load balancer (which uses an idle, persistent connection) enabled for 
the master node. Such a connection naturally does not transmit any data as 
long as no external requests are routed via the load balancer. AWS allows 
setting a timeout for those connections. In our test environment this 
timeout was set to 60 seconds, so we saw our master repeatedly become 
unresponsive for 60 seconds, then get "unstuck" for a brief period until it 
got stuck again.
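
The idle load-balancer connection can be approximated locally without AWS. This is only a sketch under stated assumptions: hold_idle_connection is a hypothetical helper, and it relies on bash's /dev/tcp redirection, which is not available in every shell or bash build. The host, port and 60-second duration in the example comment are taken from the reproduction steps above.

```shell
# Open a TCP connection to host $1, port $2, send nothing for $3 seconds,
# then close it. Returns non-zero if the connection cannot be opened.
hold_idle_connection() {
  (
    # Runs in a subshell so descriptor 3 does not leak into the caller.
    exec 3<>"/dev/tcp/$1/$2" || exit 1
    sleep "$3"
    exec 3>&-
  )
}

# Example (assumed values): mimic a 60-second idle timeout in a loop,
# stopping as soon as a connection attempt fails:
#   while hold_idle_connection localhost 5050 60; do sleep 1; done
```

Running the example loop alongside the curl load should, if the master is affected, reproduce the pattern described above: roughly 60 seconds without responses followed by a brief window in which requests succeed.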


  was:
When using an SSL-enabled build of Mesos in combination with SSL-downgrading 
support, any connection that does not actually transmit data will hang the 
runnable (e.g. master).

For reproducing the issue (on any platform)...

Spin up a master with enabled SSL-downgrading:
{noformat}
$ export SSL_ENABLED=true
$ export SSL_SUPPORT_DOWNGRADE=true
$ export SSL_KEY_FILE=/path/to/your/foo.key
$ export SSL_CERT_FILE=/path/to/your/foo.crt
$ export SSL_CA_FILE=/path/to/your/ca.crt
$ ./bin/mesos-master.sh --work_dir=/tmp/foo
{noformat}

Create some artificial HTTP request load for quickly spotting the problem in 
both, the master logs as well as the output of CURL itself:
{noformat}
$ while true; do sleep 0.1; echo $( date +">%H:%M:%S.%3N"; curl -s -k -A "SSL Debug" http://localhost:5050/master/slaves; echo ;date +"<%H:%M:%S.%3N"; echo); done
{noformat}

Now create a connection to the master that does not transmit any data:
{noformat}
$ telnet localhost 5050
{noformat}

You should now see the CURL requests hanging, the master stops responding to 
new connections. This will persist until either some data is transmitted via 
the above telnet connection or it is closed.

This problem has initially been observed when running Mesos on an AWS cluster 
with enabled internal ELB health-checks for the master node. Those 
health-checks are using long-lasting connections that do not transmit any data 
and are closed after a configurable duration. In our test environment, this 
duration was set to 60 seconds and hence we were seeing our master getting 
repetitively unresponsive for 60 seconds, then getting "unstuck" for a brief 
period until it got stuck again.




[jira] [Updated] (MESOS-5340) libevent builds may prevent new connections

2016-05-09 Thread Till Toenshoff (JIRA)

 [ https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Toenshoff updated MESOS-5340:
--
Summary: libevent builds may prevent new connections  (was: SSL-downgrading 
support may prevent new connections)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)