[jira] [Commented] (MESOS-5341) Enabled docker volume support for DockerContainerizer

2016-05-08 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275922#comment-15275922
 ] 

Guangya Liu commented on MESOS-5341:


There is a bug in the current volume support in the Docker containerizer: 
{{volume_driver}} is declared as {{optional string}}, but it should be 
{{repeated string}}, because one container can use multiple volume drivers. 
[~jieyu], do you think we need to fix this, or should we keep the current 
behavior, since {{volume_driver}} will eventually be retired?

{code}
// The name of volume driver plugin.
optional string volume_driver = 7;
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5340) SSL-downgrading support may prevent new connections

2016-05-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5340:
--
Labels: mesosphere security ssl  (was: ssl)

> SSL-downgrading support may prevent new connections
> ---
>
> Key: MESOS-5340
> URL: https://issues.apache.org/jira/browse/MESOS-5340
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.29.0, 0.28.1
>Reporter: Till Toenshoff
>Priority: Blocker
>  Labels: mesosphere, security, ssl
>
> When using an SSL-enabled build of Mesos with SSL-downgrading support, any 
> connection that does not transmit data will hang the process (e.g. the 
> master).
> To reproduce the issue (on any platform):
> Start a master with SSL downgrading enabled:
> {noformat}
> $ export SSL_ENABLED=true
> $ export SSL_SUPPORT_DOWNGRADE=true
> $ export SSL_KEY_FILE=/path/to/your/foo.key
> $ export SSL_CERT_FILE=/path/to/your/foo.crt
> $ export SSL_CA_FILE=/path/to/your/ca.crt
> $ ./bin/mesos-master.sh --work_dir=/tmp/foo
> {noformat}
> Generate some artificial HTTP request load so that the problem is easy to spot 
> in both the master logs and the curl output:
> {noformat}
> $ while true; do sleep 0.1; echo $( date +">%H:%M:%S.%3N"; \
>   curl -s -k -A "SSL Debug" http://localhost:5050/master/slaves; echo; \
>   date +"<%H:%M:%S.%3N"; echo); done
> {noformat}
> Now create a connection to the master that does not transmit any data:
> {noformat}
> $ telnet localhost 5050
> {noformat}
> The curl requests should now hang, because the master stops responding to new 
> connections. This persists until either some data is transmitted over the 
> telnet connection or that connection is closed.
> This problem was initially observed when running Mesos on an AWS cluster with 
> internal ELB health checks enabled for the master node. Those health checks 
> use long-lived connections that do not transmit any data and are closed after 
> a configurable duration. In our test environment this duration was set to 60 
> seconds, so the master repeatedly became unresponsive for 60 seconds, got 
> "unstuck" for a brief period, and then got stuck again.





[jira] [Updated] (MESOS-5340) SSL-downgrading support may prevent new connections

2016-05-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5340:
--
Component/s: security






[jira] [Commented] (MESOS-5340) SSL-downgrading support may prevent new connections

2016-05-08 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275898#comment-15275898
 ] 

haosdent commented on MESOS-5340:
-

Compared to {{LibeventSSLSocketImpl::accept_queue}}, {{PollSocketImpl::accept}} 
does not call the blocking function {{recv}}; {{recv}} on the socket is instead 
called in {{process::internal::on_accept}}. We could use a similar approach to 
avoid calling a blocking I/O function during accept.
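The difference between the two accept paths can be illustrated with a small deterministic model. This is not libprocess code; `Conn`, `served_serial`, and `served_deferred` are hypothetical names. In the serial model the loop does not move on to the next queued connection until the current one has produced its first bytes, so one silent peer blocks everything queued behind it. In the deferred model, which mirrors the suggestion above, each connection is handed off at once and only that connection waits for its own data.

```cpp
#include <cstddef>
#include <vector>

// One queued connection; `sends_data` is false for an idle peer such as
// `telnet localhost 5050` or a silent load-balancer health check.
struct Conn {
  bool sends_data;
};

// Serial model: the accept loop waits for each connection's first bytes
// before accepting the next one, so the first silent peer stalls the rest.
std::size_t served_serial(const std::vector<Conn>& queue) {
  std::size_t served = 0;
  for (const Conn& c : queue) {
    if (!c.sends_data) {
      break;  // Head-of-line blocking: nothing after this is accepted.
    }
    ++served;
  }
  return served;
}

// Deferred model: accept completes immediately and the first read happens in
// a per-connection callback, so a silent peer only delays itself.
std::size_t served_deferred(const std::vector<Conn>& queue) {
  std::size_t served = 0;
  for (const Conn& c : queue) {
    if (c.sends_data) {
      ++served;
    }
  }
  return served;
}
```

With a queue of `{true, false, true, true}` the serial model serves one connection and the deferred model serves three, which matches the symptom of curl requests hanging behind the idle telnet session.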






[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-08 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275892#comment-15275892
 ] 

Chris commented on MESOS-5342:
--

I've implemented code to support this particular feature and need to submit it 
for review.






[jira] [Created] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2016-05-08 Thread Chris (JIRA)
Chris created MESOS-5342:


 Summary: CPU pinning/binding support for 
CgroupsCpushareIsolatorProcess
 Key: MESOS-5342
 URL: https://issues.apache.org/jira/browse/MESOS-5342
 Project: Mesos
  Issue Type: Improvement
  Components: cgroups, containerization
Affects Versions: 0.28.1
Reporter: Chris


The cgroups isolator currently lacks support for binding (also called pinning) 
containers to a set of cores. The GNU/Linux kernel is known to make sub-optimal 
core assignments for processes and threads. Poor assignments hurt program 
performance, particularly for applications requiring GPU resources.
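On Linux, one common way to bind a cgroup's tasks to specific cores is the cpuset controller, whose `cpuset.cpus` file takes a list such as `0-3,8`. The ticket does not say which mechanism the implementation will use, so treat this as an assumption; the hypothetical `cpuset_list` helper below only formats a set of core IDs in that list syntax.

```cpp
#include <set>
#include <string>

// Formats core IDs as a cpuset list, collapsing consecutive IDs into ranges,
// e.g. {0, 1, 2, 3, 8} becomes "0-3,8".
std::string cpuset_list(const std::set<int>& cores) {
  std::string out;
  auto it = cores.begin();
  while (it != cores.end()) {
    int start = *it;
    int end = start;
    ++it;
    // Extend the current range while the next ID is consecutive.
    while (it != cores.end() && *it == end + 1) {
      end = *it;
      ++it;
    }
    if (!out.empty()) {
      out += ",";
    }
    out += std::to_string(start);
    if (end != start) {
      out += "-" + std::to_string(end);
    }
  }
  return out;
}
```

Using a `std::set` keeps the IDs sorted and unique, so ranges can be detected in a single pass.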

Most cluster management systems from the HPC community (e.g. SLURM) provide both 
cgroup isolation and CPU binding. This feature would provide similar 
capabilities. The current interest in supporting Intel's Cache Allocation 
Technology will require choosing where containers run on the mesos-agent's 
processor(s); this feature is a step toward developing a robust solution.

The improvement in this JIRA ticket will handle hardware topology detection, 
track container-to-core utilization in a histogram, and use a mathematical 
optimization technique to select cores for container assignment based on 
latency and the container-to-core utilization histogram.

For GPU tasks, the improvement will prioritize selection of cores based on 
latency between the GPU and cores in an effort to minimize copy latency.
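The ticket proposes a mathematical optimization over latency and the container-to-core utilization histogram. The sketch below swaps in a much simpler greedy rule purely to illustrate those inputs; it is not the proposed technique. Each candidate core carries a hypothetical `assigned` count (its histogram bucket) and a `gpu_latency` value. For GPU tasks, cores are ranked by GPU latency and then utilization; otherwise by utilization alone; the first `k` are chosen.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-core inputs.
struct Core {
  int id;
  int assigned;        // Containers currently bound to this core.
  double gpu_latency;  // Relative latency between this core and the task's GPU.
};

// Greedy stand-in for the ticket's optimization: picks up to `k` cores,
// preferring low GPU latency for GPU tasks and low utilization otherwise.
std::vector<int> pick_cores(std::vector<Core> cores, std::size_t k,
                            bool gpu_task) {
  std::stable_sort(cores.begin(), cores.end(),
                   [gpu_task](const Core& a, const Core& b) {
                     if (gpu_task && a.gpu_latency != b.gpu_latency) {
                       return a.gpu_latency < b.gpu_latency;
                     }
                     return a.assigned < b.assigned;
                   });
  std::vector<int> picked;
  for (std::size_t i = 0; i < cores.size() && i < k; ++i) {
    picked.push_back(cores[i].id);
  }
  return picked;
}
```

A real implementation would also need the detected hardware topology (sockets, shared caches, NUMA nodes), which this sketch ignores.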





[jira] [Commented] (MESOS-5340) SSL-downgrading support may prevent new connections

2016-05-08 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275888#comment-15275888
 ] 

haosdent commented on MESOS-5340:
-

According to my tests, this is not related to SSL downgrading: it also happens 
with {{export SSL_SUPPORT_DOWNGRADE=false}}.

{code}
# Console 1
$ telnet localhost 5050
{code}

{code}
# Console 2
$ curl https://www.haosdent.me:5050/master/slaves
# stuck
{code}

This happens because {{process.cpp}} handles accepted connections serially:
{code}
void on_accept(const Future<Socket>& socket)
{
  LOG(INFO) << "Start accept socket";
  if (socket.isReady()) {
    // Inform the socket manager for proper bookkeeping.
    socket_manager->accepted(socket.get());

    const size_t size = 80 * 1024;
    char* data = new char[size];

    DataDecoder* decoder = new DataDecoder(socket.get());

    // The first read on the accepted socket is issued here.
    socket.get().recv(data, size)
      .onAny(lambda::bind(
          &internal::decode_recv,
          lambda::_1,
          data,
          size,
          new Socket(socket.get()),
          decoder));
  }

  // The next accept is only requested once this one has completed.
  __s__->accept()
    .onAny(lambda::bind(&on_accept, lambda::_1));
}
{code}
{{process}} only moves on to the next {{Future}} from 
{{LibeventSSLSocketImpl::accept_queue}} after the current one has succeeded or failed.






[jira] [Created] (MESOS-5341) Enabled docker volume support for DockerContainerizer

2016-05-08 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-5341:
--

 Summary: Enabled docker volume support for DockerContainerizer
 Key: MESOS-5341
 URL: https://issues.apache.org/jira/browse/MESOS-5341
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu


When a user specifies Volume.Source, we need to prepare the `docker run` 
command accordingly to support that. The {{DockerInfo.volume_driver}} can be 
retired now.
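A minimal sketch of turning per-volume settings into `docker run` arguments, assuming a hypothetical `DockerVolume` struct standing in for a volume whose {{Volume.Source}} names a Docker volume and its driver. `-v name:/path[:ro]` and `--volume-driver` are real `docker run` flags, but `--volume-driver` is passed once and applies to the container's named volumes, so this sketch rejects volumes that ask for different drivers: the same limitation as a single `optional string volume_driver` field. It is not the Mesos implementation.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a volume whose source is a Docker volume.
struct DockerVolume {
  std::string driver;          // Volume driver (plugin) name.
  std::string name;            // Docker volume name.
  std::string container_path;  // Mount point inside the container.
  bool read_only;
};

// Builds the volume-related `docker run` arguments. Returns std::nullopt if
// the volumes need more than one driver, because this sketch emits a single
// --volume-driver flag for the whole container.
std::optional<std::vector<std::string>> volume_args(
    const std::vector<DockerVolume>& volumes) {
  std::vector<std::string> args;
  std::optional<std::string> driver;
  for (const DockerVolume& v : volumes) {
    if (driver && *driver != v.driver) {
      return std::nullopt;  // Two drivers cannot share one --volume-driver.
    }
    driver = v.driver;
    args.push_back("-v");
    args.push_back(v.name + ":" + v.container_path +
                   (v.read_only ? ":ro" : ""));
  }
  if (driver) {
    args.push_back("--volume-driver=" + *driver);
  }
  return args;
}
```

Returning `std::nullopt` makes the single-driver restriction explicit to the caller instead of silently dropping one of the drivers.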





[jira] [Commented] (MESOS-3643) Implement stout/os/windows/shell.hpp

2016-05-08 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275744#comment-15275744
 ] 

Joris Van Remoortere commented on MESOS-3643:
-

{code}
commit fc4f9d25f75dc0ca87732c8b0ee868a5713f1d0f
Author: Alex Clemmer 
Date:   Sun May 8 17:00:05 2016 -0400

Windows: Fixed shell constants, marked `os::shell` as deleted.

Review: https://reviews.apache.org/r/46393/
{code}

> Implement stout/os/windows/shell.hpp
> 
>
> Key: MESOS-3643
> URL: https://issues.apache.org/jira/browse/MESOS-3643
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows, windows-mvp
> Fix For: 0.28.0
>
>






[jira] [Commented] (MESOS-3656) Port process/socket.hpp to Windows

2016-05-08 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275742#comment-15275742
 ] 

Joris Van Remoortere commented on MESOS-3656:
-

{code}
commit cd879244d42ade1f63d228694e5681ea254a9902
Author: Alex Clemmer 
Date:   Sun May 8 13:32:09 2016 -0700

Windows: Libprocess: Winsock class to handle WSAStartup/WSACleanup.

Review: https://reviews.apache.org/r/46344/
{code}

> Port process/socket.hpp to Windows
> --
>
> Key: MESOS-3656
> URL: https://issues.apache.org/jira/browse/MESOS-3656
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows
>






[jira] [Commented] (MESOS-5278) Add a CLI allowing a user to enter a container.

2016-05-08 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275546#comment-15275546
 ] 

Guangya Liu commented on MESOS-5278:


[~idownes] Can you please share your internal version of this tool? I would like 
to use it as a reference. Thanks.

> Add a CLI allowing a user to enter a container.
> ---
>
> Key: MESOS-5278
> URL: https://issues.apache.org/jira/browse/MESOS-5278
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>Assignee: Guangya Liu
>
> Containers created by the unified containerizer (Mesos containerizer) use 
> various namespaces (e.g., mount, network, etc.).
> To improve debuggability, we should create a CLI that allows an operator or a 
> user to enter the namespaces associated with the container, and execute an 
> arbitrary command in that container (similar to `docker exec`).
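A rough sketch of how such a CLI could enter a container's namespaces with the Linux `setns(2)` call, similar in spirit to `nsenter`. It assumes the PID of a process inside the container is already known and that the tool runs with sufficient privileges (typically root); `ns_path` and `enter_and_exec` are hypothetical names, and this is not a design chosen for Mesos. All namespace handles are opened before the first `setns`, because after switching mount namespaces `/proc` may no longer refer to the same processes.

```cpp
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>
#include <string>
#include <vector>

// Path of a namespace handle, e.g. ns_path(1234, "net") is "/proc/1234/ns/net".
std::string ns_path(pid_t pid, const std::string& ns) {
  return "/proc/" + std::to_string(pid) + "/ns/" + ns;
}

// Joins the listed namespaces ("mnt", "net", ...) of `pid`, then replaces the
// current process image with `argv`. Returns -1 on failure; on success it
// does not return. Needs privileges such as CAP_SYS_ADMIN.
int enter_and_exec(pid_t pid, const std::vector<std::string>& namespaces,
                   char* const argv[]) {
  std::vector<int> fds;
  auto fail = [&fds]() {
    for (int fd : fds) {
      if (fd >= 0) ::close(fd);
    }
    return -1;
  };

  // Open every handle before the first setns().
  for (const std::string& ns : namespaces) {
    int fd = ::open(ns_path(pid, ns).c_str(), O_RDONLY | O_CLOEXEC);
    if (fd < 0) return fail();
    fds.push_back(fd);
  }

  for (std::size_t i = 0; i < fds.size(); ++i) {
    if (::setns(fds[i], 0) != 0) return fail();  // 0: any namespace type.
    ::close(fds[i]);
    fds[i] = -1;
  }

  // Joining a PID namespace only affects children, so a complete tool would
  // fork() before exec; this sketch omits that step.
  ::execvp(argv[0], argv);
  return fail();  // Only reached if execvp() failed.
}
```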


