[jira] [Commented] (MESOS-5341) Enabled docker volume support for DockerContainerizer
[ https://issues.apache.org/jira/browse/MESOS-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275922#comment-15275922 ]

Guangya Liu commented on MESOS-5341:

There is a bug in the current volume support in the docker containerizer: {{volume_driver}} is declared as {{optional string}}, but it should be {{repeated string}}, because one container can have multiple volume drivers. [~jieyu] do you think we need to fix this, or should we keep the current behavior, since {{volume_driver}} will be retired eventually?

{code}
// The name of volume driver plugin.
optional string volume_driver = 7;
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5340) SSL-downgrading support may prevent new connections
[ https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MESOS-5340:

    Labels: mesosphere security ssl  (was: ssl)

> SSL-downgrading support may prevent new connections
> ---------------------------------------------------
>
> Key: MESOS-5340
> URL: https://issues.apache.org/jira/browse/MESOS-5340
> Project: Mesos
> Issue Type: Bug
> Components: security
> Affects Versions: 0.29.0, 0.28.1
> Reporter: Till Toenshoff
> Priority: Blocker
> Labels: mesosphere, security, ssl
>
> When using an SSL-enabled build of Mesos in combination with SSL-downgrading support, any connection that does not actually transmit data will hang the runnable (e.g. master).
> To reproduce the issue (on any platform):
> Spin up a master with SSL-downgrading enabled:
> {noformat}
> $ export SSL_ENABLED=true
> $ export SSL_SUPPORT_DOWNGRADE=true
> $ export SSL_KEY_FILE=/path/to/your/foo.key
> $ export SSL_CERT_FILE=/path/to/your/foo.crt
> $ export SSL_CA_FILE=/path/to/your/ca.crt
> $ ./bin/mesos-master.sh --work_dir=/tmp/foo
> {noformat}
> Create some artificial HTTP request load so that the problem is quickly visible in both the master logs and the output of CURL itself:
> {noformat}
> $ while true; do sleep 0.1; echo $( date +">%H:%M:%S.%3N"; curl -s -k -A "SSL Debug" http://localhost:5050/master/slaves; echo ;date +"<%H:%M:%S.%3N"; echo); done
> {noformat}
> Now create a connection to the master that does not transmit any data:
> {noformat}
> $ telnet localhost 5050
> {noformat}
> You should now see the CURL requests hanging; the master stops responding to new connections. This persists until either some data is transmitted via the above telnet connection or it is closed.
> This problem was initially observed when running Mesos on an AWS cluster with internal ELB health-checks enabled for the master node. Those health-checks use long-lasting connections that do not transmit any data and are closed after a configurable duration. In our test environment, this duration was set to 60 seconds, and hence we saw our master repeatedly become unresponsive for 60 seconds, then get "unstuck" for a brief period until it got stuck again.
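A listener that accepts both SSL and plain connections on one port has to look at the first bytes a client sends before it can pick a protocol, which is presumably why a connection that stays silent is a problem here. The sketch below is not taken from Mesos; `Protocol` and `classify` are hypothetical names. It relies only on the TLS record layout (content type 0x16, major version 0x03, handshake type 0x01 for a ClientHello) and ignores SSLv2-style hellos. The point is the `NeedMoreData` case: with no bytes available the caller cannot decide and should wait for a readiness event rather than block.

```cpp
#include <cassert>
#include <cstddef>

// Result of inspecting the first bytes of a freshly accepted connection.
enum class Protocol { NeedMoreData, Tls, Plain };

// Hypothetical helper: classifies a connection from the bytes peeked so far.
// A TLS record starts with content type 0x16 (handshake), major version
// 0x03, a minor version byte, two length bytes, and then the handshake
// type, which is 0x01 for a ClientHello.
Protocol classify(const unsigned char* data, std::size_t length)
{
  assert(data != nullptr || length == 0);

  // 5-byte record header plus the handshake type byte.
  const std::size_t headerSize = 6;

  if (length < headerSize) {
    // Nothing conclusive yet. The caller should wait for more data
    // asynchronously instead of stalling the accept path on this socket.
    return Protocol::NeedMoreData;
  }

  const bool tls = data[0] == 0x16 && data[1] == 0x03 && data[5] == 0x01;
  return tls ? Protocol::Tls : Protocol::Plain;
}
```

A silent `telnet` connection as in the reproduction steps would stay in `NeedMoreData` indefinitely, so any design that waits synchronously for a verdict stalls for as long as the peer stays quiet.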
[jira] [Updated] (MESOS-5340) SSL-downgrading support may prevent new connections
[ https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MESOS-5340:

    Component/s: security
[jira] [Commented] (MESOS-5340) SSL-downgrading support may prevent new connections
[ https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275898#comment-15275898 ]

haosdent commented on MESOS-5340:

Compared to {{LibeventSSLSocketImpl::accept_queue}}, {{PollSocketImpl::accept}} does not call the blocking function {{recv}}; instead, {{recv}} on the socket is called later in {{process::internal::on_accept}}. I think we could take a similar approach to avoid calling a blocking I/O function during accept.
[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess
[ https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275892#comment-15275892 ]

Chris commented on MESOS-5342:

I've implemented code to support this particular feature and need to submit it for review.
[jira] [Created] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess
Chris created MESOS-5342:

Summary: CPU pinning/binding support for CgroupsCpushareIsolatorProcess
Key: MESOS-5342
URL: https://issues.apache.org/jira/browse/MESOS-5342
Project: Mesos
Issue Type: Improvement
Components: cgroups, containerization
Affects Versions: 0.28.1
Reporter: Chris

The cgroups isolator currently lacks support for binding (also called pinning) containers to a set of cores. The Linux kernel is known to make sub-optimal core assignments for processes and threads. Poor assignments impact program performance, particularly for applications requiring GPU resources.

Most cluster management systems from the HPC community (e.g. SLURM) provide both cgroup isolation and CPU binding. This feature would provide similar capabilities. The current interest in supporting Intel's Cache Allocation Technology will require making choices about where containers are going to run on the mesos-agent's processor(s); this feature is a step toward developing a robust solution.

The improvement in this JIRA ticket will handle hardware topology detection, track container-to-core utilization in a histogram, and use a mathematical optimization technique to select cores for container assignment based on latency and the container-to-core utilization histogram.

For GPU tasks, the improvement will prioritize selection of cores based on latency between the GPU and cores in an effort to minimize copy latency.
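To make the selection step above concrete, here is a minimal sketch of choosing a core from a utilization histogram and a latency figure. The ticket does not say which optimization technique is used, so the linear cost below, the `Core` struct, `selectCore`, and the `latencyWeight` parameter are all assumptions for illustration and not the proposed implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-core inputs. A real isolator would derive them from
// hardware topology detection and its own bookkeeping.
struct Core
{
  int containers;  // Containers currently bound to this core (histogram bin).
  double latency;  // Relative latency to the resource of interest, e.g. a GPU.
};

// Returns the index of the core with the lowest combined cost. The linear
// cost function is an assumption; ties go to the lowest index.
std::size_t selectCore(const std::vector<Core>& cores, double latencyWeight)
{
  assert(!cores.empty());

  std::size_t best = 0;
  double bestCost = cores[0].containers + latencyWeight * cores[0].latency;

  for (std::size_t i = 1; i < cores.size(); ++i) {
    const double cost = cores[i].containers + latencyWeight * cores[i].latency;
    if (cost < bestCost) {
      best = i;
      bestCost = cost;
    }
  }

  return best;
}
```

With a weight of zero the choice is driven purely by the histogram; raising the weight favors cores close to the GPU, which is how the GPU prioritization mentioned in the ticket could be expressed in this simplified model.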
[jira] [Commented] (MESOS-5340) SSL-downgrading support may prevent new connections
[ https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275888#comment-15275888 ]

haosdent commented on MESOS-5340:

According to my test, I think this is not related to SSL downgrading. It can also happen when we {{export SSL_SUPPORT_DOWNGRADE=false}}:

{code}
# Console 1
$ telnet localhost 5050
{code}

{code}
$ curl https://www.haosdent.me:5050/master/slaves
# stuck
{code}

This is because accepted sockets are handled serially in {{process.cpp}}:

{code}
void on_accept(const Future<Socket>& socket)
{
  LOG(INFO) << "Start accept socket";
  if (socket.isReady()) {
    // Inform the socket manager for proper bookkeeping.
    socket_manager->accepted(socket.get());

    const size_t size = 80 * 1024;
    char* data = new char[size];

    DataDecoder* decoder = new DataDecoder(socket.get());

    socket.get().recv(data, size)
      .onAny(lambda::bind(
          &internal::decode_recv,
          lambda::_1,
          data,
          size,
          new Socket(socket.get()),
          decoder));
  }

  // Only now is the next accept scheduled.
  __s__->accept()
    .onAny(lambda::bind(&on_accept, lambda::_1));
}
{code}

{{process}} only continues with the next {{Future}} item from {{LibeventSSLSocketImpl::accept_queue}} after the current one has succeeded or failed.
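The effect of serial handling can be illustrated with a small model that involves no sockets at all. This is a hypothetical simulation, not libprocess code: every connection is assumed to be queued at time 0, and `firstByteAt[i]` is the time at which connection `i` sends its first byte. `serveSerially` and `serveIndependently` are made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Serial handling: the next queued connection is only looked at once the
// current one has produced data. Returns the time each one is served.
std::vector<int> serveSerially(const std::vector<int>& firstByteAt)
{
  std::vector<int> servedAt;
  int now = 0;

  for (int t : firstByteAt) {
    assert(t >= 0);
    now = std::max(now, t);  // Everything behind this connection waits too.
    servedAt.push_back(now);
  }

  return servedAt;
}

// Independent handling: each connection is accepted right away and its
// first read is awaited on its own, so a silent peer delays nobody else.
std::vector<int> serveIndependently(const std::vector<int>& firstByteAt)
{
  std::vector<int> servedAt;

  for (int t : firstByteAt) {
    assert(t >= 0);
    servedAt.push_back(t);
  }

  return servedAt;
}
```

For a health-check connection that stays silent for 60 seconds followed by two ordinary requests (`{60, 0, 0}`), the serial model serves all three at time 60, while the independent model serves the two requests immediately. That matches the 60-second stalls reported in the issue description.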
[jira] [Created] (MESOS-5341) Enabled docker volume support for DockerContainerizer
Guangya Liu created MESOS-5341:

Summary: Enabled docker volume support for DockerContainerizer
Key: MESOS-5341
URL: https://issues.apache.org/jira/browse/MESOS-5341
Project: Mesos
Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu

When a user specifies Volume.Source, we need to prepare the `docker run` command accordingly to support that. The {{DockerInfo.volume_driver}} can be retired now.
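As a rough idea of what "preparing the `docker run` command" could involve, the sketch below turns one volume description into command-line arguments. The `DockerVolume` struct and `volumeArguments` are hypothetical stand-ins, not Mesos types; only the `-v` and `--volume-driver` flags of `docker run` are taken as given, and "rexray" in the usage note is merely an example driver name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a volume with a Docker volume source.
struct DockerVolume
{
  std::string driver;         // Volume driver plugin; may be empty.
  std::string name;           // Volume name understood by the driver.
  std::string containerPath;  // Mount point inside the container.
  bool readOnly;
};

// Builds the `docker run` arguments for a single volume.
std::vector<std::string> volumeArguments(const DockerVolume& volume)
{
  assert(!volume.name.empty() && !volume.containerPath.empty());

  std::vector<std::string> argv;

  // Only pass a driver when one was specified.
  if (!volume.driver.empty()) {
    argv.push_back("--volume-driver=" + volume.driver);
  }

  argv.push_back("-v");
  argv.push_back(
      volume.name + ":" + volume.containerPath +
      (volume.readOnly ? ":ro" : ":rw"));

  return argv;
}
```

For a read-only volume `data` mounted at `/mnt/data` with driver `rexray`, this yields `--volume-driver=rexray -v data:/mnt/data:ro`.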
[jira] [Commented] (MESOS-3643) Implement stout/os/windows/shell.hpp
[ https://issues.apache.org/jira/browse/MESOS-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275744#comment-15275744 ]

Joris Van Remoortere commented on MESOS-3643:

{code}
commit fc4f9d25f75dc0ca87732c8b0ee868a5713f1d0f
Author: Alex Clemmer
Date: Sun May 8 17:00:05 2016 -0400

    Windows: Fixed shell constants, marked `os::shell` as deleted.

    Review: https://reviews.apache.org/r/46393/
{code}

> Implement stout/os/windows/shell.hpp
> ------------------------------------
>
> Key: MESOS-3643
> URL: https://issues.apache.org/jira/browse/MESOS-3643
> Project: Mesos
> Issue Type: Task
> Components: stout
> Reporter: Alex Clemmer
> Assignee: Alex Clemmer
> Labels: mesosphere, windows, windows-mvp
> Fix For: 0.28.0
[jira] [Commented] (MESOS-3656) Port process/socket.hpp to Windows
[ https://issues.apache.org/jira/browse/MESOS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275742#comment-15275742 ]

Joris Van Remoortere commented on MESOS-3656:

{code}
commit cd879244d42ade1f63d228694e5681ea254a9902
Author: Alex Clemmer
Date: Sun May 8 13:32:09 2016 -0700

    Windows: Libprocess: Winsock class to handle WSAStartup/WSACleanup.

    Review: https://reviews.apache.org/r/46344/
{code}

> Port process/socket.hpp to Windows
> ----------------------------------
>
> Key: MESOS-3656
> URL: https://issues.apache.org/jira/browse/MESOS-3656
> Project: Mesos
> Issue Type: Task
> Components: libprocess
> Reporter: Alex Clemmer
> Assignee: Alex Clemmer
> Labels: mesosphere, windows
[jira] [Commented] (MESOS-5278) Add a CLI allowing a user to enter a container.
[ https://issues.apache.org/jira/browse/MESOS-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275546#comment-15275546 ]

Guangya Liu commented on MESOS-5278:

[~idownes] Can you please share your internal version of this tool? I would like to use it as a reference. Thanks.

> Add a CLI allowing a user to enter a container.
> -----------------------------------------------
>
> Key: MESOS-5278
> URL: https://issues.apache.org/jira/browse/MESOS-5278
> Project: Mesos
> Issue Type: Improvement
> Reporter: Jie Yu
> Assignee: Guangya Liu
>
> Containers created by the unified containerizer (Mesos containerizer) use various namespaces (e.g., mount, network, etc.).
> To improve debuggability, we should create a CLI that allows an operator or a user to enter the namespaces associated with the container, and execute an arbitrary command in that container (similar to `docker exec`).
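On Linux, the namespaces of a running process are exposed as files under `/proc/<pid>/ns/`, and a tool of the kind requested here would typically open those files and join them with `setns(2)` before executing the command. The sketch below covers only the first, portable step of locating the files; `namespacePaths` is a hypothetical helper and says nothing about how the Mesos CLI is or will be implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the /proc paths of the given namespaces
// (e.g. "mnt", "net") for the process with the given pid.
std::vector<std::string> namespacePaths(
    int pid,
    const std::vector<std::string>& namespaces)
{
  assert(pid > 0);

  std::vector<std::string> paths;
  for (const std::string& ns : namespaces) {
    paths.push_back("/proc/" + std::to_string(pid) + "/ns/" + ns);
  }

  return paths;
}
```

For a container whose init process has pid 42, `namespacePaths(42, {"mnt", "net"})` gives `/proc/42/ns/mnt` and `/proc/42/ns/net`. Opening those files and calling `setns` on each descriptor requires sufficient privileges and is deliberately left out of the sketch.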