[jira] [Commented] (MESOS-1816) lxc execution driver support for docker containerizer
[ https://issues.apache.org/jira/browse/MESOS-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193002#comment-14193002 ]

Timothy Chen commented on MESOS-1816:
-------------------------------------

commit 3f7a9cd909a86e40e2484ab3e8ba0d22a49a39c6
Author: Timothy Chen tnac...@gmail.com
Date: Mon Sep 22 15:04:00 2014 -0700

    Support arbitrary parameters in docker containerizer

lxc execution driver support for docker containerizer
-----------------------------------------------------

Key: MESOS-1816
URL: https://issues.apache.org/jira/browse/MESOS-1816
Project: Mesos
Issue Type: Improvement
Components: containerization
Affects Versions: 0.20.1
Reporter: Eugen Feller
Assignee: Timothy Chen
Labels: docker
Attachments: docker_patch.cpp, test_framework_patch.cpp

Hi all,

One way to get networking up and running in Docker is to use bridge mode. In bridge mode, Docker automatically assigns IPs to the containers from the IP range specified on the docker0 bridge. In our setup we need to manage IPs using our own DHCP server. Unfortunately, this is not supported by Docker's libcontainer execution driver. Instead, the lxc execution driver (http://blog.docker.com/2014/03/docker-0-9-introducing-execution-drivers-and-libcontainer/) can be used.

To use the lxc execution driver, the Docker daemon needs to be started with the -e lxc flag. Once it is started, Docker's own networking can be disabled and lxc options can be passed to the docker run command. For example:

$ docker run -n=false --lxc-conf="lxc.network.type = veth" --lxc-conf="lxc.network.link = br0" --lxc-conf="lxc.network.name = eth0" --lxc-conf="lxc.network.flags = up" ...

This forces Docker to use my own bridge, br0. Moreover, an IP can be assigned to the eth0 interface by executing the dhclient eth0 command inside the started container.

In the previous integration of Docker in Mesos (using Deimos), I passed the aforementioned options using the options flag in Marathon. However, with the new changes this is no longer possible. It would be great to support the lxc execution driver in the current Docker integration.

Thanks.

Best regards,
Eugen

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
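As a rough illustration of the request above, the sketch below builds the docker run argument list from a set of lxc options. The function name docker_run_args, the image name ubuntu and the command /bin/sh are placeholders that do not appear in the report. The -n=false and --lxc-conf flags are taken from the report as written and have not been checked against any particular Docker version. This is not the patch attached to the issue.

```python
import shlex


def docker_run_args(image, lxc_conf, command=()):
    """Build an argv list for `docker run` with Docker networking disabled.

    Hypothetical helper: each lxc_conf entry becomes one --lxc-conf argument,
    using the flags exactly as quoted in the report.
    """
    args = ["docker", "run", "-n=false"]
    for key, value in lxc_conf.items():
        # Each option is a single argv element, so no shell quoting is needed.
        args.append(f"--lxc-conf={key} = {value}")
    args.append(image)
    args.extend(command)
    return args


# The four network options from the report (dicts keep insertion order).
lxc_conf = {
    "lxc.network.type": "veth",
    "lxc.network.link": "br0",
    "lxc.network.name": "eth0",
    "lxc.network.flags": "up",
}
argv = docker_run_args("ubuntu", lxc_conf, ["/bin/sh"])

# shlex.join (Python 3.8+) renders the list as a shell command line.
print(shlex.join(argv))
```

Passing the options as a mapping rather than a pre-built string keeps the caller free to add or drop lxc settings without touching the quoting.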
[jira] [Closed] (MESOS-1816) lxc execution driver support for docker containerizer
[ https://issues.apache.org/jira/browse/MESOS-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Chen closed MESOS-1816.
-------------------------------

lxc execution driver support for docker containerizer
-----------------------------------------------------

Key: MESOS-1816
URL: https://issues.apache.org/jira/browse/MESOS-1816
Project: Mesos
Issue Type: Improvement
Components: containerization
Affects Versions: 0.20.1
Reporter: Eugen Feller
Assignee: Timothy Chen
Labels: docker
Attachments: docker_patch.cpp, test_framework_patch.cpp
[jira] [Closed] (MESOS-1833) Running docker container with colon in executor id generates error
[ https://issues.apache.org/jira/browse/MESOS-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Chen closed MESOS-1833.
-------------------------------

Running docker container with colon in executor id generates error
------------------------------------------------------------------

Key: MESOS-1833
URL: https://issues.apache.org/jira/browse/MESOS-1833
Project: Mesos
Issue Type: Bug
Components: containerization
Affects Versions: 0.20.0
Environment: ubuntu (mesosphere vagrant vm)
Reporter: Elizabeth Lingg
Assignee: Timothy Chen
Labels: docker

I created and launched a container successfully in Chronos, but when Mesos ran the Docker container, Docker did not accept the volumes setting (the -v option) because of the colons in the executor id. Here is the executor id, which is valid: ct:141167016:0:lldocker.

In Mesos, there will be a fix that avoids using the host directory directly by means of a symlink and mapping. Ideally, however, Docker will fix this issue: it should accept executor ids with colons, as the format is valid.

Here is the error log:

Error: One iContainer '8fdb0cd7-86f8-4bc9-bd1b-d36f86663bb3' for executor 'ct:141167016:0:lldocker' of framework '20140925-174859-16842879-5050-1573-' failed to start: Failed to 'docker run -d -c 512 -m 536870912 -e mesos_task_id=ct:141167016:0:lldocker -e CHRONOS_JOB_OWNER= -e MESOS_SANDBOX=/mnt/mesos/sandbox -v /tmp/mesos/slaves/20140925-181954-16842879-5050-1560-0/frameworks/20140925-174859-16842879-5050-1573-/executors/ct:141167016:0:lldocker/runs/8fdb0cd7-86f8-4bc9-bd1b-d36f86663bb3:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-8fdb0cd7-86f8-4bc9-bd1b-d36f86663bb3 libmesos/ubuntu -c while sleep 10; do date =u %T; done': exit status = exited with status 2 stderr = invalid value /tmp/mesos/slaves/20140925-181954-16842879-5050-1560-0/frameworks/20140925-174859-16842879-5050-1573-/executors/ct:141167016:0:lldocker/runs/8fdb0cd7-86f8-4bc9-bd1b-d36f86663bb3:/mnt/mesos/sandbox for flag -v: bad format for volumes: /tmp/mesos/slaves/20140925-181954-16842879-5050-1560-0/frameworks/20140925-174859-16842879-5050-1573-/executors/ct:141167016:0:lldocker/runs/8fdb0cd7-86f8-4bc9-bd1b-d36f86663bb3:/mnt/mesos/sandbox

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1833) Running docker container with colon in executor id generates error
[ https://issues.apache.org/jira/browse/MESOS-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193005#comment-14193005 ]

Timothy Chen commented on MESOS-1833:
-------------------------------------

commit 3633142e4de6e2641296070934ab2a18e22cb2be
Author: Timothy Chen tnac...@apache.org
Date: Tue Oct 7 21:36:32 2014 -0700

    Symlink sandbox directories in docker containerizer

    Review: https://reviews.apache.org/r/26517

Running docker container with colon in executor id generates error
------------------------------------------------------------------

Key: MESOS-1833
URL: https://issues.apache.org/jira/browse/MESOS-1833
Project: Mesos
Issue Type: Bug
Components: containerization
Affects Versions: 0.20.0
Environment: ubuntu (mesosphere vagrant vm)
Reporter: Elizabeth Lingg
Assignee: Timothy Chen
Labels: docker
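To make the symlink idea concrete, here is a minimal sketch; it is not the actual Mesos patch. It assumes that a colon-free link directory is available and that Docker resolves a symlinked host path when bind-mounting. The names volume_arg_via_symlink and link_root and the temporary paths are made up for illustration.

```python
import os
import tempfile


def volume_arg_via_symlink(sandbox_dir, link_root, container_id,
                           container_path="/mnt/mesos/sandbox"):
    """Return a `-v` value whose host side contains no colons.

    Hypothetical helper: the sandbox path may contain colons (from the
    executor id), so a symlink named after the container id points to it.
    """
    link = os.path.join(link_root, container_id)
    if ":" in link:
        raise ValueError("link path must not contain a colon: " + link)
    os.symlink(sandbox_dir, link)
    # The only colon left is the host:container separator.
    return link + ":" + container_path


# Demo on a POSIX filesystem, where colons are legal in directory names.
container_id = "8fdb0cd7-86f8-4bc9-bd1b-d36f86663bb3"
with tempfile.TemporaryDirectory() as root:
    sandbox = os.path.join(
        root, "executors", "ct:141167016:0:lldocker", "runs", container_id)
    os.makedirs(sandbox)
    links = os.path.join(root, "links")
    os.makedirs(links)
    volume = volume_arg_via_symlink(sandbox, links, container_id)
    # Resolve the host side of the volume while the directories still exist.
    resolved = os.path.realpath(volume.split(":", 1)[0])
    expected = os.path.realpath(sandbox)
```

The container id is a UUID, so using it as the link name keeps the host side of the -v value free of colons regardless of what the executor id contains.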
[jira] [Closed] (MESOS-1656) Do not remove docker container until gc process runs
[ https://issues.apache.org/jira/browse/MESOS-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Chen closed MESOS-1656.
-------------------------------

commit 20b0225fcf1ed84c0f518ae51b856f43c044782f
Author: Timothy Chen tnac...@apache.org
Date: Fri Oct 31 16:41:07 2014 -0700

    Schedule docker containers for removal.

    Instead of removing docker containers right after reap, schedule it to be removed later.

    Review: https://reviews.apache.org/r/26861

Do not remove docker container until gc process runs
----------------------------------------------------

Key: MESOS-1656
URL: https://issues.apache.org/jira/browse/MESOS-1656
Project: Mesos
Issue Type: Improvement
Components: containerization
Reporter: Jay Buffington
Assignee: Timothy Chen

The current docker containerizer implementation that is up for review at https://reviews.apache.org/r/23771/ does a {{docker rm}} as soon as a task fails. This makes debugging difficult, if not impossible. MESOS-1652 will aid with this, but it does not address the use case of diagnosing a failure which involves inspecting the state of a container.

[~tnachen] and I discussed this on IRC, see http://wilderness.apache.org/channels/?f=mesos/2014-07-31#1406842111

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
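The change described in the commit, deferring removal instead of calling docker rm right after reap, can be sketched as a small due-time queue. This is only an illustration under assumed names (RemovalSchedule, delay_secs, the container name); it does not show how the Mesos slave's gc is implemented, and the caller is expected to run docker rm for whatever due() returns.

```python
import heapq


class RemovalSchedule:
    """Hypothetical queue of containers waiting to be removed."""

    def __init__(self, delay_secs):
        self.delay_secs = delay_secs
        self._pending = []  # min-heap of (due_time, container_name)

    def schedule(self, container_name, now):
        """Record that a reaped container may be removed after the delay."""
        heapq.heappush(self._pending, (now + self.delay_secs, container_name))

    def due(self, now):
        """Pop and return the containers whose delay has elapsed."""
        ready = []
        while self._pending and self._pending[0][0] <= now:
            ready.append(heapq.heappop(self._pending)[1])
        return ready


schedule = RemovalSchedule(delay_secs=3600)
schedule.schedule("mesos-8fdb0cd7", now=0)   # task failed; keep the container
still_inspectable = schedule.due(now=1800)   # nothing is removed yet
to_remove = schedule.due(now=3600)           # the delay has elapsed
```

Between the failure and the due time the container stays on the host, which is the window the report asks for in order to inspect its state.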
[jira] [Closed] (MESOS-1884) Composing Containerizer is not sending calls to still launching containers
[ https://issues.apache.org/jira/browse/MESOS-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Chen closed MESOS-1884.
-------------------------------

commit 73161e54caaab9506951cfc1c91fdc0cc27663e2
Author: Timothy Chen tnac...@apache.org
Date: Fri Oct 31 15:38:39 2014 -0700

    Fixed containerizer not receiving calls when launching.

    Review: https://reviews.apache.org/r/26486

Composing Containerizer is not sending calls to still launching containers
--------------------------------------------------------------------------

Key: MESOS-1884
URL: https://issues.apache.org/jira/browse/MESOS-1884
Project: Mesos
Issue Type: Bug
Components: containerization
Reporter: Timothy Chen
Assignee: Timothy Chen

Currently the new composing containerizer holds multiple containerizers and passes calls to the underlying containerizer that launched the container. However, this introduces a new problem: the composing containerizer only forwards update/destroy calls for a container once the container has finished launching, because only then does it update the internal structure that tracks which containerizer each container was launched with.

The symptom is that a containerizer that is still launching a container does not get the destroy call and continues to launch the container, while the slave has already removed the task. The container then becomes an orphan that is not tracked by Mesos.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
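The symptom and one possible remedy can be modelled in a few lines. The sketch below is a simplified, synchronous model with made-up class names, not Mesos code, and it wraps a single underlying containerizer rather than several. It records which containerizer owns a container when the launch starts rather than when it finishes, so a destroy that arrives in between is still forwarded.

```python
class RecordingContainerizer:
    """Hypothetical underlying containerizer that only records calls."""

    def __init__(self):
        self.calls = []

    def launch(self, container_id):
        self.calls.append(("launch", container_id))

    def destroy(self, container_id):
        self.calls.append(("destroy", container_id))


class ComposingContainerizer:
    """Simplified model: forwards calls to the containerizer owning a container."""

    def __init__(self, containerizer):
        self.containerizer = containerizer
        self.owner = {}  # container id -> containerizer, set when launch STARTS

    def start_launch(self, container_id):
        # Track the container before the (possibly slow) launch completes.
        self.owner[container_id] = self.containerizer
        self.containerizer.launch(container_id)

    def destroy(self, container_id):
        owner = self.owner.pop(container_id, None)
        if owner is None:
            return False  # unknown container: nothing to forward
        owner.destroy(container_id)
        return True


docker = RecordingContainerizer()
composing = ComposingContainerizer(docker)
composing.start_launch("c1")         # the launch is considered in progress
forwarded = composing.destroy("c1")  # the slave removed the task meanwhile
```

If the owner entry were only written after the launch finished, the destroy call in the last line would find no owner and be dropped, which is the orphaned-container case described in the report.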
[jira] [Closed] (MESOS-1948) Docker tests are flaky
[ https://issues.apache.org/jira/browse/MESOS-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Chen closed MESOS-1948.
-------------------------------

commit 2fbb2fb4d8e19aa4eb055b14a11268b6bf9ce4c4
Author: Timothy Chen tnac...@apache.org
Date: Fri Oct 31 16:42:03 2014 -0700

    Fixed docker flaky tests.

    Docker tests are flaky, mostly around getting expected output from the docker container forwarded to stdout/stderr. This is due to Docker not always having the stdout/stderr output available for docker logs if kill/rm is called.

    Review: https://reviews.apache.org/r/26862

Docker tests are flaky
----------------------

Key: MESOS-1948
URL: https://issues.apache.org/jira/browse/MESOS-1948
Project: Mesos
Issue Type: Bug
Reporter: Timothy Chen
Assignee: Timothy Chen
Labels: docker

The docker unit tests may fail occasionally because of Docker issues and the order in which the tests run. More details can be found on Review Board.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (MESOS-1849) Cannot execute container in privileged mode
[ https://issues.apache.org/jira/browse/MESOS-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Chen closed MESOS-1849.
-------------------------------

commit 4cec1140351de976ec443e9b36a55be3c20bd10f
Author: Timothy Chen tnac...@apache.org
Date: Fri Oct 10 09:20:30 2014 -0700

    Add privileged option to docker info

Cannot execute container in privileged mode
-------------------------------------------

Key: MESOS-1849
URL: https://issues.apache.org/jira/browse/MESOS-1849
Project: Mesos
Issue Type: Bug
Affects Versions: 0.20.1
Environment: Mesos 0.20.1, Marathon 0.7.1
Reporter: Adam Spektor
Assignee: Timothy Chen
Priority: Blocker
Labels: docker

I cannot find a way to run a container in privileged mode. This blocks me from continuing with my Mesos/Marathon POC.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1998) Subversion include path hardcoding
[ https://issues.apache.org/jira/browse/MESOS-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193265#comment-14193265 ]

Till Toenshoff commented on MESOS-1998:
---------------------------------------

Adding results of a compilation attempt to make it easier for users to locate a solution.

{noformat}
In file included from ../../src/state/log.cpp:25:0:
../../3rdparty/libprocess/3rdparty/stout/include/stout/svn.hpp:21:23: fatal error: svn_delta.h: No such file or directory
 #include <svn_delta.h>
                       ^
compilation terminated.
make[2]: *** [state/libstate_la-log.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
mv -f log/.deps/liblog_la-recover.Tpo log/.deps/liblog_la-recover.Plo
mv -f state/.deps/libstate_la-in_memory.Tpo state/.deps/libstate_la-in_memory.Plo
libtool: compile: g++-4.9 -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.21.0\" -DPACKAGE_STRING=\"mesos 0.21.0\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.21.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -DMESOS_HAS_JAVA=1 -I. -I../../src -Wall -Werror -DLIBDIR=\"/usr/local/lib\" -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" -DPKGDATADIR=\"/usr/local/share/mesos\" -I../../include -I../../3rdparty/libprocess/include -I../../3rdparty/libprocess/3rdparty/stout/include -I../include -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 -I../3rdparty/libprocess/3rdparty/picojson-4f93734 -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include -I../3rdparty/zookeeper-3.4.5/src/c/generated -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I/usr/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 -D_THREAD_SAFE -g1 -O0 -Wno-unused-local-typedefs -std=c++11 -DGTEST_USE_OWN_TR1_TUPLE=1 -MT state/libstate_la-leveldb.lo -MD -MP -MF state/.deps/libstate_la-leveldb.Tpo -c ../../src/state/leveldb.cpp -o state/libstate_la-leveldb.o >/dev/null 2>&1
mv -f state/.deps/libstate_la-leveldb.Tpo state/.deps/libstate_la-leveldb.Plo
make[1]: *** [all] Error 2
make: *** [all-recursive] Error 1
{noformat}

Subversion include path hardcoding
----------------------------------

Key: MESOS-1998
URL: https://issues.apache.org/jira/browse/MESOS-1998
Project: Mesos
Issue Type: Bug
Reporter: Till Toenshoff

Currently, we use a hardcoded location for the subversion headers. The default path is set to {{/usr/include/subversion-1}}. This fails on any OSX homebrew-installed version of subversion and may also fail on other systems.

I would like to suggest changing the current hardcoded path into a base path and referencing the actual headers (e.g. svn_delta.h) via subversion-1/svn_delta.h. This would allow homebrew users (and others) to build mesos out of the box, without either:

A. manually supplying the subversion-1 location, or
B. linking /usr/local/include/subversion-1 to /usr/include/subversion-1.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1711) Create method for users to identify HDFS compatible protocols in fetcher.cpp
[ https://issues.apache.org/jira/browse/MESOS-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MESOS-1711:
--------------------------

Assignee: (was: Timothy St. Clair)

Create method for users to identify HDFS compatible protocols in fetcher.cpp
----------------------------------------------------------------------------

Key: MESOS-1711
URL: https://issues.apache.org/jira/browse/MESOS-1711
Project: Mesos
Issue Type: Improvement
Components: general
Affects Versions: 0.19.1
Environment: All
Reporter: John Omernik
Priority: Minor
Labels: fetcher, hadoop, hdfs
Original Estimate: 6h
Remaining Estimate: 6h

In fetcher.cpp, the code that gets the Mesos packages uses a hard-coded list of protocols to determine whether the Hadoop copyToLocal method or another method (such as a standard file copy) is used. This means that new HDFS-compatible protocols cannot be added until the next release of Mesos. Tachyon Filesystem (tachyonfs://), MapR FS (maprfs://) and glusterfs:// are three examples that could make use of this.

Instead of just adding those file systems to the hard-coded list, I recommend following the lead of the Tachyon Project. In tachyon-0.6.0-SNAPSHOT, they have added an environment variable listing the allowed HDFS-compatible protocols. This comma-separated list allows the user/admin to specify which protocols are HDFS compatible, without hard coding them in fetcher.cpp.

I don't have access to the Tachyon issues list for linking, but the code is on line 75 of https://github.com/amplab/tachyon/blob/master/core/src/main/java/tachyon/UnderFileSystem.java

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
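A minimal sketch of the proposed behaviour follows. The variable name HDFS_COMPATIBLE_SCHEMES and the default scheme set are assumptions chosen for illustration; they are not taken from fetcher.cpp, from Tachyon, or from any proposed fix.

```python
import os
from urllib.parse import urlparse

# Hypothetical variable name and defaults, used only for this sketch.
ENV_VAR = "HDFS_COMPATIBLE_SCHEMES"
DEFAULT_SCHEMES = frozenset({"hdfs"})


def hdfs_compatible_schemes(environ=os.environ):
    """Return the default schemes plus any listed in the environment variable."""
    raw = environ.get(ENV_VAR, "")
    # Comma-separated list; surrounding whitespace and empty items are ignored.
    extra = {item.strip().lower() for item in raw.split(",") if item.strip()}
    return DEFAULT_SCHEMES | extra


def uses_hadoop_client(uri, environ=os.environ):
    """True if the URI's scheme should be fetched with the Hadoop client."""
    return urlparse(uri).scheme in hdfs_compatible_schemes(environ)


# A plain dict stands in for the process environment in this demo.
env = {ENV_VAR: "tachyonfs, maprfs,glusterfs"}
print(uses_hadoop_client("maprfs:///apps/mesos.tgz", env))
print(uses_hadoop_client("http://example.com/mesos.tgz", env))
```

Reading the list at call time means an admin can enable a new HDFS-compatible file system by changing the environment, without waiting for a new release.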
[jira] [Updated] (MESOS-1711) Create method for users to identify HDFS compatible protocols in fetcher.cpp
[ https://issues.apache.org/jira/browse/MESOS-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MESOS-1711:
--------------------------

Assignee: Timothy St. Clair

Create method for users to identify HDFS compatible protocols in fetcher.cpp
----------------------------------------------------------------------------

Key: MESOS-1711
URL: https://issues.apache.org/jira/browse/MESOS-1711
Project: Mesos
Issue Type: Improvement
Components: general
Affects Versions: 0.19.1
Environment: All
Reporter: John Omernik
Assignee: Timothy St. Clair
Priority: Minor
Labels: fetcher, hadoop, hdfs
Original Estimate: 6h
Remaining Estimate: 6h
[jira] [Commented] (MESOS-1711) Create method for users to identify HDFS compatible protocols in fetcher.cpp
[ https://issues.apache.org/jira/browse/MESOS-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193724#comment-14193724 ]

Adam B commented on MESOS-1711:
-------------------------------

Ankur's proposed fix: https://reviews.apache.org/r/27483
[~tstclair] may want to take a look too.

Create method for users to identify HDFS compatible protocols in fetcher.cpp
----------------------------------------------------------------------------

Key: MESOS-1711
URL: https://issues.apache.org/jira/browse/MESOS-1711
Project: Mesos
Issue Type: Improvement
Components: general
Affects Versions: 0.19.1
Environment: All
Reporter: John Omernik
Assignee: Timothy St. Clair
Priority: Minor
Labels: fetcher, hadoop, hdfs
Original Estimate: 6h
Remaining Estimate: 6h