[jira] [Commented] (MESOS-6002) The whiteout file cannot be removed correctly using aufs backend.
[ https://issues.apache.org/jira/browse/MESOS-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483672#comment-15483672 ] Philip Winder commented on MESOS-6002: -- This has just happened to me too. I tested it a few weeks ago and it worked fine. Now I receive the following error:
{code}
E0912 09:39:08.976014 5703 slave.cpp:3976] Container 'c4ac91be-b70b-4b6e-a300-8a4bc87121d2' for executor 'catalogue-db' of framework c31f17be-2be5-4c10-8f04-6e3271836c39-0074 failed to start: Failed to remove whiteout file '/var/lib/mesos/provisioner/containers/c4ac91be-b70b-4b6e-a300-8a4bc87121d2/backends/copy/rootfses/4813e7ed-96cd-443c-8faf-9505954e9a77/var/lib/apt/lists/partial/.wh..opq': No such file or directory
{code}
This makes me think that an update has caused it.
{code}
ubuntu@ip-10-0-0-230:~$ docker version
Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:22:43 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:22:43 2016
 OS/Arch:      linux/amd64
{code}
{code}
ubuntu@ip-10-0-0-230:~$ mesos master
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0912 09:48:14.723008 22296 main.cpp:263] Build: 2016-07-27 20:23:20 by ubuntu
I0912 09:48:14.723098 22296 main.cpp:264] Version: 1.0.0
I0912 09:48:14.723107 22296 main.cpp:267] Git tag: 1.0.0
I0912 09:48:14.723114 22296 main.cpp:271] Git SHA: c9b70582e9fccab8f6863b0bd3a812b5969a8c24
{code}
{code}
ubuntu@ip-10-0-0-230:~$ uname -a
Linux ip-10-0-0-230 3.13.0-91-generic #138-Ubuntu SMP Fri Jun 24 17:00:34 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@ip-10-0-0-230:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"
{code}
> The whiteout file cannot be removed correctly using aufs backend. 
> - > > Key: MESOS-6002 > URL: https://issues.apache.org/jira/browse/MESOS-6002 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: Ubuntu 14, Ubuntu 12 > Or any os with aufs module >Reporter: Gilbert Song > Labels: aufs, backend, containerizer > > The whiteout file is not removed correctly when using the aufs backend in > unified containerizer. It can be verified by this unit test with the aufs > manually specified. > {noformat} > [20:11:24] : [Step 10/10] [ RUN ] > ProvisionerDockerPullerTest.ROOT_INTERNET_CURL_Whiteout > [20:11:24]W: [Step 10/10] I0805 20:11:24.986734 24295 cluster.cpp:155] > Creating default 'local' authorizer > [20:11:25]W: [Step 10/10] I0805 20:11:25.001153 24295 leveldb.cpp:174] > Opened db in 14.308627ms > [20:11:25]W: [Step 10/10] I0805 20:11:25.003731 24295 leveldb.cpp:181] > Compacted db in 2.558329ms > [20:11:25]W: [Step 10/10] I0805 20:11:25.003749 24295 leveldb.cpp:196] > Created db iterator in 3086ns > [20:11:25]W: [Step 10/10] I0805 20:11:25.003754 24295 leveldb.cpp:202] > Seeked to beginning of db in 595ns > [20:11:25]W: [Step 10/10] I0805 20:11:25.003758 24295 leveldb.cpp:271] > Iterated through 0 keys in the db in 314ns > [20:11:25]W: [Step 10/10] I0805 20:11:25.003769 24295 replica.cpp:776] > Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [20:11:25]W: [Step 10/10] I0805 20:11:25.004086 24315 recover.cpp:451] > Starting replica recovery > [20:11:25]W: [Step 10/10] I0805 20:11:25.004251 24312 recover.cpp:477] > Replica is in EMPTY status > [20:11:25]W: [Step 10/10] I0805 20:11:25.004546 24314 replica.cpp:673] > Replica in EMPTY status received a broadcasted recover request from > __req_res__(5640)@172.30.2.105:36006 > [20:11:25]W: [Step 10/10] I0805 20:11:25.004607 24312 recover.cpp:197] > Received a recover response from a replica in EMPTY status > [20:11:25]W: [Step 10/10] I0805 20:11:25.004762 24313 recover.cpp:568] > Updating replica status to STARTING > [20:11:25]W: 
[Step 10/10] I0805 20:11:25.004776 24314 master.cpp:375] > Master 21665992-d47e-402f-a00c-6f8fab613019 (ip-172-30-2-105.mesosphere.io) > started on 172.30.2.105:36006 > [20:11:25]W: [Step 10/10] I0805 20:11:25.004787 24314 master.cpp:377] Flags > at startup: --acls="" --agent_ping_timeout="15secs" > --agent_reregister_timeout="10mins" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate_agents="true" > --authenticate_frameworks="true" --authenticate_http_frameworks="true" > --authenticate_http_readonly="true" --authenticate_http_readwrite="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/0z753P/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic"
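The error above is raised while deleting a whiteout marker after an image layer has been copied into the container rootfs. The following is a minimal POSIX sh sketch of how whiteouts are commonly applied when flattening layers with a copy-style backend, assuming the aufs/Docker naming convention (`.wh.<name>` hides `<name>` from lower layers, `.wh..wh..opq` marks an opaque directory). `apply_layer` is a hypothetical helper, not Mesos code; it deliberately uses `rm -f`/`rm -rf` so that a marker that has already disappeared is not treated as an error.

```shell
# Hypothetical helper (not Mesos code): apply one extracted image layer
# on top of a rootfs, honouring aufs/Docker-style whiteout markers.
#   .wh.<name>    -> <name> from lower layers is deleted
#   .wh..wh..opq  -> opaque directory: lower-layer contents are hidden
#   .wh..wh.*     -> other aufs bookkeeping files, simply dropped
apply_layer() {
  rootfs=$1
  layer=$2

  # 1. Apply whiteouts against the lower layers before copying this one,
  #    so an opaque directory does not wipe out this layer's own files.
  find "$layer" -name '.wh.*' ! -type d | while IFS= read -r marker; do
    rel=${marker#"$layer"/}
    dir=$(dirname "$rel")
    base=$(basename "$rel")
    case $base in
      .wh..wh..opq) rm -rf "$rootfs/$dir" ;;   # cp below recreates it
      .wh..wh.*) : ;;
      *) rm -rf "$rootfs/$dir/${base#.wh.}" ;;
    esac
  done

  # 2. Merge the layer's files (markers included) into the rootfs.
  cp -R "$layer"/. "$rootfs"/ || return 1

  # 3. Remove every marker. rm -f ignores files that are already gone,
  #    so re-running this step never fails with "No such file or directory".
  find "$rootfs" -name '.wh.*' ! -type d -exec rm -f {} +
}
```

Processing whiteouts before the copy is a design choice: it keeps an opaque marker from deleting files that the same layer adds to that directory.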
[jira] [Created] (MESOS-5953) Default work dir is not root for unified containerizer and docker
Philip Winder created MESOS-5953:
Summary: Default work dir is not root for unified containerizer and docker
Key: MESOS-5953
URL: https://issues.apache.org/jira/browse/MESOS-5953
Project: Mesos
Issue Type: Bug
Components: containerization
Reporter: Philip Winder

According to the Docker spec, the default working directory (WORKDIR) is root (/). https://docs.docker.com/engine/reference/run/#/workdir

The unified containerizer with the docker runtime isolator sets the default working directory to /tmp/mesos/sandbox. Hence, Dockerfiles that rely on the default WORKDIR will not work, because Mesos changes the working directory.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
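A minimal POSIX sh sketch of the default the Docker run reference describes, assuming the image's WORKDIR value is passed in as a string that is empty when the image does not set one. `resolve_workdir` is a hypothetical name used only for illustration; it is not a Mesos or Docker function.

```shell
# Hypothetical helper illustrating the Docker default: use the image's
# WORKDIR when one is set, otherwise fall back to the root directory "/".
resolve_workdir() {
  # ${1:-/} expands to "/" when the argument is unset or empty.
  printf '%s\n' "${1:-/}"
}

# Usage:
#   resolve_workdir ""      # image without WORKDIR -> /
#   resolve_workdir /app    # image with WORKDIR /app -> /app
```

Under the behaviour reported in this issue, the first call would effectively yield /tmp/mesos/sandbox instead, so relative paths inside the image resolve against the wrong directory.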
[jira] [Updated] (MESOS-5953) Default work dir is not root for unified containerizer and docker
[ https://issues.apache.org/jira/browse/MESOS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Winder updated MESOS-5953: - Description: According to the docker spec, the default working directory (WORKDIR) is root /. https://docs.docker.com/engine/reference/run/#/workdir The unified containerizer with the docker runtime isolator sets the default working directory to /tmp/mesos/sandbox. Hence, dockerfiles that are relying on the default workdir will not work because the pwd is changed by mesos. was: According to the docker spec, the default working directory (WORKDIR) is root (/). https://docs.docker.com/engine/reference/run/#/workdir The unified containerizer with the docker runtime isolator sets the default working directory to /tmp/mesos/sandbox. Hence, dockerfiles that are relying on the default workdir will not work because the pwd is changed by mesos. > Default work dir is not root for unified containerizer and docker > - > > Key: MESOS-5953 > URL: https://issues.apache.org/jira/browse/MESOS-5953 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.0 >Reporter: Philip Winder > > According to the docker spec, the default working directory (WORKDIR) is root > /. https://docs.docker.com/engine/reference/run/#/workdir > The unified containerizer with the docker runtime isolator sets the default > working directory to /tmp/mesos/sandbox. > Hence, dockerfiles that are relying on the default workdir will not work > because the pwd is changed by mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5927) Unable to run "scratch" Dockerfiles with Unified Containerizer
[ https://issues.apache.org/jira/browse/MESOS-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Winder updated MESOS-5927: - Description:

It is not possible to run Docker containers that are based on the "scratch" image.

Setup: Mesos 1.0.0 with the following Mesos settings:
{code:none}
echo 'docker' | sudo tee /etc/mesos-slave/image_providers
echo 'filesystem/linux,docker/runtime' | sudo tee /etc/mesos-slave/isolation
{code}

Recreate: From a master or slave, run:
{code:none}
mesos-execute --command='echo ok' --docker_image=hello-seattle --master=localhost:5050 --name=test
{code}

Effect: The container crashes, with Mesos reporting that it can't mount a folder (e.g. /tmp). This means you can't run any container that is not a "fat" container (i.e. one with a full OS). Example error:

bq. Failed to enter chroot '/var/lib/mesos/provisioner/containers/fed6add8-0126-40e6-ae81-5859a0c1a2d4/backends/copy/rootfses/4feefc8b-fd5a-4835-95db-165e675f11cd': /tmp in chroot does not exist

I0729 07:49:56.753474 4362 exec.cpp:413] Executor asked to shutdown

Expected: Run without issues.

Use case: We use scratch-based containers with static binaries to keep the image size down. This is a common practice.

> Unable to run "scratch" Dockerfiles with Unified Containerizer > -- > > Key: MESOS-5927 > URL: https://issues.apache.org/jira/browse/MESOS-5927 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.0 >Reporter: Philip Winder > > It is not possible to run Docker containers that are based upon the "scratch" > container. > Setup: Mesos 1.0.0 with the following Mesos settings: > {code:none} > echo 'docker' | sudo tee /etc/mesos-slave/image_providers > echo 'filesystem/linux,docker/runtime' | sudo tee /etc/mesos-slave/isolation > {code} > Recreate: From a Master or Slave, run: > {code:none} > mesos-execute --command='echo ok' --docker_image=hello-seattle > --master=localhost:5050 --name=test > {code} > Effect: The container will crash with messages from Mesos reporting it can't > mount folder x/y/z. E.g. can't mount /tmp. This means you can't run any > container that is not a "fat" container (i.e. one with a full OS). E.g. > error: > bq. Failed to enter chroot > '/var/lib/mesos/provisioner/containers/fed6add8-0126-40e6-ae81-5859a0c1a2d4/backends/copy/rootfses/4feefc8b-fd5a-4835-95db-165e675f11cd': > /tmp in chroot does not exist > I0729 07:49:56.753474 4362 exec.cpp:413] > Executor asked to shutdown > Expected: Run without issues. > Use case: We use scratch based containers with static binaries to keep the > image size down. This is a common practice.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
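The error above shows the launch failing because `/tmp` does not exist inside the image's root filesystem, which is expected for a `scratch`-based image that ships no directories at all. A minimal POSIX sh sketch of one possible mitigation, assuming the launcher is allowed to create missing mount points in the provisioned rootfs before entering the chroot. `ensure_mount_points` is a hypothetical helper, not Mesos's actual fix.

```shell
# Hypothetical helper (not part of Mesos): make sure each mount point that
# will be used inside the container exists in the provisioned rootfs.
# Usage: ensure_mount_points ROOTFS /abs/path [/abs/path ...]
ensure_mount_points() {
  rootfs=$1
  shift
  for target in "$@"; do
    # mkdir -p is a no-op for directories that already exist, but fails
    # if a regular file is in the way.
    mkdir -p "$rootfs$target" || return 1
    if [ "$target" = /tmp ]; then
      # Match the usual permissions of /tmp: world-writable, sticky bit.
      chmod 1777 "$rootfs/tmp" || return 1
    fi
  done
}
```

Creating the directory in the rootfs, rather than requiring it in the image, keeps small static-binary images usable without modification.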
[jira] [Updated] (MESOS-5927) Unable to run "scratch" Dockerfiles with Unified Containerizer
[ https://issues.apache.org/jira/browse/MESOS-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Winder updated MESOS-5927: - Description:

It is not possible to run Docker containers that are based on the "scratch" image.

Setup: Mesos 1.0.0 with the following Mesos settings:
{code:bash}
echo 'docker' | sudo tee /etc/mesos-slave/image_providers
echo 'filesystem/linux,docker/runtime' | sudo tee /etc/mesos-slave/isolation
{code}

Recreate: From a master or slave, run:
{code:bash}
mesos-execute --command='echo ok' --docker_image=hello-seattle --master=localhost:5050 --name=test
{code}

Effect: The container crashes, with Mesos reporting that it can't mount a folder (e.g. /tmp). This means you can't run any container that is not a "fat" container (i.e. one with a full OS). Example error:

bq. Failed to enter chroot '/var/lib/mesos/provisioner/containers/fed6add8-0126-40e6-ae81-5859a0c1a2d4/backends/copy/rootfses/4feefc8b-fd5a-4835-95db-165e675f11cd': /tmp in chroot does not exist

I0729 07:49:56.753474 4362 exec.cpp:413] Executor asked to shutdown

Expected: Run without issues.

Use case: We use scratch-based containers with static binaries to keep the image size down. This is a common practice.

> Unable to run "scratch" Dockerfiles with Unified Containerizer > -- > > Key: MESOS-5927 > URL: https://issues.apache.org/jira/browse/MESOS-5927 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.0 >Reporter: Philip Winder > > It is not possible to run Docker containers that are based upon the "scratch" > container. > Setup: Mesos 1.0.0 with the following Mesos settings: > {code:bash} > echo 'docker' | sudo tee /etc/mesos-slave/image_providers > echo 'filesystem/linux,docker/runtime' | sudo tee /etc/mesos-slave/isolation > {code} > Recreate: From a Master or Slave, run: > {code:bash} > mesos-execute --command='echo ok' --docker_image=hello-seattle > --master=localhost:5050 --name=test > {code} > Effect: The container will crash with messages from Mesos reporting it can't > mount folder x/y/z. E.g. can't mount /tmp. This means you can't run any > container that is not a "fat" container (i.e. one with a full OS). E.g. > error: > bq. Failed to enter chroot > '/var/lib/mesos/provisioner/containers/fed6add8-0126-40e6-ae81-5859a0c1a2d4/backends/copy/rootfses/4feefc8b-fd5a-4835-95db-165e675f11cd': > /tmp in chroot does not exist > I0729 07:49:56.753474 4362 exec.cpp:413] > Executor asked to shutdown > Expected: Run without issues. > Use case: We use scratch based containers with static binaries to keep the > image size down. This is a common practice.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5927) Unable to run "scratch" Dockerfiles with Unified Containerizer
Philip Winder created MESOS-5927:
Summary: Unable to run "scratch" Dockerfiles with Unified Containerizer
Key: MESOS-5927
URL: https://issues.apache.org/jira/browse/MESOS-5927
Project: Mesos
Issue Type: Bug
Components: containerization
Affects Versions: 1.0.0
Reporter: Philip Winder

It is not possible to run Docker containers that are based on the "scratch" image.

Setup: Mesos 1.0.0 with the following Mesos settings:
```
echo 'docker' | sudo tee /etc/mesos-slave/image_providers
echo 'filesystem/linux,docker/runtime' | sudo tee /etc/mesos-slave/isolation
```

Recreate: From a master or slave, run something to the effect of:
`mesos-execute --command='echo ok' --docker_image=scratch --master=$MASTER:5050`

Effect: The container crashes, with Mesos reporting that it can't mount a folder (e.g. /tmp). This means you can't run any container that is not a "fat" container (i.e. one with a full OS).

Expected: Run without issues.

Use case: We use scratch-based containers with static binaries to keep the image size down. This is a common practice.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3821) DOCKER_HOST does not work well with --executor_environment_variables
[ https://issues.apache.org/jira/browse/MESOS-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376495#comment-15376495 ] Philip Winder commented on MESOS-3821: -- I've just hit this issue. Why was that PR never merged? We should be allowed to use a TCP socket for the Docker daemon. > DOCKER_HOST does not work well with --executor_environment_variables > > > Key: MESOS-3821 > URL: https://issues.apache.org/jira/browse/MESOS-3821 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.25.0 > Environment: Docker 1.7.1 > Mesos 0.25.0 >Reporter: Lei Xu >Assignee: haosdent > > Hi guys, > I found that DOCKER_HOST does not work if I set > bq. --executor_environment_variables={"DOCKER_HOST":"localhost:2377"} > because the docker executor always appends > bq. -H unix:///var/run/docker.sock > to each command, which in effect overrides DOCKER_HOST. > I think this is too strict, and I could not find a command-line flag to > disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
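A simplified model of the precedence at play, assuming the usual docker CLI behaviour in which an explicit `-H` flag takes priority over the `DOCKER_HOST` environment variable, which in turn takes priority over the default Unix socket. Because the executor always passes `-H unix:///var/run/docker.sock`, a value set through `--executor_environment_variables` is never consulted. `resolve_daemon` is a hypothetical function written only to illustrate the ordering; it is not part of Docker or Mesos.

```shell
# Hypothetical model of how the docker CLI picks the daemon address.
# $1 is the value given with -H (empty if the flag was not passed).
resolve_daemon() {
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"                          # explicit -H wins
  elif [ -n "${DOCKER_HOST:-}" ]; then
    printf '%s\n' "$DOCKER_HOST"                # then the environment
  else
    printf '%s\n' unix:///var/run/docker.sock   # then the default socket
  fi
}
```

With `DOCKER_HOST=tcp://localhost:2377` exported, `resolve_daemon unix:///var/run/docker.sock` still returns the Unix socket, which matches the behaviour described in the issue.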
[jira] [Commented] (MESOS-5702) CNI documentation example is not explicit enough about external plugins
[ https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356937#comment-15356937 ] Philip Winder commented on MESOS-5702: -- Thanks. I didn't receive an email response. Submitted. > CNI documentation example is not explicit enough about external plugins > --- > > Key: MESOS-5702 > URL: https://issues.apache.org/jira/browse/MESOS-5702 > Project: Mesos > Issue Type: Documentation >Affects Versions: 1.0.0 >Reporter: Philip Winder > > I'm testing Mesos 1.0.0-rc1 with Weave CNI. When I switched back to the CNI > example stated in the docs and restarted mesos-slave, I received a strange > error about not being able to find hadoop. > I think that it's related to this issue: > https://issues.apache.org/jira/browse/MESOS-5669 > I thought I'd log the issue, but if it has been fixed by the issue above, > feel free to close. > The setup, state and logs can be found here: > https://gist.github.com/philwinder/8f4c652723fa5c374b86a5e440bf4330 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5702) CNI documentation example is not explicit enough about external plugins
[ https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356765#comment-15356765 ] Philip Winder commented on MESOS-5702: -- Jie, Following the (incredibly complicated - why should I have to ask permission to submit a PR!!) instructions at http://mesos.apache.org/documentation/latest/submitting-a-patch/, I've sent a mail to d...@mesos.apache.org to try to assign myself to this task. But I haven't had a reply. Is this necessary, or can I just go ahead and submit straight to the reviewboard? Thanks. > CNI documentation example is not explicit enough about external plugins > --- > > Key: MESOS-5702 > URL: https://issues.apache.org/jira/browse/MESOS-5702 > Project: Mesos > Issue Type: Documentation >Affects Versions: 1.0.0 >Reporter: Philip Winder > > I'm testing Mesos 1.0.0-rc1 with Weave CNI. When I switched back to the CNI > example stated in the docs and restarted mesos-slave, I received a strange > error about not being able to find hadoop. > I think that it's related to this issue: > https://issues.apache.org/jira/browse/MESOS-5669 > I thought I'd log the issue, but if it has been fixed by the issue above, > feel free to close. > The setup, state and logs can be found here: > https://gist.github.com/philwinder/8f4c652723fa5c374b86a5e440bf4330 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-5702) CNI documentation example is not explicit enough about external plugins
[ https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350797#comment-15350797 ] Philip Winder edited comment on MESOS-5702 at 6/27/16 11:11 AM: Confirmed. The issue was that the CNI bridge plugin wasn't installed. The documentation isn't explicit enough about this. I'll try to make a PR. For future reference, I got everything working with the following:
{code}
# Make dirs if they don't exist
sudo mkdir -p /opt/cni/bin
sudo mkdir -p /etc/cni/net.d

# Add location of binary and conf directories for CNI.
echo '/opt/cni/bin' | sudo tee /etc/mesos-slave/network_cni_plugins_dir
echo '/etc/cni/net.d' | sudo tee /etc/mesos-slave/network_cni_config_dir

# Add example Mesos CNI plugin configuration
echo '{
  "name": "cni-test",
  "type": "bridge",
  "bridge": "mesos-cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}' | sudo tee /etc/cni/net.d/bridge.conf

# Install go:
sudo curl -O https://storage.googleapis.com/golang/go1.6.linux-amd64.tar.gz
sudo tar -xvf go1.6.linux-amd64.tar.gz
sudo mv go /usr/local
export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME

# Install CNI plugins
git clone https://github.com/containernetworking/cni.git
cd cni
git checkout v0.3.0
./build
sudo cp bin/* /opt/cni/bin
{code}
Then, to create a service to ping, try this:
{code}
# Start a container to ping. It will only be pingable from the same host.
sudo mesos-execute --command='ifconfig ; sleep 999' --docker_image=amouat/network-utils --master=$MASTER:5050 --name=pingme --networks=cni-test

# Then log on to the machine that the task was started on. E.g. if it started on S0, log onto SLAVE0. Then you can:
ping 192.168.0.2 # Or whatever IP it started on.

# When in bridge mode, the container connects to an internal network local to that host.
# Hence, the pinger must run on the same machine as the pingme,
# so restart as many times as necessary to get it running on the same host.
# Replace 192.168.0.2 with the IP address reported by the first container.
sudo mesos-execute --command='ifconfig && ping -v -c 1 192.168.0.2 && sleep 9' --docker_image=amouat/network-utils --master=$MASTER:5050 --name=pinger --networks=cni-test
{code}
> CNI documentation example is not explicit enough about external plugins > --- > > Key: MESOS-5702 > URL: https://issues.apache.org/jira/browse/MESOS-5702 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Philip Winder > > I'm testing Mesos 1.0.0-rc1 with Weave CNI.
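The root cause reported in this thread is a CNI network configuration that refers to a plugin binary (`bridge`) that was never installed in the plugin directory. A minimal POSIX sh sketch of a preflight check for that situation, assuming configurations use the `.conf` extension and that every plugin named in a `"type"` field (including the `host-local` IPAM plugin) is expected as an executable in the plugin directory. `check_cni_plugins` is a hypothetical helper, not part of Mesos or CNI, and it relies on `grep -o`, which GNU and BSD grep provide but POSIX does not require.

```shell
# Hypothetical preflight check (not part of Mesos or CNI): for every
# *.conf file in the CNI config directory, confirm that each plugin named
# in a "type" field exists as an executable in the plugin directory.
# Values are extracted with grep/sed: a rough text match, not a JSON parser.
check_cni_plugins() {
  config_dir=$1
  plugins_dir=$2
  missing=0
  for conf in "$config_dir"/*.conf; do
    [ -e "$conf" ] || continue   # the glob matched nothing
    for plugin in $(grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' "$conf" |
                    sed 's/.*"\([^"]*\)"$/\1/'); do
      if [ ! -x "$plugins_dir/$plugin" ]; then
        echo "missing CNI plugin '$plugins_dir/$plugin' used by '$conf'" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}
```

For example, running `check_cni_plugins /etc/cni/net.d /opt/cni/bin` before restarting the agent would report a missing `bridge` binary up front instead of at containerizer creation.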
[jira] [Comment Edited] (MESOS-5702) CNI documentation example is not explicit enough about external plugins
[ https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350797#comment-15350797 ] Philip Winder edited comment on MESOS-5702 at 6/27/16 11:10 AM: Confirmed. The issue was that the CNI bridge plugin wasn't installed. The documentation isn't explicit enough about this. I'll try to make a PR. For future reference, I got everything working with the following:
{code}
# Make dirs if they don't exist
sudo mkdir -p /opt/cni/bin
sudo mkdir -p /etc/cni/net.d

# Add location of binary and conf directories for CNI.
echo '/opt/cni/bin' | sudo tee /etc/mesos-slave/network_cni_plugins_dir
echo '/etc/cni/net.d' | sudo tee /etc/mesos-slave/network_cni_config_dir

# Add example Mesos CNI plugin configuration
echo '{
  "name": "cni-test",
  "type": "bridge",
  "bridge": "mesos-cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}' | sudo tee /etc/cni/net.d/bridge.conf

# Install go:
sudo curl -O https://storage.googleapis.com/golang/go1.6.linux-amd64.tar.gz
sudo tar -xvf go1.6.linux-amd64.tar.gz
sudo mv go /usr/local
export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME

# Install CNI plugins
git clone https://github.com/containernetworking/cni.git
cd cni
git checkout v0.3.0
./build
sudo cp bin/* /opt/cni/bin
{code}
Then, to create a service to ping, try this:
{code}
# Start a container to ping. It will only be pingable from the same host.
sudo mesos-execute --command='ifconfig ; sleep 999' --docker_image=amouat/network-utils --master=ec2-52-16-230-26.eu-west-1.compute.amazonaws.com:5050 --name=pingme --networks=cni-test

# Then log on to the machine that the task was started on. E.g. if it started on S0, log onto SLAVE0. Then you can:
ping 192.168.0.2 # Or whatever IP it started on.

# When in bridge mode, the container connects to an internal network local to that host.
# Hence, the pinger must run on the same machine as the pingme,
# so restart as many times as necessary to get it running on the same host.
# Replace 192.168.0.2 with the IP address reported by the first container.
sudo mesos-execute --command='ifconfig && ping -v -c 1 192.168.0.2 && sleep 9' --docker_image=amouat/network-utils --master=ec2-52-16-230-26.eu-west-1.compute.amazonaws.com:5050 --name=pinger --networks=cni-test
{code}
> CNI documentation example is not explicit enough about external plugins > --- > > Key: MESOS-5702 > URL: https://issues.apache.org/jira/browse/MESOS-5702 > Project: Mesos > Issue Type: Bug >Affects Versions:
[jira] [Commented] (MESOS-5702) CNI documentation example is not explicit enough about external plugins
[ https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350797#comment-15350797 ] Philip Winder commented on MESOS-5702: -- Confirmed. The issue was that the CNI bridge plugin wasn't installed. The documentation isn't explicit enough about this. I'll try to make a PR. For future reference, I got everything working with the following:
{code:bash}
# Make dirs if they don't exist
sudo mkdir -p /opt/cni/bin
sudo mkdir -p /etc/cni/net.d

# Add location of binary and conf directories for CNI.
echo '/opt/cni/bin' | sudo tee /etc/mesos-slave/network_cni_plugins_dir
echo '/etc/cni/net.d' | sudo tee /etc/mesos-slave/network_cni_config_dir

# Add example Mesos CNI plugin configuration
echo '{
  "name": "cni-test",
  "type": "bridge",
  "bridge": "mesos-cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}' | sudo tee /etc/cni/net.d/bridge.conf

# Install go:
sudo curl -O https://storage.googleapis.com/golang/go1.6.linux-amd64.tar.gz
sudo tar -xvf go1.6.linux-amd64.tar.gz
sudo mv go /usr/local
export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME

# Install CNI plugins
git clone https://github.com/containernetworking/cni.git
cd cni
git checkout v0.3.0
./build
sudo cp bin/* /opt/cni/bin
{code}
Then, to create a service to ping, try this:
{code:bash}
# Start a container to ping. It will only be pingable from the same host.
sudo mesos-execute --command='ifconfig ; sleep 999' --docker_image=amouat/network-utils --master=ec2-52-16-230-26.eu-west-1.compute.amazonaws.com:5050 --name=pingme --networks=cni-test

# Then log on to the machine that the task was started on. E.g. if it started on S0, log onto SLAVE0. Then you can:
ping 192.168.0.2 # Or whatever IP it started on.

# When in bridge mode, the container connects to an internal network local to that host.
# Hence, the pinger must run on the same machine as the pingme,
# so restart as many times as necessary to get it running on the same host.
# Replace 192.168.0.2 with the IP address reported by the first container.
sudo mesos-execute --command='ifconfig && ping -v -c 1 192.168.0.2 && sleep 9' --docker_image=amouat/network-utils --master=ec2-52-16-230-26.eu-west-1.compute.amazonaws.com:5050 --name=pinger --networks=cni-test
{code}
> CNI documentation example is not explicit enough about external plugins > --- > > Key: MESOS-5702 > URL: https://issues.apache.org/jira/browse/MESOS-5702 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Philip Winder > > I'm testing Mesos 1.0.0-rc1 with Weave CNI. When I switched back to the CNI > example stated in the docs and restarted mesos-slave, I received a strange > error about not being able to find hadoop. > I think that it's related to this issue: > https://issues.apache.org/jira/browse/MESOS-5669 > I thought I'd log the issue, but if it has been fixed by the issue above, > feel free to close. > The setup, state and logs can be found here: > https://gist.github.com/philwinder/8f4c652723fa5c374b86a5e440bf4330 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5702) CNI documentation example is not explicit enough about external plugins
[ https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Winder updated MESOS-5702: - Summary: CNI documentation example is not explicit enough about external plugins (was: CNI example doesn't work: hadoop not found) > CNI documentation example is not explicit enough about external plugins > --- > > Key: MESOS-5702 > URL: https://issues.apache.org/jira/browse/MESOS-5702 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Philip Winder > > I'm testing Mesos 1.0.0-rc1 with Weave CNI. When I switched back to the CNI > example stated in the docs and restarted mesos-slave, I received a strange > error about not being able to find hadoop. > I think that it's related to this issue: > https://issues.apache.org/jira/browse/MESOS-5669 > I thought I'd log the issue, but if it has been fixed by the issue above, > feel free to close. > The setup, state and logs can be found here: > https://gist.github.com/philwinder/8f4c652723fa5c374b86a5e440bf4330 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-5702) CNI example doesn't work: hadoop not found
[ https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350649#comment-15350649 ] Philip Winder edited comment on MESOS-5702 at 6/27/16 8:48 AM: --- Thanks Jie. I turned on the logging and found this: ``` Failed to create a containerizer: Could not create MesosContainerizer: Could not create isolator 'network/cni': Failed to find CNI plugin '/opt/cni/bin/bridge' used by CNI network configuration file '/etc/cni/net.d/bridge.conf' ``` So it seems that we actually need to install a bridge plugin. The documentation wasn't clear on that: I assumed that bridge was some internal plugin provided by the kernel or Mesos. I'll try adding that. Again, I'm assuming they mean the CNI bridge example. > CNI example doesn't work: hadoop not found > -- > > Key: MESOS-5702 > URL: https://issues.apache.org/jira/browse/MESOS-5702 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Philip Winder > > I'm testing Mesos 1.0.0-rc1 with Weave CNI. When I switched back to the CNI > example stated in the docs and restarted mesos-slave, I received a strange > error about not being able to find hadoop. > I think that it's related to this issue: > https://issues.apache.org/jira/browse/MESOS-5669 > I thought I'd log the issue, but if it has been fixed by the issue above, > feel free to close.
> The setup, state and logs can be found here: > https://gist.github.com/philwinder/8f4c652723fa5c374b86a5e440bf4330 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5702) CNI example doesn't work: hadoop not found
[ https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350649#comment-15350649 ] Philip Winder commented on MESOS-5702: -- Thanks Jie. I turned on the logging and found this: ``` Failed to create a containerizer: Could not create MesosContainerizer: Could not create isolator 'network/cni': Failed to find CNI plugin '/opt/cni/bin/bridge' used by CNI network configuration file '/etc/cni/net.d/bridge.conf' ``` So it seems that we actually need to install a bridge plugin. The documentation wasn't clear on that: I assumed that bridge was some internal plugin provided by the kernel or Mesos. I'll try adding that. Again, I'm assuming they mean the CNI bridge example. > CNI example doesn't work: hadoop not found > -- > > Key: MESOS-5702 > URL: https://issues.apache.org/jira/browse/MESOS-5702 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Philip Winder > > I'm testing Mesos 1.0.0-rc1 with Weave CNI. When I switched back to the CNI > example stated in the docs and restarted mesos-slave, I received a strange > error about not being able to find hadoop. > I think that it's related to this issue: > https://issues.apache.org/jira/browse/MESOS-5669 > I thought I'd log the issue, but if it has been fixed by the issue above, > feel free to close. > The setup, state and logs can be found here: > https://gist.github.com/philwinder/8f4c652723fa5c374b86a5e440bf4330 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5702) CNI example doesn't work: hadoop not found
[ https://issues.apache.org/jira/browse/MESOS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348114#comment-15348114 ] Philip Winder commented on MESOS-5702: -- But it did affect it. It caused an error and then crashed out. See the logs. > CNI example doesn't work: hadoop not found > -- > > Key: MESOS-5702 > URL: https://issues.apache.org/jira/browse/MESOS-5702 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Philip Winder > > I'm testing Mesos 1.0.0-rc1 with Weave CNI. When I switched back to the CNI > example stated in the docs and restarted mesos-slave, I received a strange > error about not being able to find hadoop. > I think that it's related to this issue: > https://issues.apache.org/jira/browse/MESOS-5669 > I thought I'd log the issue, but if it has been fixed by the issue above, > feel free to close. > The setup, state and logs can be found here: > https://gist.github.com/philwinder/8f4c652723fa5c374b86a5e440bf4330 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5702) CNI example doesn't work: hadoop not found
Philip Winder created MESOS-5702: Summary: CNI example doesn't work: hadoop not found Key: MESOS-5702 URL: https://issues.apache.org/jira/browse/MESOS-5702 Project: Mesos Issue Type: Bug Affects Versions: 1.0.0 Reporter: Philip Winder I'm testing Mesos 1.0.0-rc1 with Weave CNI. When I switched back to the CNI example stated in the docs and restarted mesos-slave, I received a strange error about not being able to find hadoop. I think that it's related to this issue: https://issues.apache.org/jira/browse/MESOS-5669 I thought I'd log the issue, but if it has been fixed by the issue above, feel free to close. The setup, state and logs can be found here: https://gist.github.com/philwinder/8f4c652723fa5c374b86a5e440bf4330 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4449) SegFault on agent during executor startup
[ https://issues.apache.org/jira/browse/MESOS-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112194#comment-15112194 ] Philip Winder commented on MESOS-4449: -- Hi guys. Is there any chance of a point fix? This makes 0.26 unusable. Thanks, Phil > SegFault on agent during executor startup > - > > Key: MESOS-4449 > URL: https://issues.apache.org/jira/browse/MESOS-4449 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 > Environment: Some setup details: > - Master and agents running in separate Docker containers on the same host. > - Containers based upon Ubuntu 14.04 using Mesosphere-produced Mesos deb > files. For more details see > (https://github.com/ContainerSolutions/minimesos-docker) > - This only occurs with 0.26, not with 0.25. >Reporter: Philip Winder >Assignee: Anand Mazumdar >Priority: Blocker > Labels: mesosphere > Attachments: agent.txt, master.txt > > > When repeatedly running our system tests, we have found that we get a > segfault on one of the agents. It probably occurs about one run in ten. I've > attached the full log from the agent that failed and the log from the master > (although I think the latter is less helpful). > To reproduce > - I have no idea. It seems to occur only at certain times, e.g. perhaps when > a packet is created right on a minute boundary. But I don't think it's > caused by our code, because the timestamps are stamped by Mesos. I > was surprised not to find a bug already open. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4449) SegFault on agent during executor startup
[ https://issues.apache.org/jira/browse/MESOS-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Winder updated MESOS-4449: - Attachment: agent.txt master.txt Log files captured during the segfault. > SegFault on agent during executor startup > - > > Key: MESOS-4449 > URL: https://issues.apache.org/jira/browse/MESOS-4449 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 > Environment: Some setup details: > - Master and agents running in separate Docker containers on the same host. > - Containers based upon Ubuntu 14.04 using Mesosphere-produced Mesos deb > files. For more details see > (https://github.com/ContainerSolutions/minimesos-docker) > - This only occurs with 0.26, not with 0.25. >Reporter: Philip Winder > Attachments: agent.txt, master.txt > > > When repeatedly running our system tests, we have found that we get a > segfault on one of the agents. It probably occurs about one run in ten. I've > attached the full log from the agent that failed and the log from the master > (although I think the latter is less helpful). > To reproduce > - I have no idea. It seems to occur only at certain times, e.g. perhaps when > a packet is created right on a minute boundary. But I don't think it's > caused by our code, because the timestamps are stamped by Mesos. I > was surprised not to find a bug already open. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4449) SegFault on agent during executor startup
Philip Winder created MESOS-4449: Summary: SegFault on agent during executor startup Key: MESOS-4449 URL: https://issues.apache.org/jira/browse/MESOS-4449 Project: Mesos Issue Type: Bug Affects Versions: 0.26.0 Environment: Some setup details: - Master and agents running in separate Docker containers on the same host. - Containers based upon Ubuntu 14.04 using Mesosphere-produced Mesos deb files. For more details see (https://github.com/ContainerSolutions/minimesos-docker) - This only occurs with 0.26, not with 0.25. Reporter: Philip Winder When repeatedly running our system tests, we have found that we get a segfault on one of the agents. It probably occurs about one run in ten. I've attached the full log from the agent that failed and the log from the master (although I think the latter is less helpful). To reproduce - I have no idea. It seems to occur only at certain times, e.g. perhaps when a packet is created right on a minute boundary. But I don't think it's caused by our code, because the timestamps are stamped by Mesos. I was surprised not to find a bug already open. -- This message was sent by Atlassian JIRA (v6.3.4#6332)