See the "Upgrading from 0.20.x to 0.21.x" section on http://mesos.apache.org/documentation/upgrades/
On Tue, Feb 24, 2015 at 4:02 PM, max square <[email protected]> wrote:

> Adam/Tim,
>
> that's exactly the issue, thanks! I am using 0.20.0 currently.
>
> Is there any particular suggested update approach?
>
> FYI I have another in-house cluster running 0.21.1 and it worked fine.
> Haven't been able to get that to work on AWS though.
>
> Thanks!
>
> Sergio Daniel
>
> On Tue, Feb 24, 2015 at 12:56 PM, Tim Chen <[email protected]> wrote:
>
>> Hi Sergio,
>>
>> As Adam mentioned, that issue should be fixed in Mesos 0.21, as Chronos
>> usually puts a colon in the executor id.
>>
>> Let me know if upgrading to >= 0.21 doesn't fix this.
>>
>> Thanks!
>>
>> Tim
>>
>> On Tue, Feb 24, 2015 at 9:45 AM, Adam Bordelon <[email protected]>
>> wrote:
>>
>>> Ah, colons in the executorId. What version of Mesos are you running? You
>>> might be hitting https://issues.apache.org/jira/browse/MESOS-1833
>>>
>>> On Tue, Feb 24, 2015 at 9:39 AM, max square <[email protected]>
>>> wrote:
>>>
>>>> Adam,
>>>>
>>>> Thanks for the pointer, I was able to pull the logs for the docker run
>>>> command. As far as I understand, it is actually pulling the image, but it is
>>>> having trouble starting the actual docker. I highlighted in red what I
>>>> think is the main reason for the error: a bad format for the volume where
>>>> mesos wants to mount the volume for the docker. (See logs below)
>>>> I also notice that after the error it can't find the container, is that
>>>> normal?
>>>>
>>>> Do you have any thoughts? Any help is greatly appreciated!
>>>>
>>>> #It all starts well
>>>> I0223 21:08:00.884003 1618 docker.cpp:743] Starting container
>>>> '1f5295b2-9694-40ec-b900-f17de71d3bf4' for task
>>>> 'ct:1424725680000:0:dockerjob:' (and executor
>>>> 'ct:1424725680000:0:dockerjob:') of framework
>>>> '20150222-090257-1326718892-5050-9995-0001'
>>>>
>>>> #Can't use the format for the volume
>>>> E0223 21:09:40.772276 1625 slave.cpp:2485] Container
>>>> '1f5295b2-9694-40ec-b900-f17de71d3bf4' for executor
>>>> 'ct:1424725680000:0:dockerjob:' of framework
>>>> '20150222-090257-1326718892-5050-9995-0001' failed to start: Failed to
>>>> 'docker run -d -c 512 -m 536870912 -e
>>>> mesos_task_id=ct:1424725680000:0:dockerjob: -e CHRONOS_JOB_OWNER= -e
>>>> CHRONOS_JOB_NAME=dockerjob -e HOST=
>>>> ec2-52-1-xx-xx.compute-1.amazonaws.com -e CHRONOS_RESOURCE_MEM=512.0
>>>> -e CHRONOS_RESOURCE_CPU=0.5 -e CHRONOS_RESOURCE_DISK=256.0 -e
>>>> MESOS_SANDBOX=/mnt/mesos/sandbox -v
>>>> /tmp/mesos/slaves/20150220-234013-1326718892-5050-1615-2/frameworks/20150222-090257-1326718892-5050-9995-0001/executors/ct:1424725680000:0:dockerjob:/runs/1f5295b2-9694-40ec-b900-f17de71d3bf4:/mnt/mesos/sandbox
>>>> --net host --entrypoint /bin/sh --name
>>>> mesos-1f5295b2-9694-40ec-b900-f17de71d3bf4 libmesos/ubuntu -c while sleep
>>>> 10; do date -u +%T; done': exit status = exited with status 2 stderr =
>>>> invalid value
>>>> "/tmp/mesos/slaves/20150220-234013-1326718892-5050-1615-2/frameworks/20150222-090257-1326718892-5050-9995-0001/executors/ct:1424725680000:0:dockerjob:/runs/1f5295b2-9694-40ec-b900-f17de71d3bf4:/mnt/mesos/sandbox"
>>>> for flag -v: bad format for volumes:
>>>> /tmp/mesos/slaves/20150220-234013-1326718892-5050-1615-2/frameworks/20150222-090257-1326718892-5050-9995-0001/executors/ct:1424725680000:0:dockerjob:/runs/1f5295b2-9694-40ec-b900-f17de71d3bf4:/mnt/mesos/sandbox
>>>>
>>>> Usage: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
>>>> 'bridge': creates a new network stack
>>>> for the container on the docker bridge
>>>> (use 'docker port' to see the actual
>>>> mapping)
>>>>
>>>> # Fails and can't destroy the container, is this normal?
>>>> E0223 21:09:40.772655 1625 slave.cpp:2580] Termination of executor
>>>> 'ct:1424725680000:0:dockerjob:' of framework
>>>> '20150222-090257-1326718892-5050-9995-0001' failed: No container found
>>>>
>>>> Thanks in advance
>>>>
>>>> Sergio Daniel
>>>>
>>>> On Tue, Feb 24, 2015 at 1:51 AM, Adam Bordelon <[email protected]>
>>>> wrote:
>>>>
>>>>> Check the mesos-slave log on one of the slaves, in
>>>>> /var/log/mesos/mesos-slave.INFO. There's probably some information there
>>>>> about the docker pull, or other things that could have errored before the
>>>>> actual container is launched.
>>>>> Alternatively, you could try a `docker pull` manually on one of the
>>>>> slaves, then see if the launch succeeds on that node. Then you'll know if
>>>>> it was a timeout during the docker pull, at which point you can either
>>>>> further increase the registration timeout or decide to pre-pull all your
>>>>> images (as a periodic Chronos task?), due to unpredictable network
>>>>> latencies in AWS.
>>>>>
>>>>> On Mon, Feb 23, 2015 at 4:31 PM, max square <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am using the cloudformation scripts
>>>>>> <https://github.com/mbabineau/cloudformation-mesos> to create a Mesos
>>>>>> cluster, with Marathon 0.7.5, and Chronos 2.3.2. The setup is working
>>>>>> perfectly for regular processes. However, now I am trying to deploy a
>>>>>> simple docker image, but it is failing without producing any errors in
>>>>>> the sandbox.
>>>>>>
>>>>>> I followed this tutorial
>>>>>> <https://mesosphere.com/docs/tutorials/launch-docker-container-on-mesosphere/>
>>>>>> to set the Mesos Executor Timeout to 5mins and I can see the following
>>>>>> processes running on all the slave machines, where the containerizers are
>>>>>> in the correct order:
>>>>>>
>>>>>> root 1615 0.0 0.0 168 4 ? Ss Feb20 0:00
>>>>>> runsv mesos-slave
>>>>>>
>>>>>> root 1616 0.0 0.0 184 4 ? S Feb20 0:00
>>>>>> svlogd -tt /var/log/mesos-slave
>>>>>>
>>>>>> root 1617 3.1 0.2 874688 17376 ? Sl Feb20 133:56
>>>>>> /usr/local/sbin/mesos-slave --log_dir=/var/log/mesos
>>>>>> --containerizers=docker,mesos
>>>>>>
>>>>>> root 3290 0.0 0.0 4444 652 ? Ss Feb21 0:00 sh
>>>>>> -c /usr/local/libexec/mesos/mesos-executor
>>>>>>
>>>>>> root 3304 0.0 0.1 720528 10332 ? Sl Feb21 1:01
>>>>>> /usr/local/libexec/mesos mesos-executor
>>>>>>
>>>>>> Does anyone have suggestions on how to debug the issue?
>>>>>>
>>>>>> Thanks in Advance
>>>>>>
>>>>>> Sergio Daniel
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
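
For anyone landing on this thread later, here is a rough Python sketch of why the colons in the Chronos executor id break the `-v` argument in the log above. It is only a simplified model, not Docker's actual parser: it assumes a volume spec is rejected once it contains more than the two colons allowed by `host:container[:mode]`. The function names `sandbox_volume` and `volume_spec_is_parseable` are made up for illustration; the ids and paths are the ones from the log.

```python
def sandbox_volume(work_dir, slave_id, framework_id, executor_id, run_id,
                   container_path="/mnt/mesos/sandbox"):
    """Build a host:container volume spec shaped like the one in the slave log.

    Hypothetical helper: it only mirrors the path layout visible in the log.
    """
    host_path = (
        f"{work_dir}/slaves/{slave_id}/frameworks/{framework_id}"
        f"/executors/{executor_id}/runs/{run_id}"
    )
    return f"{host_path}:{container_path}"


def volume_spec_is_parseable(spec):
    """Assumed rule (simplified): at most two ':' separators are allowed,
    i.e. host:container or host:container:mode."""
    return spec.count(":") <= 2


if __name__ == "__main__":
    common = dict(
        work_dir="/tmp/mesos",
        slave_id="20150220-234013-1326718892-5050-1615-2",
        framework_id="20150222-090257-1326718892-5050-9995-0001",
        run_id="1f5295b2-9694-40ec-b900-f17de71d3bf4",
    )
    # Executor id as generated by Chronos in the log: it contains four colons.
    chronos_spec = sandbox_volume(executor_id="ct:1424725680000:0:dockerjob:", **common)
    # A colon-free executor id, for comparison.
    plain_spec = sandbox_volume(executor_id="dockerjob", **common)

    for spec in (chronos_spec, plain_spec):
        print(spec.count(":"), "colons, parseable:", volume_spec_is_parseable(spec))
```

Under that assumption the Chronos-style id makes the host path ambiguous, which matches the "bad format for volumes" error, while a colon-free id would pass. Per Adam and Tim above, the actual remedy is upgrading Mesos to >= 0.21 (MESOS-1833), not renaming executors.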

