Adam/Tim, that's exactly the issue, thanks! I am using 0.20.0 currently.
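Since the fix for this landed in 0.21, a quick way to decide whether a given slave needs the upgrade is to compare its reported version (e.g. from `mesos-slave --version`) against 0.21.0. A minimal sketch, with the running version hard-coded as an assumption and relying on GNU `sort -V` for version ordering:

```shell
#!/bin/sh
# MESOS-1833 is fixed in 0.21.0; check whether the running version is older.
RUNNING="0.20.0"   # substitute the output of: mesos-slave --version
FIXED="0.21.0"

# `sort -V` (GNU coreutils) orders version strings numerically; if the
# running version is not the newest of the pair, an upgrade is needed.
NEWEST=$(printf '%s\n%s\n' "$RUNNING" "$FIXED" | sort -V | tail -n1)
if [ "$NEWEST" = "$FIXED" ] && [ "$RUNNING" != "$FIXED" ]; then
  echo "upgrade needed"
else
  echo "ok"
fi
```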
Is there any particular suggested upgrade approach? FYI, I have another
in-house cluster running 0.21.1 and it worked fine. I haven't been able to
get that to work on AWS, though.

Thanks!

Sergio Daniel

On Tue, Feb 24, 2015 at 12:56 PM, Tim Chen <[email protected]> wrote:

> Hi Sergio,
>
> As Adam mentioned, that issue should be fixed in Mesos 0.21, as Chronos
> usually puts a colon in the executor id.
>
> Let me know if upgrading to >= 0.21 doesn't fix this.
>
> Thanks!
>
> Tim
>
> On Tue, Feb 24, 2015 at 9:45 AM, Adam Bordelon <[email protected]> wrote:
>
>> Ah, colons in the executorId. What version of Mesos are you running? You
>> might be hitting https://issues.apache.org/jira/browse/MESOS-1833
>>
>> On Tue, Feb 24, 2015 at 9:39 AM, max square <[email protected]> wrote:
>>
>>> Adam,
>>>
>>> Thanks for the pointer; I was able to pull the logs for the docker run
>>> command. To my understanding it is actually pulling the image, but it is
>>> having trouble starting the actual container. I highlighted in red what
>>> I think is the main reason for the error: a bad format for the volume
>>> where Mesos wants to mount the sandbox into the container. (See logs
>>> below.) I also notice that after the error it can't find the container;
>>> is that normal?
>>>
>>> Do you have any thoughts? Any help is greatly appreciated!
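The "bad format for volumes" in the log below comes from docker splitting each `-v` argument on `:` into at most `host-path:container-path[:mode]`. A Chronos-style executor id such as `ct:1424725680000:0:dockerjob:` carries extra colons into the sandbox path, so the volume string no longer splits into at most three fields. A minimal sketch of the mechanism (path shortened, `S1`/`F1`/`R1` are placeholder ids):

```shell
#!/bin/sh
# Sandbox path derived from a Chronos-style executor id; the id alone
# contributes four colons to the path.
SANDBOX='/tmp/mesos/slaves/S1/frameworks/F1/executors/ct:1424725680000:0:dockerjob:/runs/R1'

# The volume argument Mesos passes to `docker run -v`:
VOLUME="$SANDBOX:/mnt/mesos/sandbox"

# Docker expects at most three colon-separated fields
# (host-path:container-path[:mode]); count what this string actually has:
FIELDS=$(printf '%s\n' "$VOLUME" | awk -F: '{print NF}')
echo "$FIELDS"   # six fields, which docker rejects as a bad volume format
```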
>>>
>>> # It all starts well
>>> I0223 21:08:00.884003  1618 docker.cpp:743] Starting container
>>> '1f5295b2-9694-40ec-b900-f17de71d3bf4' for task
>>> 'ct:1424725680000:0:dockerjob:' (and executor
>>> 'ct:1424725680000:0:dockerjob:') of framework
>>> '20150222-090257-1326718892-5050-9995-0001'
>>>
>>> # Can't use the format for the volume
>>> E0223 21:09:40.772276  1625 slave.cpp:2485] Container
>>> '1f5295b2-9694-40ec-b900-f17de71d3bf4' for executor
>>> 'ct:1424725680000:0:dockerjob:' of framework
>>> '20150222-090257-1326718892-5050-9995-0001' failed to start: Failed to
>>> 'docker run -d -c 512 -m 536870912
>>> -e mesos_task_id=ct:1424725680000:0:dockerjob: -e CHRONOS_JOB_OWNER=
>>> -e CHRONOS_JOB_NAME=dockerjob
>>> -e HOST=ec2-52-1-xx-xx.compute-1.amazonaws.com
>>> -e CHRONOS_RESOURCE_MEM=512.0 -e CHRONOS_RESOURCE_CPU=0.5
>>> -e CHRONOS_RESOURCE_DISK=256.0 -e MESOS_SANDBOX=/mnt/mesos/sandbox
>>> -v /tmp/mesos/slaves/20150220-234013-1326718892-5050-1615-2/frameworks/20150222-090257-1326718892-5050-9995-0001/executors/ct:1424725680000:0:dockerjob:/runs/1f5295b2-9694-40ec-b900-f17de71d3bf4:/mnt/mesos/sandbox
>>> --net host --entrypoint /bin/sh
>>> --name mesos-1f5295b2-9694-40ec-b900-f17de71d3bf4
>>> libmesos/ubuntu -c while sleep 10; do date -u +%T; done':
>>> exit status = exited with status 2
>>> stderr = invalid value
>>> "/tmp/mesos/slaves/20150220-234013-1326718892-5050-1615-2/frameworks/20150222-090257-1326718892-5050-9995-0001/executors/ct:1424725680000:0:dockerjob:/runs/1f5295b2-9694-40ec-b900-f17de71d3bf4:/mnt/mesos/sandbox"
>>> for flag -v: bad format for volumes:
>>> /tmp/mesos/slaves/20150220-234013-1326718892-5050-1615-2/frameworks/20150222-090257-1326718892-5050-9995-0001/executors/ct:1424725680000:0:dockerjob:/runs/1f5295b2-9694-40ec-b900-f17de71d3bf4:/mnt/mesos/sandbox
>>>
>>> Usage: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
>>> 'bridge': creates a new network stack for the container on the docker
>>> bridge (use 'docker port' to see the actual mapping)
>>>
>>> # Fails and can't destroy the container, is this normal?
>>> E0223 21:09:40.772655  1625 slave.cpp:2580] Termination of executor
>>> 'ct:1424725680000:0:dockerjob:' of framework
>>> '20150222-090257-1326718892-5050-9995-0001' failed: No container found
>>>
>>> Thanks in advance
>>>
>>> Sergio Daniel
>>>
>>> On Tue, Feb 24, 2015 at 1:51 AM, Adam Bordelon <[email protected]> wrote:
>>>
>>>> Check the mesos-slave log on one of the slaves, in
>>>> /var/log/mesos/mesos-slave.INFO. There's probably some information
>>>> there about the docker pull, or other things that could have errored
>>>> before the actual container is launched.
>>>> Alternatively, you could try a `docker pull` manually on one of the
>>>> slaves, then see if the launch succeeds on that node. Then you'll know
>>>> if it was a timeout during the docker pull, at which point you can
>>>> either further increase the registration timeout or decide to pre-pull
>>>> all your images (as a periodic Chronos task?), due to unpredictable
>>>> network latencies in AWS.
>>>>
>>>> On Mon, Feb 23, 2015 at 4:31 PM, max square <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am using the cloudformation scripts
>>>>> <https://github.com/mbabineau/cloudformation-mesos> to create a Mesos
>>>>> cluster with Marathon 0.7.5 and Chronos 2.3.2. The setup is working
>>>>> perfectly for regular processes. However, now I am trying to deploy a
>>>>> simple docker image, and it is failing without producing any errors
>>>>> in the sandbox.
>>>>>
>>>>> I followed this tutorial
>>>>> <https://mesosphere.com/docs/tutorials/launch-docker-container-on-mesosphere/>
>>>>> to set the Mesos executor registration timeout to 5 minutes, and I
>>>>> can see the following processes running on all the slave machines.
>>>>> The containerizers are in the correct order:
>>>>>
>>>>> root  1615  0.0  0.0    168     4 ?  Ss  Feb20   0:00 runsv mesos-slave
>>>>>
>>>>> root  1616  0.0  0.0    184     4 ?  S   Feb20   0:00 svlogd -tt /var/log/mesos-slave
>>>>>
>>>>> root  1617  3.1  0.2 874688 17376 ?  Sl  Feb20 133:56 /usr/local/sbin/mesos-slave
>>>>>       --log_dir=/var/log/mesos --containerizers=docker,mesos
>>>>>
>>>>> root  3290  0.0  0.0   4444   652 ?  Ss  Feb21   0:00 sh -c /usr/local/libexec/mesos/mesos-executor
>>>>>
>>>>> root  3304  0.0  0.1 720528 10332 ?  Sl  Feb21   1:01 /usr/local/libexec/mesos/mesos-executor
>>>>>
>>>>> Does anyone have suggestions on how to debug the issue?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Sergio Daniel
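Adam's suggestion above, pre-pulling images as a periodic Chronos task, could look roughly like the job definition below. This is only a sketch: the job name, owner, and hourly schedule are made-up placeholders, and the field names follow the Chronos 2.x job JSON (posted to the scheduler's `/scheduler/iso8601` endpoint); check them against the Chronos version you run.

```json
{
  "name": "prepull-libmesos-ubuntu",
  "command": "docker pull libmesos/ubuntu",
  "schedule": "R/2015-02-25T00:00:00Z/PT1H",
  "owner": "ops@example.com",
  "cpus": 0.1,
  "mem": 64,
  "disk": 16
}
```

Because the job is just a `docker pull` on whichever slave accepts the offer, it warms the image cache so the real task's launch no longer races the executor registration timeout.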

