See the "Upgrading from 0.20.x to 0.21.x" section on
http://mesos.apache.org/documentation/upgrades/

On Tue, Feb 24, 2015 at 4:02 PM, max square <[email protected]> wrote:

> Adam/Tim,
>
> That's exactly the issue, thanks! I am currently using 0.20.0.
>
> Is there any particular suggested update approach?
>
> FYI I have another in-house cluster running 0.21.1 and it worked fine.
> Haven't been able to get that to work on AWS though.
>
> Thanks!
>
> Sergio Daniel
>
> On Tue, Feb 24, 2015 at 12:56 PM, Tim Chen <[email protected]> wrote:
>
>> Hi Sergio,
>>
>> As Adam mentioned, that issue should be fixed in Mesos 0.21; Chronos
>> usually puts a colon in the executor id.
>>
>> Let me know if upgrading to >= 0.21 doesn't fix this.
>>
>> Thanks!
>>
>> Tim
>>
>> On Tue, Feb 24, 2015 at 9:45 AM, Adam Bordelon <[email protected]>
>> wrote:
>>
>>> Ah, colons in the executorId. What version of Mesos are you running? You
>>> might be hitting https://issues.apache.org/jira/browse/MESOS-1833
>>>
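To illustrate why colons in the executor id would trip up `docker run -v`,
here is a simplified model in Python. It is not Docker's or Mesos's actual
code. It assumes a -v value has the form host:container[:mode] and is
rejected once it contains more than two colons. The function names and the
shortened sandbox path (WORK_DIR layout, RUN_ID) are hypothetical; only the
executor id and /mnt/mesos/sandbox come from the log quoted further down.

```python
def check_volume_spec(spec: str) -> str:
    """Accept a -v value only if it has at most two colons.

    Simplified model (an assumption, not Docker's real parser):
    host:container[:mode] never needs more than two colons.
    """
    if spec.count(":") > 2:
        raise ValueError(f"bad format for volumes: {spec}")
    return spec


def sandbox_volume(work_dir: str, executor_id: str,
                   container_path: str = "/mnt/mesos/sandbox") -> str:
    """Build a host:container value with the executor id inside the host path.

    The path layout is shortened and hypothetical; RUN_ID is a placeholder.
    """
    host_path = f"{work_dir}/executors/{executor_id}/runs/RUN_ID"
    return f"{host_path}:{container_path}"


# Executor id as generated by Chronos in the log: it contains four colons.
chronos_spec = sandbox_volume("/tmp/mesos", "ct:1424725680000:0:dockerjob:")

try:
    check_volume_spec(chronos_spec)
    rejected = False
except ValueError:
    rejected = True

# An id without colons leaves only the host:container separator.
plain_spec = check_volume_spec(sandbox_volume("/tmp/mesos", "dockerjob"))
```

Under this model, the Chronos-style id pushes the value to five colons, so it
is rejected, while the colon-free id passes.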
>>> On Tue, Feb 24, 2015 at 9:39 AM, max square <[email protected]>
>>> wrote:
>>>
>>>> Adam,
>>>>
>>>> Thanks for the pointer. I was able to pull the logs for the docker run
>>>> command. As far as I can tell, it is actually pulling the image but
>>>> having trouble starting the container. I highlighted in red what I
>>>> think is the main reason for the error: a bad format for the volume that
>>>> Mesos wants to mount into the container (see logs below).
>>>> I also noticed that after the error it can't find the container. Is that
>>>> normal?
>>>>
>>>> Do you have any thoughts? Any help is greatly appreciated!
>>>>
>>>> #It all starts well
>>>> I0223 21:08:00.884003  1618 docker.cpp:743] Starting container
>>>> '1f5295b2-9694-40ec-b900-f17de71d3bf4' for task
>>>> 'ct:1424725680000:0:dockerjob:' (and executor
>>>> 'ct:1424725680000:0:dockerjob:') of framework
>>>> '20150222-090257-1326718892-5050-9995-0001'
>>>>
>>>> #Can't use the format for the volume
>>>> E0223 21:09:40.772276  1625 slave.cpp:2485] Container
>>>> '1f5295b2-9694-40ec-b900-f17de71d3bf4' for executor
>>>> 'ct:1424725680000:0:dockerjob:' of framework
>>>> '20150222-090257-1326718892-5050-9995-0001' failed to start: Failed to
>>>> 'docker run -d -c 512 -m 536870912 -e
>>>> mesos_task_id=ct:1424725680000:0:dockerjob: -e CHRONOS_JOB_OWNER= -e
>>>> CHRONOS_JOB_NAME=dockerjob -e HOST=
>>>> ec2-52-1-xx-xx.compute-1.amazonaws.com -e CHRONOS_RESOURCE_MEM=512.0
>>>> -e CHRONOS_RESOURCE_CPU=0.5 -e CHRONOS_RESOURCE_DISK=256.0 -e
>>>> MESOS_SANDBOX=/mnt/mesos/sandbox -v
>>>> /tmp/mesos/slaves/20150220-234013-1326718892-5050-1615-2/frameworks/20150222-090257-1326718892-5050-9995-0001/executors/ct:1424725680000:0:dockerjob:/runs/1f5295b2-9694-40ec-b900-f17de71d3bf4:/mnt/mesos/sandbox
>>>> --net host --entrypoint /bin/sh --name
>>>> mesos-1f5295b2-9694-40ec-b900-f17de71d3bf4 libmesos/ubuntu -c while sleep
>>>> 10; do date -u +%T; done': exit status = exited with status 2 stderr =
>>>> invalid value
>>>> "/tmp/mesos/slaves/20150220-234013-1326718892-5050-1615-2/frameworks/20150222-090257-1326718892-5050-9995-0001/executors/ct:1424725680000:0:dockerjob:/runs/1f5295b2-9694-40ec-b900-f17de71d3bf4:/mnt/mesos/sandbox"
>>>> for flag -v: bad format for volumes:
>>>> /tmp/mesos/slaves/20150220-234013-1326718892-5050-1615-2/frameworks/20150222-090257-1326718892-5050-9995-0001/executors/ct:1424725680000:0:dockerjob:/runs/1f5295b2-9694-40ec-b900-f17de71d3bf4:/mnt/mesos/sandbox
>>>>
>>>> Usage: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
>>>>                                'bridge': creates a new network stack
>>>> for the container on the docker bridge
>>>>                                (use 'docker port' to see the actual
>>>> mapping)
>>>>
>>>> # Fails and can't destroy the container, is this normal?
>>>> E0223 21:09:40.772655  1625 slave.cpp:2580] Termination of executor
>>>> 'ct:1424725680000:0:dockerjob:' of framework
>>>> '20150222-090257-1326718892-5050-9995-0001' failed: No container found
>>>>
>>>> Thanks in advance
>>>>
>>>> Sergio Daniel
>>>>
>>>> On Tue, Feb 24, 2015 at 1:51 AM, Adam Bordelon <[email protected]>
>>>> wrote:
>>>>
>>>>> Check the mesos-slave log on one of the slaves, in
>>>>> /var/log/mesos/mesos-slave.INFO. There's probably some information there
>>>>> about the docker pull, or other things that could have errored before the
>>>>> actual container is launched.
>>>>> Alternatively, you could try a `docker pull` manually on one of the
>>>>> slaves, then see if the launch succeeds on that node. Then you'll know if
>>>>> it was a timeout during the docker pull, at which point you can either
>>>>> further increase the registration timeout or decide to pre-pull all your
>>>>> images (as a periodic Chronos task?), due to unpredictable network
>>>>> latencies in AWS.
>>>>>
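A rough way to narrow down the slave log is to keep only the warning, error,
and fatal lines. The sketch below assumes the glog-style prefix seen in the
log lines elsewhere in this thread (severity letter, MMDD, time, thread id,
file:line]). The regex and the names GLOG_RE, Entry, and problem_entries are
illustrative and not part of Mesos.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed prefix: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\s:]+:\d+)\] (?P<msg>.*)$"
)


class Entry(NamedTuple):
    severity: str
    time: str
    source: str
    message: str


def problem_entries(lines: Iterable[str],
                    severities: str = "WEF") -> Iterator[Entry]:
    """Yield entries whose severity letter is in `severities`.

    Lines that do not match the assumed prefix are skipped.
    """
    for line in lines:
        match = GLOG_RE.match(line.rstrip("\n"))
        if match and match["sev"] in severities:
            yield Entry(match["sev"], match["time"], match["src"], match["msg"])


# Two lines taken from the log in this thread, joined back onto one line each.
SAMPLE = [
    "I0223 21:08:00.884003  1618 docker.cpp:743] Starting container "
    "'1f5295b2-9694-40ec-b900-f17de71d3bf4' for task "
    "'ct:1424725680000:0:dockerjob:' (and executor "
    "'ct:1424725680000:0:dockerjob:') of framework "
    "'20150222-090257-1326718892-5050-9995-0001'",
    "E0223 21:09:40.772655  1625 slave.cpp:2580] Termination of executor "
    "'ct:1424725680000:0:dockerjob:' of framework "
    "'20150222-090257-1326718892-5050-9995-0001' failed: No container found",
]

entries = list(problem_entries(SAMPLE))
```

Passing an open file handle for /var/log/mesos/mesos-slave.INFO in place of
SAMPLE would apply the same filter to the real log.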
>>>>> On Mon, Feb 23, 2015 at 4:31 PM, max square <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am using the CloudFormation scripts
>>>>>> <https://github.com/mbabineau/cloudformation-mesos> to create a Mesos
>>>>>> cluster with Marathon 0.7.5 and Chronos 2.3.2. The setup works
>>>>>> perfectly for regular processes. However, I am now trying to deploy a
>>>>>> simple Docker image, and it is failing without producing any errors in
>>>>>> the sandbox.
>>>>>>
>>>>>> I followed this tutorial
>>>>>> <https://mesosphere.com/docs/tutorials/launch-docker-container-on-mesosphere/>
>>>>>> to set the Mesos executor timeout to 5 minutes, and I can see the
>>>>>> following processes running on all the slave machines, with the
>>>>>> containerizers in the correct order:
>>>>>>
>>>>>> root      1615  0.0  0.0    168     4 ?        Ss   Feb20   0:00 runsv mesos-slave
>>>>>> root      1616  0.0  0.0    184     4 ?        S    Feb20   0:00 svlogd -tt /var/log/mesos-slave
>>>>>> root      1617  3.1  0.2 874688 17376 ?        Sl   Feb20 133:56 /usr/local/sbin/mesos-slave --log_dir=/var/log/mesos --containerizers=docker,mesos
>>>>>> root      3290  0.0  0.0   4444   652 ?        Ss   Feb21   0:00 sh -c /usr/local/libexec/mesos/mesos-executor
>>>>>> root      3304  0.0  0.1 720528 10332 ?        Sl   Feb21   1:01 /usr/local/libexec/mesos mesos-executor
>>>>>>
>>>>>> Does anyone have suggestions on how to debug the issue?
>>>>>>
>>>>>> Thanks in Advance
>>>>>>
>>>>>> Sergio Daniel
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
