Tim, mind updating the documentation
<http://mesos.apache.org/documentation/latest/docker-containerizer/> to
make sure others don't fall into the same trap?
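
For anyone who lands on this thread from a search: the symptom to look for
in the slave log is the "Terminating executor ... because it did not
register within ..." line that Andy quotes below. Here is a minimal sketch,
in plain Java with no Mesos dependency, of pulling those messages out of a
slave log. The class name RegistrationTimeoutScan and the findTimeouts
helper are made up for illustration, and the pattern is based only on the
message text quoted in this thread, so other Mesos versions may word it
differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegistrationTimeoutScan {
    // Pattern derived from the slave message quoted in this thread (Mesos 0.20.1);
    // treating this wording as stable across versions is an assumption.
    private static final Pattern TIMEOUT = Pattern.compile(
        "Terminating executor (\\S+) of framework (\\S+) "
            + "because it did not register within (\\S+)");

    /** Returns one "executor / framework / timeout" entry per timeout message found. */
    public static List<String> findTimeouts(String log) {
        // Pasted logs are often hard-wrapped, so collapse all whitespace runs first.
        String flat = log.replaceAll("\\s+", " ");
        List<String> hits = new ArrayList<>();
        Matcher m = TIMEOUT.matcher(flat);
        while (m.find()) {
            hits.add(m.group(1) + " / " + m.group(2) + " / " + m.group(3));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample text copied from the slave log in this thread, including its line wraps.
        String sample =
            "I0930 10:15:17.331712 30730 slave.cpp:3010] Terminating executor\n"
            + "default of framework 20140930-101303-16777343-5050-30690-0000 because it\n"
            + "did not register within 1mins\n";
        for (String hit : findTimeouts(sample)) {
            System.out.println(hit);
        }
    }
}
```

If this reports a hit for a Docker task while docker ps shows the container
running, it is worth checking whether the TaskInfo sets an ExecutorInfo
when the image does not actually contain a Mesos executor, which is the
case Andy describes below.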

On Tue, Sep 30, 2014 at 11:38 AM, Tim Chen <[email protected]> wrote:

> Hi Andy,
>
> Good catch, I also missed that as I was just looking at the Docker
> configurations.
>
> You'll only need to set the Executor when you have a custom executor.
>
> Let us know if you have any other problems.
>
> Tim
>
> On Tue, Sep 30, 2014 at 11:02 AM, Andy Grove <[email protected]>
> wrote:
>
>> OK. So I figured out the issue with this and it was my misunderstanding
>> of executors and tasks.
>>
>> My task info had:
>>
>> .setExecutor(Protos.ExecutorInfo.newBuilder(executor))
>>
>> I should have had this:
>>
>>             .setContainer(containerInfoBuilder)
>>             .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
>>
>> I didn't have a mesos executor deployed inside my container, so nothing
>> ever registered with the slave, which explains the timeout issue.
>>
>> Thanks again for the support.
>>
>>
>> Thanks,
>>
>> Andy.
>>
>> --
>> Andy Grove
>> VP Engineering
>> CodeFutures Corporation
>>
>>
>>
>> On Tue, Sep 30, 2014 at 10:20 AM, Andy Grove <[email protected]>
>> wrote:
>>
>>> Hi Tim,
>>>
>>> Thanks for helping with this. I am running mesos-master and mesos-slave
>>> natively on the same host (my desktop). The only container in use is the
>>> one being launched by the mesos-slave.
>>>
>>> I will try your suggestion of running a simple command next.
>>>
>>> Here is the output from the slave from this issue though:
>>>
>>> I0930 10:13:52.053177 30722 main.cpp:126] Build: 2014-09-29 15:35:37 by
>>> andy
>>> I0930 10:13:52.053228 30722 main.cpp:128] Version: 0.20.1
>>> I0930 10:13:53.055480 30722 containerizer.cpp:89] Using isolation:
>>> posix/cpu,posix/mem
>>> I0930 10:13:53.058353 30722 main.cpp:149] Starting Mesos slave
>>> I0930 10:13:53.059651 30722 slave.cpp:167] Slave started on
>>> 1)@127.0.1.1:5051
>>> I0930 10:13:53.060072 30722 slave.cpp:278] Slave resources: cpus(*):8;
>>> mem(*):14963; disk(*):1.85648e+06; ports(*):[31000-32000]
>>> I0930 10:13:53.060226 30722 slave.cpp:306] Slave hostname: davros
>>> I0930 10:13:53.060253 30722 slave.cpp:307] Slave checkpoint: true
>>> I0930 10:13:53.064975 30729 state.cpp:33] Recovering state from
>>> '/tmp/mesos/meta'
>>> I0930 10:13:53.065352 30725 status_update_manager.cpp:193] Recovering
>>> status update manager
>>> I0930 10:13:53.065626 30729 docker.cpp:577] Recovering Docker containers
>>> I0930 10:13:53.065690 30724 containerizer.cpp:252] Recovering
>>> containerizer
>>> I0930 10:13:54.055233 30723 slave.cpp:3198] Finished recovery
>>> I0930 10:13:54.055448 30723 slave.cpp:589] New master detected at
>>> [email protected]:5050
>>> I0930 10:13:54.055532 30723 slave.cpp:625] No credentials provided.
>>> Attempting to register without authentication
>>> I0930 10:13:54.055537 30730 status_update_manager.cpp:167] New master
>>> detected at [email protected]:5050
>>> I0930 10:13:54.055552 30723 slave.cpp:636] Detecting new master
>>> I0930 10:13:54.928225 30724 slave.cpp:754] Registered with master
>>> [email protected]:5050; given slave ID
>>> 20140930-101303-16777343-5050-30690-0
>>> I0930 10:13:54.928598 30724 slave.cpp:767] Checkpointing SlaveInfo to
>>> '/tmp/mesos/meta/slaves/20140930-101303-16777343-5050-30690-0/slave.info'
>>> I0930 10:14:17.330390 30725 slave.cpp:1002] Got assigned task 0 for
>>> framework 20140930-101303-16777343-5050-30690-0000
>>> I0930 10:14:17.330557 30725 slave.cpp:1112] Launching task 0 for
>>> framework 20140930-101303-16777343-5050-30690-0000
>>> I0930 10:14:17.331296 30725 slave.cpp:1222] Queuing task '0' for
>>> executor default of framework '20140930-101303-16777343-5050-30690-0000
>>> *I0930 10:14:17.333109 30730 docker.cpp:984] Starting container
>>> 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' for executor 'default' and framework
>>> '20140930-101303-16777343-5050-30690-0000'*
>>> I0930 10:14:20.062705 30730 slave.cpp:2538] Monitoring executor
>>> 'default' of framework '20140930-101303-16777343-5050-30690-0000' in
>>> container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>>>
>>> The container is running quite happily at this point.
>>>
>>> I0930 10:14:53.061337 30724 slave.cpp:3053] Current usage 0.76%. Max
>>> allowed age: 6.247043850997720days
>>> *I0930 10:15:17.331712 30730 slave.cpp:3010] Terminating executor
>>> default of framework 20140930-101303-16777343-5050-30690-0000 because it
>>> did not register within 1mins*
>>> I0930 10:15:17.332221 30728 docker.cpp:1473] Destroying container
>>> 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>>> I0930 10:15:17.332308 30728 docker.cpp:1568] Running docker kill on
>>> container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>>> I0930 10:15:18.109361 30730 docker.cpp:1646] Executor for container
>>> 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' has exited
>>>
>>>
>>> Thanks,
>>>
>>> Andy.
>>>
>>> --
>>> Andy Grove
>>> VP Engineering
>>> CodeFutures Corporation
>>>
>>>
>>>
>>> On Mon, Sep 29, 2014 at 6:25 PM, Tim Chen <[email protected]> wrote:
>>>
>>>> Hi Andy,
>>>>
>>>> You don't need to specify -d, as the docker containerizer will set it
>>>> for you since we run all docker images detached.
>>>>
>>>> It seems like the executor just simply can't register with the slave.
>>>> Can you try just running a simple command without Docker that takes longer
>>>> than the executor registration timeout to see if you see the same error?
>>>>
>>>> Also, do you run the mesos slave in a docker container as well?
>>>>
>>>> It would be great if you can share the slave log as Vinod suggested too.
>>>>
>>>> Tim
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Sep 29, 2014 at 5:15 PM, Vinod Kone <[email protected]>
>>>> wrote:
>>>>
>>>>> I'll let Tim Chen help you out here since he has more context. Some
>>>>> slave logs around the failed container launch would be helpful.
>>>>>
>>>>>
>>>>> On Mon, Sep 29, 2014 at 5:03 PM, Andy Grove <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Ignore my comment about docker run not returning. That is incorrect.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Andy.
>>>>>>
>>>>>> --
>>>>>> Andy Grove
>>>>>> VP Engineering
>>>>>> CodeFutures Corporation
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 29, 2014 at 5:59 PM, Andy Grove <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Vinod,
>>>>>>>
>>>>>>> Thanks for the quick response but the image is already on the slave
>>>>>>> and I see the container being launched almost immediately when my
>>>>>>> framework starts (within 1-2 seconds). If I keep running docker ps,
>>>>>>> this is the last output I see before the container is killed:
>>>>>>>
>>>>>>> $ docker ps
>>>>>>> CONTAINER ID   IMAGE                                   COMMAND                CREATED          STATUS          PORTS   NAMES
>>>>>>> 45f992c2781f   codefutures/dbshards_zookeeper:latest   "/bin/sh -c '/opt/zo   59 seconds ago   Up 58 seconds
>>>>>>>
>>>>>>> I am using mesos 0.20.1 and docker 1.2.0 on Ubuntu 14.04.
>>>>>>>
>>>>>>> So the container is running fine. It is a long-running service, i.e.
>>>>>>> the docker run command will never return. Should I be providing some
>>>>>>> option so that the docker executor passes the -d flag to the docker
>>>>>>> run command? I guess I should start looking through the mesos source
>>>>>>> so I can see how this works.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Andy.
>>>>>>>
>>>>>>> --
>>>>>>> Andy Grove
>>>>>>> VP Engineering
>>>>>>> CodeFutures Corporation
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Sep 29, 2014 at 5:49 PM, Vinod Kone <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Try increasing the executor registration timeout on the slave
>>>>>>>> (--executor_registration_timeout) to give docker more time to do a
>>>>>>>> pull of the image.
>>>>>>>>
>>>>>>>> On Mon, Sep 29, 2014 at 4:41 PM, Andy Grove <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm working on a prototype Mesos framework to launch docker
>>>>>>>>> containers. I'm getting as far as seeing my container start up, but
>>>>>>>>> after one minute it gets killed due to:
>>>>>>>>>
>>>>>>>>> Terminating executor default of framework
>>>>>>>>> 20140929-155916-16777343-5050-2708-0004 because it did not register
>>>>>>>>> within 1mins
>>>>>>>>>
>>>>>>>>> Here is the code I am using in my scheduler, which was based on
>>>>>>>>> one of the examples:
>>>>>>>>>
>>>>>>>>>   @Override
>>>>>>>>>   public void resourceOffers(SchedulerDriver schedulerDriver,
>>>>>>>>>                              List<Protos.Offer> offers) {
>>>>>>>>>     logger.info("resourceOffers() with {} offers", offers.size());
>>>>>>>>>
>>>>>>>>>     for (Protos.Offer offer : offers) {
>>>>>>>>>
>>>>>>>>>       List<Protos.TaskInfo> tasks = new ArrayList<Protos.TaskInfo>();
>>>>>>>>>       if (launchedTasks < totalTasks) {
>>>>>>>>>         Protos.TaskID taskId = Protos.TaskID.newBuilder()
>>>>>>>>>             .setValue(Integer.toString(launchedTasks++)).build();
>>>>>>>>>
>>>>>>>>>         logger.info("Launching task " + taskId.getValue());
>>>>>>>>>
>>>>>>>>>         // docker image info
>>>>>>>>>         Protos.ContainerInfo.DockerInfo.Builder dockerInfoBuilder =
>>>>>>>>>             Protos.ContainerInfo.DockerInfo.newBuilder();
>>>>>>>>>         dockerInfoBuilder.setImage("codefutures/dbshards_zookeeper");
>>>>>>>>>
>>>>>>>>>         // container info
>>>>>>>>>         Protos.ContainerInfo.Builder containerInfoBuilder =
>>>>>>>>>             Protos.ContainerInfo.newBuilder();
>>>>>>>>>         containerInfoBuilder.setType(Protos.ContainerInfo.Type.DOCKER);
>>>>>>>>>         containerInfoBuilder.setDocker(dockerInfoBuilder.build());
>>>>>>>>>
>>>>>>>>>         // create executor for the container
>>>>>>>>>         Protos.ExecutorInfo executor = Protos.ExecutorInfo.newBuilder()
>>>>>>>>>             .setExecutorId(Protos.ExecutorID.newBuilder().setValue("default"))
>>>>>>>>>             .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
>>>>>>>>>             .setContainer(containerInfoBuilder)
>>>>>>>>>             .setName("Test Executor (Docker)")
>>>>>>>>>             .setSource("docker_test")
>>>>>>>>>             .build();
>>>>>>>>>
>>>>>>>>>         // create task to run
>>>>>>>>>         Protos.TaskInfo task = Protos.TaskInfo.newBuilder()
>>>>>>>>>             .setName("task " + taskId.getValue())
>>>>>>>>>             .setTaskId(taskId)
>>>>>>>>>             .setSlaveId(offer.getSlaveId())
>>>>>>>>>             .addResources(Protos.Resource.newBuilder()
>>>>>>>>>                 .setName("cpus")
>>>>>>>>>                 .setType(Protos.Value.Type.SCALAR)
>>>>>>>>>                 .setScalar(Protos.Value.Scalar.newBuilder().setValue(1)))
>>>>>>>>>             .addResources(Protos.Resource.newBuilder()
>>>>>>>>>                 .setName("mem")
>>>>>>>>>                 .setType(Protos.Value.Type.SCALAR)
>>>>>>>>>                 .setScalar(Protos.Value.Scalar.newBuilder().setValue(128)))
>>>>>>>>>             .setExecutor(Protos.ExecutorInfo.newBuilder(executor))
>>>>>>>>>             .build();
>>>>>>>>>
>>>>>>>>>         tasks.add(task);
>>>>>>>>>       }
>>>>>>>>>       Protos.Filters filters =
>>>>>>>>>           Protos.Filters.newBuilder().setRefuseSeconds(1).build();
>>>>>>>>>
>>>>>>>>>       schedulerDriver.launchTasks(offer.getId(), tasks, filters);
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> Am I missing some steps with this approach?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Andy.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Andy Grove
>>>>>>>>> VP Engineering
>>>>>>>>> CodeFutures Corporation
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
