from mesospheres.io, it is recommended that you set the timeout to 5 mins,
as Vinod mentioned earlier.


-Luyi.




On Tue, Sep 30, 2014 at 11:02 AM, Andy Grove <[email protected]>
wrote:

> OK. So I figured out the issue with this and it was my misunderstanding of
> executors and tasks.
>
> My task info had:
>
> .setExecutor(Protos.ExecutorInfo.newBuilder(executor))
>
> I should have had this:
>
>             .setContainer(containerInfoBuilder)
>             .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
>
> I didn't have a mesos executor deployed inside my container which explains
> the timeout issue.
>
> Thanks again for the support.
>
>
> Thanks,
>
> Andy.
>
> --
> Andy Grove
> VP Engineering
> CodeFutures Corporation
>
>
>
> On Tue, Sep 30, 2014 at 10:20 AM, Andy Grove <[email protected]>
> wrote:
>
>> Hi Tim,
>>
>> Thanks for helping with this. I am running mesos-master and mesos-slave
>> natively on the same host (my desktop). The only container in use is the
>> one being launched by the mesos-slave.
>>
>> I will try your suggestion of running a simple command next.
>>
>> Here is the output from the slave from this issue though:
>>
>> I0930 10:13:52.053177 30722 main.cpp:126] Build: 2014-09-29 15:35:37 by
>> andy
>> I0930 10:13:52.053228 30722 main.cpp:128] Version: 0.20.1
>> I0930 10:13:53.055480 30722 containerizer.cpp:89] Using isolation:
>> posix/cpu,posix/mem
>> I0930 10:13:53.058353 30722 main.cpp:149] Starting Mesos slave
>> I0930 10:13:53.059651 30722 slave.cpp:167] Slave started on 1)@
>> 127.0.1.1:5051
>> I0930 10:13:53.060072 30722 slave.cpp:278] Slave resources: cpus(*):8;
>> mem(*):14963; disk(*):1.85648e+06; ports(*):[31000-32000]
>> I0930 10:13:53.060226 30722 slave.cpp:306] Slave hostname: davros
>> I0930 10:13:53.060253 30722 slave.cpp:307] Slave checkpoint: true
>> I0930 10:13:53.064975 30729 state.cpp:33] Recovering state from
>> '/tmp/mesos/meta'
>> I0930 10:13:53.065352 30725 status_update_manager.cpp:193] Recovering
>> status update manager
>> I0930 10:13:53.065626 30729 docker.cpp:577] Recovering Docker containers
>> I0930 10:13:53.065690 30724 containerizer.cpp:252] Recovering
>> containerizer
>> I0930 10:13:54.055233 30723 slave.cpp:3198] Finished recovery
>> I0930 10:13:54.055448 30723 slave.cpp:589] New master detected at
>> [email protected]:5050
>> I0930 10:13:54.055532 30723 slave.cpp:625] No credentials provided.
>> Attempting to register without authentication
>> I0930 10:13:54.055537 30730 status_update_manager.cpp:167] New master
>> detected at [email protected]:5050
>> I0930 10:13:54.055552 30723 slave.cpp:636] Detecting new master
>> I0930 10:13:54.928225 30724 slave.cpp:754] Registered with master
>> [email protected]:5050; given slave ID
>> 20140930-101303-16777343-5050-30690-0
>> I0930 10:13:54.928598 30724 slave.cpp:767] Checkpointing SlaveInfo to
>> '/tmp/mesos/meta/slaves/20140930-101303-16777343-5050-30690-0/slave.info'
>> I0930 10:14:17.330390 30725 slave.cpp:1002] Got assigned task 0 for
>> framework 20140930-101303-16777343-5050-30690-0000
>> I0930 10:14:17.330557 30725 slave.cpp:1112] Launching task 0 for
>> framework 20140930-101303-16777343-5050-30690-0000
>> I0930 10:14:17.331296 30725 slave.cpp:1222] Queuing task '0' for executor
>> default of framework '20140930-101303-16777343-5050-30690-0000
>> *I0930 10:14:17.333109 30730 docker.cpp:984] Starting container
>> 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' for executor 'default' and framework
>> '20140930-101303-16777343-5050-30690-0000'*
>> I0930 10:14:20.062705 30730 slave.cpp:2538] Monitoring executor 'default'
>> of framework '20140930-101303-16777343-5050-30690-0000' in container
>> 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>>
>> The container is running quite happily at this point.
>>
>> I0930 10:14:53.061337 30724 slave.cpp:3053] Current usage 0.76%. Max
>> allowed age: 6.247043850997720days
>> *I0930 10:15:17.331712 30730 slave.cpp:3010] Terminating executor default
>> of framework 20140930-101303-16777343-5050-30690-0000 because it did not
>> register within 1mins*
>> I0930 10:15:17.332221 30728 docker.cpp:1473] Destroying container
>> 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>> I0930 10:15:17.332308 30728 docker.cpp:1568] Running docker kill on
>> container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>> I0930 10:15:18.109361 30730 docker.cpp:1646] Executor for container
>> 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' has exited
>>
>>
>> Thanks,
>>
>> Andy.
>>
>> --
>> Andy Grove
>> VP Engineering
>> CodeFutures Corporation
>>
>>
>>
>> On Mon, Sep 29, 2014 at 6:25 PM, Tim Chen <[email protected]> wrote:
>>
>>> Hi Andy,
>>>
>>> You don't need to specify -d as the docker containerizer will set it
>>> for you since we run all docker images detached.
>>>
>>> It seems like the executor just simply can't register with the slave.
>>> Can you try just running a simple command without Docker that takes longer
>>> than the executor registration timeout to see if you see the same error?
>>>
>>> Also do you run the mesos slave in a docker container as well?
>>>
>>> It would be great if you could share the slave log, as Vinod suggested.
>>>
>>> Tim
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Sep 29, 2014 at 5:15 PM, Vinod Kone <[email protected]> wrote:
>>>
>>>> I'll let Tim Chen help you out here since he has more context. Some
>>>> slave logs around the failed container launch would be helpful.
>>>>
>>>>
>>>> On Mon, Sep 29, 2014 at 5:03 PM, Andy Grove <[email protected]> wrote:
>>>>
>>>>> Ignore my comment about docker run not returning. That is incorrect.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Andy.
>>>>>
>>>>> --
>>>>> Andy Grove
>>>>> VP Engineering
>>>>> CodeFutures Corporation
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Sep 29, 2014 at 5:59 PM, Andy Grove <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Vinod,
>>>>>>
>>>>>> Thanks for the quick response, but the image is already on the slave
>>>>>> and I see the container being launched almost immediately when my
>>>>>> framework starts (within 1-2 seconds). If I keep running docker ps,
>>>>>> this is the last output I see before the container is killed:
>>>>>>
>>>>>> $ docker ps
>>>>>> CONTAINER ID        IMAGE                                   COMMAND                CREATED             STATUS              PORTS               NAMES
>>>>>> 45f992c2781f        codefutures/dbshards_zookeeper:latest   "/bin/sh -c '/opt/zo   59 seconds ago      Up 58 seconds
>>>>>>
>>>>>> I am using mesos 0.20.1 and docker 1.2.0 on Ubuntu 14.04.
>>>>>>
>>>>>> So the container is running fine. It is a long-running service, i.e.
>>>>>> the docker run command will never return. Should I be providing some
>>>>>> option so that the docker executor passes the -d flag to the docker run
>>>>>> command? I guess I should start looking through the mesos source so I
>>>>>> can see how this works.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Andy.
>>>>>>
>>>>>> --
>>>>>> Andy Grove
>>>>>> VP Engineering
>>>>>> CodeFutures Corporation
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 29, 2014 at 5:49 PM, Vinod Kone <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Try increasing the executor registration timeout on the slave
>>>>>>> (--executor_registration_timeout) to give docker more time to pull
>>>>>>> the image.
>>>>>>>
>>>>>>> On Mon, Sep 29, 2014 at 4:41 PM, Andy Grove <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've been working on a prototype Mesos framework to launch docker
>>>>>>>> containers. I'm getting as far as seeing my container start up, but
>>>>>>>> after one minute it gets killed due to:
>>>>>>>>
>>>>>>>> Terminating executor default of framework
>>>>>>>> 20140929-155916-16777343-5050-2708-0004 because it did not register
>>>>>>>> within 1mins
>>>>>>>>
>>>>>>>> Here is the code I am using in my scheduler, which was based on one
>>>>>>>> of the examples:
>>>>>>>>
>>>>>>>>   @Override
>>>>>>>>   public void resourceOffers(SchedulerDriver schedulerDriver,
>>>>>>>>       List<Protos.Offer> offers) {
>>>>>>>>     logger.info("resourceOffers() with {} offers", offers.size());
>>>>>>>>
>>>>>>>>     for (Protos.Offer offer : offers) {
>>>>>>>>
>>>>>>>>       List<Protos.TaskInfo> tasks = new ArrayList<Protos.TaskInfo>();
>>>>>>>>       if (launchedTasks < totalTasks) {
>>>>>>>>         Protos.TaskID taskId = Protos.TaskID.newBuilder()
>>>>>>>>             .setValue(Integer.toString(launchedTasks++)).build();
>>>>>>>>
>>>>>>>>         logger.info("Launching task " + taskId.getValue());
>>>>>>>>
>>>>>>>>         // docker image info
>>>>>>>>         Protos.ContainerInfo.DockerInfo.Builder dockerInfoBuilder =
>>>>>>>>             Protos.ContainerInfo.DockerInfo.newBuilder();
>>>>>>>>         dockerInfoBuilder.setImage("codefutures/dbshards_zookeeper");
>>>>>>>>
>>>>>>>>         // container info
>>>>>>>>         Protos.ContainerInfo.Builder containerInfoBuilder =
>>>>>>>>             Protos.ContainerInfo.newBuilder();
>>>>>>>>         containerInfoBuilder.setType(Protos.ContainerInfo.Type.DOCKER);
>>>>>>>>         containerInfoBuilder.setDocker(dockerInfoBuilder.build());
>>>>>>>>
>>>>>>>>         // create executor for the container
>>>>>>>>         Protos.ExecutorInfo executor = Protos.ExecutorInfo.newBuilder()
>>>>>>>>             .setExecutorId(Protos.ExecutorID.newBuilder().setValue("default"))
>>>>>>>>             .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
>>>>>>>>             .setContainer(containerInfoBuilder)
>>>>>>>>             .setName("Test Executor (Docker)")
>>>>>>>>             .setSource("docker_test")
>>>>>>>>             .build();
>>>>>>>>
>>>>>>>>         // create task to run
>>>>>>>>         Protos.TaskInfo task = Protos.TaskInfo.newBuilder()
>>>>>>>>             .setName("task " + taskId.getValue())
>>>>>>>>             .setTaskId(taskId)
>>>>>>>>             .setSlaveId(offer.getSlaveId())
>>>>>>>>             .addResources(Protos.Resource.newBuilder()
>>>>>>>>                 .setName("cpus")
>>>>>>>>                 .setType(Protos.Value.Type.SCALAR)
>>>>>>>>                 .setScalar(Protos.Value.Scalar.newBuilder().setValue(1)))
>>>>>>>>             .addResources(Protos.Resource.newBuilder()
>>>>>>>>                 .setName("mem")
>>>>>>>>                 .setType(Protos.Value.Type.SCALAR)
>>>>>>>>                 .setScalar(Protos.Value.Scalar.newBuilder().setValue(128)))
>>>>>>>>             .setExecutor(Protos.ExecutorInfo.newBuilder(executor))
>>>>>>>>             .build();
>>>>>>>>
>>>>>>>>         tasks.add(task);
>>>>>>>>       }
>>>>>>>>       Protos.Filters filters =
>>>>>>>>           Protos.Filters.newBuilder().setRefuseSeconds(1).build();
>>>>>>>>
>>>>>>>>       schedulerDriver.launchTasks(offer.getId(), tasks, filters);
>>>>>>>>     }
>>>>>>>>
>>>>>>>>   }
>>>>>>>>
>>>>>>>> Am I missing some steps with this approach?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Andy.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Andy Grove
>>>>>>>> VP Engineering
>>>>>>>> CodeFutures Corporation
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
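
For anyone reading this thread later: the fix Andy describes at the top
(dropping setExecutor and putting the container and command directly on the
task) might look roughly like the sketch below. It is only a sketch. The
class name DockerTaskSketch and the helpers scalar() and buildDockerTask()
are made up for illustration and do not appear in the thread. The image name
and the cpus/mem values are copied from the original scheduler code. It
assumes the Mesos Java bindings (org.apache.mesos.Protos, 0.20.x) are on the
classpath and has not been run against a cluster.

```java
import org.apache.mesos.Protos;

/** Hypothetical helper (not from the thread): a Docker task with no custom executor. */
public final class DockerTaskSketch {

  private DockerTaskSketch() {}

  /** Builds a scalar resource such as cpus or mem. */
  static Protos.Resource scalar(String name, double value) {
    return Protos.Resource.newBuilder()
        .setName(name)
        .setType(Protos.Value.Type.SCALAR)
        .setScalar(Protos.Value.Scalar.newBuilder().setValue(value))
        .build();
  }

  /** Builds the task with the container and command set on the TaskInfo itself. */
  static Protos.TaskInfo buildDockerTask(Protos.Offer offer, Protos.TaskID taskId) {
    // Docker image info, same image as in the original scheduler code.
    Protos.ContainerInfo.DockerInfo docker =
        Protos.ContainerInfo.DockerInfo.newBuilder()
            .setImage("codefutures/dbshards_zookeeper")
            .build();

    Protos.ContainerInfo container = Protos.ContainerInfo.newBuilder()
        .setType(Protos.ContainerInfo.Type.DOCKER)
        .setDocker(docker)
        .build();

    return Protos.TaskInfo.newBuilder()
        .setName("task " + taskId.getValue())
        .setTaskId(taskId)
        .setSlaveId(offer.getSlaveId())
        .addResources(scalar("cpus", 1))
        .addResources(scalar("mem", 128))
        // No setExecutor(...) here: the image does not contain a Mesos
        // executor, so nothing inside it would ever register with the slave.
        .setContainer(container)
        .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
        .build();
  }
}
```

In resourceOffers(), the existing block that builds the ExecutorInfo and the
TaskInfo would then reduce to tasks.add(DockerTaskSketch.buildDockerTask(offer,
taskId)).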
