Hi Vinod,

The documentation already mentions this: if an ExecutorInfo is set in the TaskInfo, it is expected to be a Mesos Executor and is expected to register with the slave.
Tim

On Tue, Sep 30, 2014 at 11:42 AM, Vinod Kone <[email protected]> wrote:

> Tim, mind updating the documentation
> <http://mesos.apache.org/documentation/latest/docker-containerizer/> to
> make sure others don't fall into the same trap?
>
> On Tue, Sep 30, 2014 at 11:38 AM, Tim Chen <[email protected]> wrote:
>
>> Hi Andy,
>>
>> Good catch, I also missed that as I was just looking at the Docker
>> configurations.
>>
>> You'll set the Executor only when you have a custom executor.
>>
>> Let us know if you have any other problems.
>>
>> Tim
>>
>> On Tue, Sep 30, 2014 at 11:02 AM, Andy Grove <[email protected]> wrote:
>>
>>> OK. So I figured out the issue with this and it was my misunderstanding
>>> of executors and tasks.
>>>
>>> My task info had:
>>>
>>> .setExecutor(Protos.ExecutorInfo.newBuilder(executor))
>>>
>>> I should have had this:
>>>
>>> .setContainer(containerInfoBuilder)
>>> .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
>>>
>>> I didn't have a mesos executor deployed inside my container, which
>>> explains the timeout issue.
>>>
>>> Thanks again for the support.
>>>
>>> Thanks,
>>>
>>> Andy.
>>>
>>> --
>>> Andy Grove
>>> VP Engineering
>>> CodeFutures Corporation
>>>
>>> On Tue, Sep 30, 2014 at 10:20 AM, Andy Grove <[email protected]> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> Thanks for helping with this. I am running mesos-master and mesos-slave
>>>> natively on the same host (my desktop). The only container in use is the
>>>> one being launched by the mesos-slave.
>>>>
>>>> I will try your suggestion of running a simple command next.
>>>>
>>>> Here is the output from the slave from this issue though:
>>>>
>>>> I0930 10:13:52.053177 30722 main.cpp:126] Build: 2014-09-29 15:35:37 by andy
>>>> I0930 10:13:52.053228 30722 main.cpp:128] Version: 0.20.1
>>>> I0930 10:13:53.055480 30722 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem
>>>> I0930 10:13:53.058353 30722 main.cpp:149] Starting Mesos slave
>>>> I0930 10:13:53.059651 30722 slave.cpp:167] Slave started on 1)@127.0.1.1:5051
>>>> I0930 10:13:53.060072 30722 slave.cpp:278] Slave resources: cpus(*):8; mem(*):14963; disk(*):1.85648e+06; ports(*):[31000-32000]
>>>> I0930 10:13:53.060226 30722 slave.cpp:306] Slave hostname: davros
>>>> I0930 10:13:53.060253 30722 slave.cpp:307] Slave checkpoint: true
>>>> I0930 10:13:53.064975 30729 state.cpp:33] Recovering state from '/tmp/mesos/meta'
>>>> I0930 10:13:53.065352 30725 status_update_manager.cpp:193] Recovering status update manager
>>>> I0930 10:13:53.065626 30729 docker.cpp:577] Recovering Docker containers
>>>> I0930 10:13:53.065690 30724 containerizer.cpp:252] Recovering containerizer
>>>> I0930 10:13:54.055233 30723 slave.cpp:3198] Finished recovery
>>>> I0930 10:13:54.055448 30723 slave.cpp:589] New master detected at [email protected]:5050
>>>> I0930 10:13:54.055532 30723 slave.cpp:625] No credentials provided. Attempting to register without authentication
>>>> I0930 10:13:54.055537 30730 status_update_manager.cpp:167] New master detected at [email protected]:5050
>>>> I0930 10:13:54.055552 30723 slave.cpp:636] Detecting new master
>>>> I0930 10:13:54.928225 30724 slave.cpp:754] Registered with master [email protected]:5050; given slave ID 20140930-101303-16777343-5050-30690-0
>>>> I0930 10:13:54.928598 30724 slave.cpp:767] Checkpointing SlaveInfo to '/tmp/mesos/meta/slaves/20140930-101303-16777343-5050-30690-0/slave.info'
>>>> I0930 10:14:17.330390 30725 slave.cpp:1002] Got assigned task 0 for framework 20140930-101303-16777343-5050-30690-0000
>>>> I0930 10:14:17.330557 30725 slave.cpp:1112] Launching task 0 for framework 20140930-101303-16777343-5050-30690-0000
>>>> I0930 10:14:17.331296 30725 slave.cpp:1222] Queuing task '0' for executor default of framework '20140930-101303-16777343-5050-30690-0000
>>>> *I0930 10:14:17.333109 30730 docker.cpp:984] Starting container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' for executor 'default' and framework '20140930-101303-16777343-5050-30690-0000'*
>>>> I0930 10:14:20.062705 30730 slave.cpp:2538] Monitoring executor 'default' of framework '20140930-101303-16777343-5050-30690-0000' in container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>>>>
>>>> The container is running quite happily at this point.
>>>>
>>>> I0930 10:14:53.061337 30724 slave.cpp:3053] Current usage 0.76%. Max allowed age: 6.247043850997720days
>>>> *I0930 10:15:17.331712 30730 slave.cpp:3010] Terminating executor default of framework 20140930-101303-16777343-5050-30690-0000 because it did not register within 1mins*
>>>> I0930 10:15:17.332221 30728 docker.cpp:1473] Destroying container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>>>> I0930 10:15:17.332308 30728 docker.cpp:1568] Running docker kill on container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>>>> I0930 10:15:18.109361 30730 docker.cpp:1646] Executor for container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' has exited
>>>>
>>>> Thanks,
>>>>
>>>> Andy.
>>>>
>>>> --
>>>> Andy Grove
>>>> VP Engineering
>>>> CodeFutures Corporation
>>>>
>>>> On Mon, Sep 29, 2014 at 6:25 PM, Tim Chen <[email protected]> wrote:
>>>>
>>>>> Hi Andy,
>>>>>
>>>>> You don't need to specify -d, as the docker containerizer will set it
>>>>> for you since we run all docker images detached.
>>>>>
>>>>> It seems like the executor simply can't register with the slave.
>>>>> Can you try running a simple command without Docker that takes longer
>>>>> than the executor registration timeout, to see if you get the same error?
>>>>>
>>>>> Also, do you run the mesos slave in a docker container as well?
>>>>>
>>>>> It would be great if you could share the slave log as Vinod suggested too.
>>>>>
>>>>> Tim
>>>>>
>>>>> On Mon, Sep 29, 2014 at 5:15 PM, Vinod Kone <[email protected]> wrote:
>>>>>
>>>>>> I'll let Tim Chen help you out here since he has more context. Some
>>>>>> slave logs around the failed container launch would be helpful.
>>>>>>
>>>>>> On Mon, Sep 29, 2014 at 5:03 PM, Andy Grove <[email protected]> wrote:
>>>>>>
>>>>>>> Ignore my comment about docker run not returning. That is incorrect.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Andy.
>>>>>>>
>>>>>>> --
>>>>>>> Andy Grove
>>>>>>> VP Engineering
>>>>>>> CodeFutures Corporation
>>>>>>>
>>>>>>> On Mon, Sep 29, 2014 at 5:59 PM, Andy Grove <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Vinod,
>>>>>>>>
>>>>>>>> Thanks for the quick response, but the image is already on the slave
>>>>>>>> and I see the container being launched almost immediately when my
>>>>>>>> framework starts (within 1-2 seconds). If I keep running docker ps,
>>>>>>>> this is the last output I see before the container is killed:
>>>>>>>>
>>>>>>>> $ docker ps
>>>>>>>> CONTAINER ID   IMAGE                                   COMMAND                CREATED          STATUS          PORTS   NAMES
>>>>>>>> 45f992c2781f   codefutures/dbshards_zookeeper:latest   "/bin/sh -c '/opt/zo   59 seconds ago   Up 58 seconds
>>>>>>>>
>>>>>>>> I am using mesos 0.20.1 and docker 1.2.0 on Ubuntu 14.04.
>>>>>>>>
>>>>>>>> So the container is running fine. It is a long running service, i.e.
>>>>>>>> the docker run command will never return. Should I be providing some
>>>>>>>> option so that the docker executor passes the -d flag to the docker
>>>>>>>> run command? I guess I should start looking through the mesos source
>>>>>>>> so I can see how this works.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Andy.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Andy Grove
>>>>>>>> VP Engineering
>>>>>>>> CodeFutures Corporation
>>>>>>>>
>>>>>>>> On Mon, Sep 29, 2014 at 5:49 PM, Vinod Kone <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Try increasing the executor registration timeout on the slave
>>>>>>>>> (--executor_registration_timeout) to give docker more time to do a
>>>>>>>>> pull of the image.
>>>>>>>>>
>>>>>>>>> On Mon, Sep 29, 2014 at 4:41 PM, Andy Grove <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I've been working on a prototype Mesos framework to launch docker
>>>>>>>>>> containers. I'm getting as far as seeing my container start up, but
>>>>>>>>>> after one minute it gets killed due to:
>>>>>>>>>>
>>>>>>>>>> Terminating executor default of framework
>>>>>>>>>> 20140929-155916-16777343-5050-2708-0004 because it did not register
>>>>>>>>>> within 1mins
>>>>>>>>>>
>>>>>>>>>> Here is the code I am using in my scheduler, which was based on
>>>>>>>>>> one of the examples:
>>>>>>>>>>
>>>>>>>>>> @Override
>>>>>>>>>> public void resourceOffers(SchedulerDriver schedulerDriver,
>>>>>>>>>>                            List<Protos.Offer> offers) {
>>>>>>>>>>   logger.info("resourceOffers() with {} offers", offers.size());
>>>>>>>>>>
>>>>>>>>>>   for (Protos.Offer offer : offers) {
>>>>>>>>>>
>>>>>>>>>>     List<Protos.TaskInfo> tasks = new ArrayList<Protos.TaskInfo>();
>>>>>>>>>>     if (launchedTasks < totalTasks) {
>>>>>>>>>>       Protos.TaskID taskId = Protos.TaskID.newBuilder()
>>>>>>>>>>           .setValue(Integer.toString(launchedTasks++)).build();
>>>>>>>>>>
>>>>>>>>>>       logger.info("Launching task " + taskId.getValue());
>>>>>>>>>>
>>>>>>>>>>       // docker image info
>>>>>>>>>>       Protos.ContainerInfo.DockerInfo.Builder dockerInfoBuilder =
>>>>>>>>>>           Protos.ContainerInfo.DockerInfo.newBuilder();
>>>>>>>>>>       dockerInfoBuilder.setImage("codefutures/dbshards_zookeeper");
>>>>>>>>>>
>>>>>>>>>>       // container info
>>>>>>>>>>       Protos.ContainerInfo.Builder containerInfoBuilder =
>>>>>>>>>>           Protos.ContainerInfo.newBuilder();
>>>>>>>>>>       containerInfoBuilder.setType(Protos.ContainerInfo.Type.DOCKER);
>>>>>>>>>>       containerInfoBuilder.setDocker(dockerInfoBuilder.build());
>>>>>>>>>>
>>>>>>>>>>       // create executor for the container
>>>>>>>>>>       Protos.ExecutorInfo executor = Protos.ExecutorInfo.newBuilder()
>>>>>>>>>>           .setExecutorId(Protos.ExecutorID.newBuilder().setValue("default"))
>>>>>>>>>>           .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
>>>>>>>>>>           .setContainer(containerInfoBuilder)
>>>>>>>>>>           .setName("Test Executor (Docker)")
>>>>>>>>>>           .setSource("docker_test")
>>>>>>>>>>           .build();
>>>>>>>>>>
>>>>>>>>>>       // create task to run
>>>>>>>>>>       Protos.TaskInfo task = Protos.TaskInfo.newBuilder()
>>>>>>>>>>           .setName("task " + taskId.getValue())
>>>>>>>>>>           .setTaskId(taskId)
>>>>>>>>>>           .setSlaveId(offer.getSlaveId())
>>>>>>>>>>           .addResources(Protos.Resource.newBuilder()
>>>>>>>>>>               .setName("cpus")
>>>>>>>>>>               .setType(Protos.Value.Type.SCALAR)
>>>>>>>>>>               .setScalar(Protos.Value.Scalar.newBuilder().setValue(1)))
>>>>>>>>>>           .addResources(Protos.Resource.newBuilder()
>>>>>>>>>>               .setName("mem")
>>>>>>>>>>               .setType(Protos.Value.Type.SCALAR)
>>>>>>>>>>               .setScalar(Protos.Value.Scalar.newBuilder().setValue(128)))
>>>>>>>>>>           .setExecutor(Protos.ExecutorInfo.newBuilder(executor))
>>>>>>>>>>           .build();
>>>>>>>>>>
>>>>>>>>>>       tasks.add(task);
>>>>>>>>>>     }
>>>>>>>>>>     Protos.Filters filters =
>>>>>>>>>>         Protos.Filters.newBuilder().setRefuseSeconds(1).build();
>>>>>>>>>>
>>>>>>>>>>     schedulerDriver.launchTasks(offer.getId(), tasks, filters);
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Am I missing some steps with this approach?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Andy.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Andy Grove
>>>>>>>>>> VP Engineering
>>>>>>>>>> CodeFutures Corporation
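
A footnote on the slave log quoted in the thread: the gap between the "Queuing task" line and the "Terminating executor" line is the one-minute executor registration timeout almost to the microsecond, which fits the diagnosis that the container was healthy but no executor ever registered. Below is a minimal Java sketch of that arithmetic. The class and method names (RegistrationTimeoutCheck, timeOf, secondsBetween) are hypothetical, the two log lines are abbreviated copies of the ones in the thread, and the sketch assumes both lines were written on the same day.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Mesos: compares timestamps of two glog-style lines.
public class RegistrationTimeoutCheck {
    // The second field of each log line is the time of day with microseconds.
    private static final DateTimeFormatter GLOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss.SSSSSS");

    /** Extracts the time-of-day field (second whitespace-separated token) from a log line. */
    public static LocalTime timeOf(String logLine) {
        String[] fields = logLine.trim().split("\\s+");
        return LocalTime.parse(fields[1], GLOG_TIME);
    }

    /** Whole seconds elapsed between two log lines, assuming they share the same date. */
    public static long secondsBetween(String earlier, String later) {
        return Duration.between(timeOf(earlier), timeOf(later)).getSeconds();
    }

    public static void main(String[] args) {
        // Abbreviated copies of the two lines from the slave log in this thread.
        String queued =
                "I0930 10:14:17.331296 30725 slave.cpp:1222] Queuing task '0' for executor default";
        String terminated =
                "I0930 10:15:17.331712 30730 slave.cpp:3010] Terminating executor default";

        // Prints "60 s", matching "did not register within 1mins".
        System.out.println(secondsBetween(queued, terminated) + " s");
    }
}
```

If the gap had been much shorter than the timeout, the container exiting on its own would have been the more likely cause; here the kill lines up with the registration deadline instead.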

