Hi Andy,

Good catch. I also missed that, as I was only looking at the Docker configurations.

You only set the Executor when you have a custom executor.

Let us know if you have any other problems.

Tim

On Tue, Sep 30, 2014 at 11:02 AM, Andy Grove <[email protected]> wrote:

> OK. So I figured out the issue with this and it was my misunderstanding of
> executors and tasks.
>
> My task info had:
>
> .setExecutor(Protos.ExecutorInfo.newBuilder(executor))
>
> I should have had this:
>
> .setContainer(containerInfoBuilder)
> .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
>
> I didn't have a mesos executor deployed inside my container which explains
> the timeout issue.
>
> Thanks again for the support.
>
>
> Thanks,
>
> Andy.
>
> --
> Andy Grove
> VP Engineering
> CodeFutures Corporation
>
>
>
> On Tue, Sep 30, 2014 at 10:20 AM, Andy Grove <[email protected]>
> wrote:
>
>> Hi Tim,
>>
>> Thanks for helping with this. I am running mesos-master and mesos-slave
>> natively on the same host (my desktop). The only container in use is the
>> one being launched by the mesos-slave.
>>
>> I will try your suggestion of running a simple command next.
>>
>> Here is the output from the slave from this issue though:
>>
>> I0930 10:13:52.053177 30722 main.cpp:126] Build: 2014-09-29 15:35:37 by
>> andy
>> I0930 10:13:52.053228 30722 main.cpp:128] Version: 0.20.1
>> I0930 10:13:53.055480 30722 containerizer.cpp:89] Using isolation:
>> posix/cpu,posix/mem
>> I0930 10:13:53.058353 30722 main.cpp:149] Starting Mesos slave
>> I0930 10:13:53.059651 30722 slave.cpp:167] Slave started on 1)@
>> 127.0.1.1:5051
>> I0930 10:13:53.060072 30722 slave.cpp:278] Slave resources: cpus(*):8;
>> mem(*):14963; disk(*):1.85648e+06; ports(*):[31000-32000]
>> I0930 10:13:53.060226 30722 slave.cpp:306] Slave hostname: davros
>> I0930 10:13:53.060253 30722 slave.cpp:307] Slave checkpoint: true
>> I0930 10:13:53.064975 30729 state.cpp:33] Recovering state from
>> '/tmp/mesos/meta'
>> I0930 10:13:53.065352 30725 status_update_manager.cpp:193] Recovering
>> status update manager
>> I0930 10:13:53.065626 30729 docker.cpp:577] Recovering Docker containers
>> I0930 10:13:53.065690 30724 containerizer.cpp:252] Recovering
>> containerizer
>> I0930 10:13:54.055233 30723 slave.cpp:3198] Finished recovery
>> I0930 10:13:54.055448 30723 slave.cpp:589] New master detected at
>> [email protected]:5050
>> I0930 10:13:54.055532 30723 slave.cpp:625] No credentials provided.
>> Attempting to register without authentication
>> I0930 10:13:54.055537 30730 status_update_manager.cpp:167] New master
>> detected at [email protected]:5050
>> I0930 10:13:54.055552 30723 slave.cpp:636] Detecting new master
>> I0930 10:13:54.928225 30724 slave.cpp:754] Registered with master
>> [email protected]:5050; given slave ID
>> 20140930-101303-16777343-5050-30690-0
>> I0930 10:13:54.928598 30724 slave.cpp:767] Checkpointing SlaveInfo to
>> '/tmp/mesos/meta/slaves/20140930-101303-16777343-5050-30690-0/slave.info'
>> I0930 10:14:17.330390 30725 slave.cpp:1002] Got assigned task 0 for
>> framework 20140930-101303-16777343-5050-30690-0000
>> I0930 10:14:17.330557 30725 slave.cpp:1112] Launching task 0 for
>> framework 20140930-101303-16777343-5050-30690-0000
>> I0930 10:14:17.331296 30725 slave.cpp:1222] Queuing task '0' for executor
>> default of framework '20140930-101303-16777343-5050-30690-0000
>> *I0930 10:14:17.333109 30730 docker.cpp:984] Starting container
>> 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' for executor 'default' and framework
>> '20140930-101303-16777343-5050-30690-0000'*
>> I0930 10:14:20.062705 30730 slave.cpp:2538] Monitoring executor 'default'
>> of framework '20140930-101303-16777343-5050-30690-0000' in container
>> 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>>
>> The container is running quite happily at this point.
>>
>> I0930 10:14:53.061337 30724 slave.cpp:3053] Current usage 0.76%. Max
>> allowed age: 6.247043850997720days
>> *I0930 10:15:17.331712 30730 slave.cpp:3010] Terminating executor default
>> of framework 20140930-101303-16777343-5050-30690-0000 because it did not
>> register within 1mins*
>> I0930 10:15:17.332221 30728 docker.cpp:1473] Destroying container
>> 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>> I0930 10:15:17.332308 30728 docker.cpp:1568] Running docker kill on
>> container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>> I0930 10:15:18.109361 30730 docker.cpp:1646] Executor for container
>> 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' has exited
>>
>>
>> Thanks,
>>
>> Andy.
>>
>> --
>> Andy Grove
>> VP Engineering
>> CodeFutures Corporation
>>
>>
>>
>> On Mon, Sep 29, 2014 at 6:25 PM, Tim Chen <[email protected]> wrote:
>>
>>> Hi Andy,
>>>
>>> You don't need to specifiy -d as the docker containerizer will set it
>>> for you since we run all docker images detached.
>>>
>>> It seems like the executor just simply can't register with the slave.
>>> Can you try just running a simple command without Docker that takes longer
>>> than the executor registration timeout to see if you see the same error?
>>>
>>> Also do you run the mesos slave in a docker container as well?
>>>
>>> Will be great if you can share the slave log as Vinod suggested too.
>>>
>>> Tim
>>>
>>>
>>> On Mon, Sep 29, 2014 at 5:15 PM, Vinod Kone <[email protected]> wrote:
>>>
>>>> I'll let Tim Chen help you out here since he has more context. Some
>>>> slave logs around the failed container launch would be helpful.
>>>>
>>>>
>>>> On Mon, Sep 29, 2014 at 5:03 PM, Andy Grove <[email protected]
>>>> > wrote:
>>>>
>>>>> Ignore my comment about docker run not returning. That is incorrect.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Andy.
>>>>>
>>>>> --
>>>>> Andy Grove
>>>>> VP Engineering
>>>>> CodeFutures Corporation
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Sep 29, 2014 at 5:59 PM, Andy Grove <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Vinod,
>>>>>>
>>>>>> Thanks for the quick response but the image is already on the slave
>>>>>> and I see the container being launched almost immediately when my
>>>>>> framework starts (within 1-2 seconds). If I keep running docker ps,
>>>>>> this is the last output I see before the container is killed:
>>>>>>
>>>>>> $ docker ps
>>>>>> CONTAINER ID IMAGE COMMAND
>>>>>> CREATED STATUS PORTS
>>>>>> NAMES
>>>>>> 45f992c2781f codefutures/dbshards_zookeeper:latest "/bin/sh
>>>>>> -c '/opt/zo 59 seconds ago Up 58 seconds
>>>>>>
>>>>>> I am using mesos 0.20.1 and docker 1.2.0 on Ubuntu 14.04.
>>>>>>
>>>>>> So the container is running fine. It is a long running service i.e.
>>>>>> the docker run command will never return. Should I be providing some
>>>>>> option so that the docker executor passed the -d flag to the docker
>>>>>> run command? I guess I should start looking through the mesos source
>>>>>> so I can see how this works.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Andy.
>>>>>>
>>>>>> --
>>>>>> Andy Grove
>>>>>> VP Engineering
>>>>>> CodeFutures Corporation
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 29, 2014 at 5:49 PM, Vinod Kone <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Trying increasing the executor registration timeout on the slave
>>>>>>> (--executor_registration_timeout) to give docker more time to do a
>>>>>>> pull of the image.
>>>>>>>
>>>>>>> On Mon, Sep 29, 2014 at 4:41 PM, Andy Grove <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've working on a prototype Mesos framework to launch docker
>>>>>>>> containers. I'm getting as far as seeing my container start up but
>>>>>>>> after one minute if gets killed due to:
>>>>>>>>
>>>>>>>> Terminating executor default of framework
>>>>>>>> 20140929-155916-16777343-5050-2708-0004 because it did not register
>>>>>>>> within 1mins
>>>>>>>>
>>>>>>>> Here is the code I am using in my scheduler, which was based on one
>>>>>>>> of the examples:
>>>>>>>>
>>>>>>>> @Override
>>>>>>>> public void resourceOffers(SchedulerDriver schedulerDriver,
>>>>>>>>         List<Protos.Offer> offers) {
>>>>>>>>     logger.info("resourceOffers() with {} offers", offers.size());
>>>>>>>>
>>>>>>>>     for (Protos.Offer offer : offers) {
>>>>>>>>
>>>>>>>>         List<Protos.TaskInfo> tasks = new
>>>>>>>>                 ArrayList<Protos.TaskInfo>();
>>>>>>>>         if (launchedTasks < totalTasks) {
>>>>>>>>             Protos.TaskID taskId = Protos.TaskID.newBuilder()
>>>>>>>>                     .setValue(Integer.toString(launchedTasks++)).build();
>>>>>>>>
>>>>>>>>             logger.info("Launching task " + taskId.getValue());
>>>>>>>>
>>>>>>>>             // docker image info
>>>>>>>>             Protos.ContainerInfo.DockerInfo.Builder dockerInfoBuilder =
>>>>>>>>                     Protos.ContainerInfo.DockerInfo.newBuilder();
>>>>>>>>
>>>>>>>>             dockerInfoBuilder.setImage("codefutures/dbshards_zookeeper");
>>>>>>>>
>>>>>>>>             // container info
>>>>>>>>             Protos.ContainerInfo.Builder containerInfoBuilder =
>>>>>>>>                     Protos.ContainerInfo.newBuilder();
>>>>>>>>
>>>>>>>>             containerInfoBuilder.setType(Protos.ContainerInfo.Type.DOCKER);
>>>>>>>>             containerInfoBuilder.setDocker(dockerInfoBuilder.build());
>>>>>>>>
>>>>>>>>             // create executor for the container
>>>>>>>>             Protos.ExecutorInfo executor =
>>>>>>>>                     Protos.ExecutorInfo.newBuilder()
>>>>>>>>                     .setExecutorId(Protos.ExecutorID.newBuilder().setValue("default"))
>>>>>>>>                     .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
>>>>>>>>                     .setContainer(containerInfoBuilder)
>>>>>>>>                     .setName("Test Executor (Docker)")
>>>>>>>>                     .setSource("docker_test")
>>>>>>>>                     .build();
>>>>>>>>
>>>>>>>>             // create task to run
>>>>>>>>             Protos.TaskInfo task = Protos.TaskInfo.newBuilder()
>>>>>>>>                     .setName("task " + taskId.getValue())
>>>>>>>>                     .setTaskId(taskId)
>>>>>>>>                     .setSlaveId(offer.getSlaveId())
>>>>>>>>                     .addResources(Protos.Resource.newBuilder()
>>>>>>>>                             .setName("cpus")
>>>>>>>>                             .setType(Protos.Value.Type.SCALAR)
>>>>>>>>                             .setScalar(Protos.Value.Scalar.newBuilder().setValue(1)))
>>>>>>>>                     .addResources(Protos.Resource.newBuilder()
>>>>>>>>                             .setName("mem")
>>>>>>>>                             .setType(Protos.Value.Type.SCALAR)
>>>>>>>>                             .setScalar(Protos.Value.Scalar.newBuilder().setValue(128)))
>>>>>>>>                     .setExecutor(Protos.ExecutorInfo.newBuilder(executor))
>>>>>>>>                     .build();
>>>>>>>>
>>>>>>>>             tasks.add(task);
>>>>>>>>         }
>>>>>>>>         Protos.Filters filters =
>>>>>>>>                 Protos.Filters.newBuilder().setRefuseSeconds(1).build();
>>>>>>>>
>>>>>>>>         schedulerDriver.launchTasks(offer.getId(), tasks, filters);
>>>>>>>>     }
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> Am I missing some steps with this approach?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Andy.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Andy Grove
>>>>>>>> VP Engineering
>>>>>>>> CodeFutures Corporation
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

