Tim, mind updating the documentation <http://mesos.apache.org/documentation/latest/docker-containerizer/> to make sure others don't fall into the same trap?
On Tue, Sep 30, 2014 at 11:38 AM, Tim Chen <[email protected]> wrote:

> Hi Andy,
>
> Good catch, I also missed that as I was just looking at the Docker
> configurations.
>
> You only need to set the Executor when you have a custom executor.
>
> Let us know if you have any other problems.
>
> Tim
>
> On Tue, Sep 30, 2014 at 11:02 AM, Andy Grove <[email protected]>
> wrote:
>
>> OK. So I figured out the issue with this and it was my misunderstanding
>> of executors and tasks.
>>
>> My task info had:
>>
>>   .setExecutor(Protos.ExecutorInfo.newBuilder(executor))
>>
>> I should have had this:
>>
>>   .setContainer(containerInfoBuilder)
>>   .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
>>
>> I didn't have a mesos executor deployed inside my container, which
>> explains the timeout issue.
>>
>> Thanks again for the support.
>>
>> Thanks,
>>
>> Andy.
>>
>> --
>> Andy Grove
>> VP Engineering
>> CodeFutures Corporation
>>
>> On Tue, Sep 30, 2014 at 10:20 AM, Andy Grove <[email protected]>
>> wrote:
>>
>>> Hi Tim,
>>>
>>> Thanks for helping with this. I am running mesos-master and mesos-slave
>>> natively on the same host (my desktop). The only container in use is the
>>> one being launched by the mesos-slave.
>>>
>>> I will try your suggestion of running a simple command next.
>>>
>>> Here is the output from the slave from this issue though:
>>>
>>> I0930 10:13:52.053177 30722 main.cpp:126] Build: 2014-09-29 15:35:37 by andy
>>> I0930 10:13:52.053228 30722 main.cpp:128] Version: 0.20.1
>>> I0930 10:13:53.055480 30722 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem
>>> I0930 10:13:53.058353 30722 main.cpp:149] Starting Mesos slave
>>> I0930 10:13:53.059651 30722 slave.cpp:167] Slave started on 1)@ 127.0.1.1:5051
>>> I0930 10:13:53.060072 30722 slave.cpp:278] Slave resources: cpus(*):8; mem(*):14963; disk(*):1.85648e+06; ports(*):[31000-32000]
>>> I0930 10:13:53.060226 30722 slave.cpp:306] Slave hostname: davros
>>> I0930 10:13:53.060253 30722 slave.cpp:307] Slave checkpoint: true
>>> I0930 10:13:53.064975 30729 state.cpp:33] Recovering state from '/tmp/mesos/meta'
>>> I0930 10:13:53.065352 30725 status_update_manager.cpp:193] Recovering status update manager
>>> I0930 10:13:53.065626 30729 docker.cpp:577] Recovering Docker containers
>>> I0930 10:13:53.065690 30724 containerizer.cpp:252] Recovering containerizer
>>> I0930 10:13:54.055233 30723 slave.cpp:3198] Finished recovery
>>> I0930 10:13:54.055448 30723 slave.cpp:589] New master detected at [email protected]:5050
>>> I0930 10:13:54.055532 30723 slave.cpp:625] No credentials provided.
>>> Attempting to register without authentication
>>> I0930 10:13:54.055537 30730 status_update_manager.cpp:167] New master detected at [email protected]:5050
>>> I0930 10:13:54.055552 30723 slave.cpp:636] Detecting new master
>>> I0930 10:13:54.928225 30724 slave.cpp:754] Registered with master [email protected]:5050; given slave ID 20140930-101303-16777343-5050-30690-0
>>> I0930 10:13:54.928598 30724 slave.cpp:767] Checkpointing SlaveInfo to '/tmp/mesos/meta/slaves/20140930-101303-16777343-5050-30690-0/slave.info'
>>> I0930 10:14:17.330390 30725 slave.cpp:1002] Got assigned task 0 for framework 20140930-101303-16777343-5050-30690-0000
>>> I0930 10:14:17.330557 30725 slave.cpp:1112] Launching task 0 for framework 20140930-101303-16777343-5050-30690-0000
>>> I0930 10:14:17.331296 30725 slave.cpp:1222] Queuing task '0' for executor default of framework '20140930-101303-16777343-5050-30690-0000
>>> *I0930 10:14:17.333109 30730 docker.cpp:984] Starting container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' for executor 'default' and framework '20140930-101303-16777343-5050-30690-0000'*
>>> I0930 10:14:20.062705 30730 slave.cpp:2538] Monitoring executor 'default' of framework '20140930-101303-16777343-5050-30690-0000' in container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>>>
>>> The container is running quite happily at this point.
>>>
>>> I0930 10:14:53.061337 30724 slave.cpp:3053] Current usage 0.76%.
>>> Max allowed age: 6.247043850997720days
>>> *I0930 10:15:17.331712 30730 slave.cpp:3010] Terminating executor default of framework 20140930-101303-16777343-5050-30690-0000 because it did not register within 1mins*
>>> I0930 10:15:17.332221 30728 docker.cpp:1473] Destroying container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>>> I0930 10:15:17.332308 30728 docker.cpp:1568] Running docker kill on container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>>> I0930 10:15:18.109361 30730 docker.cpp:1646] Executor for container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' has exited
>>>
>>> Thanks,
>>>
>>> Andy.
>>>
>>> --
>>> Andy Grove
>>> VP Engineering
>>> CodeFutures Corporation
>>>
>>> On Mon, Sep 29, 2014 at 6:25 PM, Tim Chen <[email protected]> wrote:
>>>
>>>> Hi Andy,
>>>>
>>>> You don't need to specify -d as the docker containerizer will set it
>>>> for you since we run all docker images detached.
>>>>
>>>> It seems like the executor simply can't register with the slave.
>>>> Can you try just running a simple command without Docker that takes longer
>>>> than the executor registration timeout to see if you see the same error?
>>>>
>>>> Also, do you run the mesos slave in a docker container as well?
>>>>
>>>> It would be great if you could share the slave log as Vinod suggested too.
>>>>
>>>> Tim
>>>>
>>>> On Mon, Sep 29, 2014 at 5:15 PM, Vinod Kone <[email protected]>
>>>> wrote:
>>>>
>>>>> I'll let Tim Chen help you out here since he has more context. Some
>>>>> slave logs around the failed container launch would be helpful.
>>>>>
>>>>> On Mon, Sep 29, 2014 at 5:03 PM, Andy Grove <[email protected]> wrote:
>>>>>
>>>>>> Ignore my comment about docker run not returning. That is incorrect.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Andy.
>>>>>>
>>>>>> --
>>>>>> Andy Grove
>>>>>> VP Engineering
>>>>>> CodeFutures Corporation
>>>>>>
>>>>>> On Mon, Sep 29, 2014 at 5:59 PM, Andy Grove <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Vinod,
>>>>>>>
>>>>>>> Thanks for the quick response but the image is already on the slave
>>>>>>> and I see the container being launched almost immediately when my
>>>>>>> framework starts (within 1-2 seconds). If I keep running docker ps,
>>>>>>> this is the last output I see before the container is killed:
>>>>>>>
>>>>>>> $ docker ps
>>>>>>> CONTAINER ID   IMAGE                                   COMMAND                CREATED          STATUS          PORTS   NAMES
>>>>>>> 45f992c2781f   codefutures/dbshards_zookeeper:latest   "/bin/sh -c '/opt/zo   59 seconds ago   Up 58 seconds
>>>>>>>
>>>>>>> I am using mesos 0.20.1 and docker 1.2.0 on Ubuntu 14.04.
>>>>>>>
>>>>>>> So the container is running fine. It is a long-running service, i.e.
>>>>>>> the docker run command will never return. Should I be providing some
>>>>>>> option so that the docker executor passes the -d flag to the docker
>>>>>>> run command? I guess I should start looking through the mesos source
>>>>>>> so I can see how this works.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Andy.
>>>>>>>
>>>>>>> --
>>>>>>> Andy Grove
>>>>>>> VP Engineering
>>>>>>> CodeFutures Corporation
>>>>>>>
>>>>>>> On Mon, Sep 29, 2014 at 5:49 PM, Vinod Kone <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Try increasing the executor registration timeout on the slave
>>>>>>>> (--executor_registration_timeout) to give docker more time to do a
>>>>>>>> pull of the image.
>>>>>>>>
>>>>>>>> On Mon, Sep 29, 2014 at 4:41 PM, Andy Grove <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I've been working on a prototype Mesos framework to launch docker
>>>>>>>>> containers.
>>>>>>>>> I'm getting as far as seeing my container start up but after
>>>>>>>>> one minute it gets killed due to:
>>>>>>>>>
>>>>>>>>> Terminating executor default of framework
>>>>>>>>> 20140929-155916-16777343-5050-2708-0004 because it did not register
>>>>>>>>> within 1mins
>>>>>>>>>
>>>>>>>>> Here is the code I am using in my scheduler, which was based on
>>>>>>>>> one of the examples:
>>>>>>>>>
>>>>>>>>> @Override
>>>>>>>>> public void resourceOffers(SchedulerDriver schedulerDriver,
>>>>>>>>>                            List<Protos.Offer> offers) {
>>>>>>>>>   logger.info("resourceOffers() with {} offers", offers.size());
>>>>>>>>>
>>>>>>>>>   for (Protos.Offer offer : offers) {
>>>>>>>>>
>>>>>>>>>     List<Protos.TaskInfo> tasks = new ArrayList<Protos.TaskInfo>();
>>>>>>>>>     if (launchedTasks < totalTasks) {
>>>>>>>>>       Protos.TaskID taskId = Protos.TaskID.newBuilder()
>>>>>>>>>           .setValue(Integer.toString(launchedTasks++)).build();
>>>>>>>>>
>>>>>>>>>       logger.info("Launching task " + taskId.getValue());
>>>>>>>>>
>>>>>>>>>       // docker image info
>>>>>>>>>       Protos.ContainerInfo.DockerInfo.Builder dockerInfoBuilder =
>>>>>>>>>           Protos.ContainerInfo.DockerInfo.newBuilder();
>>>>>>>>>       dockerInfoBuilder.setImage("codefutures/dbshards_zookeeper");
>>>>>>>>>
>>>>>>>>>       // container info
>>>>>>>>>       Protos.ContainerInfo.Builder containerInfoBuilder =
>>>>>>>>>           Protos.ContainerInfo.newBuilder();
>>>>>>>>>       containerInfoBuilder.setType(Protos.ContainerInfo.Type.DOCKER);
>>>>>>>>>       containerInfoBuilder.setDocker(dockerInfoBuilder.build());
>>>>>>>>>
>>>>>>>>>       // create executor for the container
>>>>>>>>>       Protos.ExecutorInfo executor = Protos.ExecutorInfo.newBuilder()
>>>>>>>>>           .setExecutorId(Protos.ExecutorID.newBuilder().setValue("default"))
>>>>>>>>>           .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
>>>>>>>>>           .setContainer(containerInfoBuilder)
>>>>>>>>>           .setName("Test Executor (Docker)")
>>>>>>>>>           .setSource("docker_test")
>>>>>>>>>           .build();
>>>>>>>>>
>>>>>>>>>       // create task to run
>>>>>>>>>       Protos.TaskInfo task = Protos.TaskInfo.newBuilder()
>>>>>>>>>           .setName("task " + taskId.getValue())
>>>>>>>>>           .setTaskId(taskId)
>>>>>>>>>           .setSlaveId(offer.getSlaveId())
>>>>>>>>>           .addResources(Protos.Resource.newBuilder()
>>>>>>>>>               .setName("cpus")
>>>>>>>>>               .setType(Protos.Value.Type.SCALAR)
>>>>>>>>>               .setScalar(Protos.Value.Scalar.newBuilder().setValue(1)))
>>>>>>>>>           .addResources(Protos.Resource.newBuilder()
>>>>>>>>>               .setName("mem")
>>>>>>>>>               .setType(Protos.Value.Type.SCALAR)
>>>>>>>>>               .setScalar(Protos.Value.Scalar.newBuilder().setValue(128)))
>>>>>>>>>           .setExecutor(Protos.ExecutorInfo.newBuilder(executor))
>>>>>>>>>           .build();
>>>>>>>>>
>>>>>>>>>       tasks.add(task);
>>>>>>>>>     }
>>>>>>>>>     Protos.Filters filters =
>>>>>>>>>         Protos.Filters.newBuilder().setRefuseSeconds(1).build();
>>>>>>>>>
>>>>>>>>>     schedulerDriver.launchTasks(offer.getId(), tasks, filters);
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Am I missing some steps with this approach?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Andy.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Andy Grove
>>>>>>>>> VP Engineering
>>>>>>>>> CodeFutures Corporation
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
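For anyone reading the slave log quoted in this thread: one quick way to confirm that the kill was driven by the registration timeout, rather than by the container itself failing, is to measure the gap between the "Queuing task" line and the "Terminating executor" line. The sketch below is only an illustration and is not part of Mesos. The class name RegistrationGap and its methods are hypothetical, the two log lines are abbreviated from the log above, and it assumes the glog-style layout where the second whitespace-separated field is hh:mm:ss.uuuuuu and both lines fall on the same day.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Hypothetical helper (not a Mesos API): measures the gap between two glog-style log lines. */
public class RegistrationGap {

    /** Returns the time-of-day field, assumed to be the second whitespace-separated token. */
    static LocalTime timestampOf(String logLine) {
        String[] tokens = logLine.trim().split("\\s+");
        return LocalTime.parse(tokens[1]);
    }

    /** Time elapsed from the earlier line to the later one (same-day assumption). */
    static Duration gap(String earlier, String later) {
        return Duration.between(timestampOf(earlier), timestampOf(later));
    }

    public static void main(String[] args) {
        // Abbreviated copies of two lines from the slave log in this thread.
        String queued = "I0930 10:14:17.331296 30725 slave.cpp:1222] Queuing task '0' for executor default";
        String terminated = "I0930 10:15:17.331712 30730 slave.cpp:3010] Terminating executor default";

        // "1mins" is the timeout reported in the termination message.
        Duration timeout = Duration.ofMinutes(1);
        Duration elapsed = gap(queued, terminated);

        System.out.println("elapsed ms: " + elapsed.toMillis());
        System.out.println("killed at or after timeout: " + (elapsed.compareTo(timeout) >= 0));
    }
}
```

With the two lines above the gap comes out at one minute, which matches the diagnosis in the thread: the container was running, but no executor inside it ever registered with the slave.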

