from mesospheres.io, it is recommended that you set the timeout Vinod mentioned earlier to 5 mins.
-Luyi.

On Tue, Sep 30, 2014 at 11:02 AM, Andy Grove <[email protected]> wrote:

> OK. So I figured out the issue with this and it was my misunderstanding of
> executors and tasks.
>
> My task info had:
>
> .setExecutor(Protos.ExecutorInfo.newBuilder(executor))
>
> I should have had this:
>
> .setContainer(containerInfoBuilder)
> .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
>
> I didn't have a mesos executor deployed inside my container which explains
> the timeout issue.
>
> Thanks again for the support.
>
> Thanks,
>
> Andy.
>
> --
> Andy Grove
> VP Engineering
> CodeFutures Corporation
>
> On Tue, Sep 30, 2014 at 10:20 AM, Andy Grove <[email protected]> wrote:
>
>> Hi Tim,
>>
>> Thanks for helping with this. I am running mesos-master and mesos-slave
>> natively on the same host (my desktop). The only container in use is the
>> one being launched by the mesos-slave.
>>
>> I will try your suggestion of running a simple command next.
>>
>> Here is the output from the slave from this issue though:
>>
>> I0930 10:13:52.053177 30722 main.cpp:126] Build: 2014-09-29 15:35:37 by andy
>> I0930 10:13:52.053228 30722 main.cpp:128] Version: 0.20.1
>> I0930 10:13:53.055480 30722 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem
>> I0930 10:13:53.058353 30722 main.cpp:149] Starting Mesos slave
>> I0930 10:13:53.059651 30722 slave.cpp:167] Slave started on 1)@ 127.0.1.1:5051
>> I0930 10:13:53.060072 30722 slave.cpp:278] Slave resources: cpus(*):8; mem(*):14963; disk(*):1.85648e+06; ports(*):[31000-32000]
>> I0930 10:13:53.060226 30722 slave.cpp:306] Slave hostname: davros
>> I0930 10:13:53.060253 30722 slave.cpp:307] Slave checkpoint: true
>> I0930 10:13:53.064975 30729 state.cpp:33] Recovering state from '/tmp/mesos/meta'
>> I0930 10:13:53.065352 30725 status_update_manager.cpp:193] Recovering status update manager
>> I0930 10:13:53.065626 30729 docker.cpp:577] Recovering Docker containers
>> I0930 10:13:53.065690 30724 containerizer.cpp:252] Recovering containerizer
>> I0930 10:13:54.055233 30723 slave.cpp:3198] Finished recovery
>> I0930 10:13:54.055448 30723 slave.cpp:589] New master detected at [email protected]:5050
>> I0930 10:13:54.055532 30723 slave.cpp:625] No credentials provided. Attempting to register without authentication
>> I0930 10:13:54.055537 30730 status_update_manager.cpp:167] New master detected at [email protected]:5050
>> I0930 10:13:54.055552 30723 slave.cpp:636] Detecting new master
>> I0930 10:13:54.928225 30724 slave.cpp:754] Registered with master [email protected]:5050; given slave ID 20140930-101303-16777343-5050-30690-0
>> I0930 10:13:54.928598 30724 slave.cpp:767] Checkpointing SlaveInfo to '/tmp/mesos/meta/slaves/20140930-101303-16777343-5050-30690-0/slave.info'
>> I0930 10:14:17.330390 30725 slave.cpp:1002] Got assigned task 0 for framework 20140930-101303-16777343-5050-30690-0000
>> I0930 10:14:17.330557 30725 slave.cpp:1112] Launching task 0 for framework 20140930-101303-16777343-5050-30690-0000
>> I0930 10:14:17.331296 30725 slave.cpp:1222] Queuing task '0' for executor default of framework '20140930-101303-16777343-5050-30690-0000
>> *I0930 10:14:17.333109 30730 docker.cpp:984] Starting container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' for executor 'default' and framework '20140930-101303-16777343-5050-30690-0000'*
>> I0930 10:14:20.062705 30730 slave.cpp:2538] Monitoring executor 'default' of framework '20140930-101303-16777343-5050-30690-0000' in container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>>
>> The container is running quite happily at this point.
>>
>> I0930 10:14:53.061337 30724 slave.cpp:3053] Current usage 0.76%. Max allowed age: 6.247043850997720days
>> *I0930 10:15:17.331712 30730 slave.cpp:3010] Terminating executor default of framework 20140930-101303-16777343-5050-30690-0000 because it did not register within 1mins*
>> I0930 10:15:17.332221 30728 docker.cpp:1473] Destroying container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>> I0930 10:15:17.332308 30728 docker.cpp:1568] Running docker kill on container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
>> I0930 10:15:18.109361 30730 docker.cpp:1646] Executor for container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' has exited
>>
>> Thanks,
>>
>> Andy.
>>
>> --
>> Andy Grove
>> VP Engineering
>> CodeFutures Corporation
>>
>> On Mon, Sep 29, 2014 at 6:25 PM, Tim Chen <[email protected]> wrote:
>>
>>> Hi Andy,
>>>
>>> You don't need to specify -d as the docker containerizer will set it
>>> for you since we run all docker images detached.
>>>
>>> It seems like the executor just simply can't register with the slave.
>>> Can you try just running a simple command without Docker that takes longer
>>> than the executor registration timeout to see if you see the same error?
>>>
>>> Also do you run the mesos slave in a docker container as well?
>>>
>>> Will be great if you can share the slave log as Vinod suggested too.
>>>
>>> Tim
>>>
>>> On Mon, Sep 29, 2014 at 5:15 PM, Vinod Kone <[email protected]> wrote:
>>>
>>>> I'll let Tim Chen help you out here since he has more context. Some
>>>> slave logs around the failed container launch would be helpful.
>>>>
>>>> On Mon, Sep 29, 2014 at 5:03 PM, Andy Grove <[email protected]> wrote:
>>>>
>>>>> Ignore my comment about docker run not returning. That is incorrect.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Andy.
>>>>>
>>>>> --
>>>>> Andy Grove
>>>>> VP Engineering
>>>>> CodeFutures Corporation
>>>>>
>>>>> On Mon, Sep 29, 2014 at 5:59 PM, Andy Grove <[email protected]> wrote:
>>>>>
>>>>>> Hi Vinod,
>>>>>>
>>>>>> Thanks for the quick response but the image is already on the slave
>>>>>> and I see the container being launched almost immediately when my
>>>>>> framework starts (within 1-2 seconds). If I keep running docker ps,
>>>>>> this is the last output I see before the container is killed:
>>>>>>
>>>>>> $ docker ps
>>>>>> CONTAINER ID   IMAGE                                   COMMAND                CREATED          STATUS          PORTS   NAMES
>>>>>> 45f992c2781f   codefutures/dbshards_zookeeper:latest   "/bin/sh -c '/opt/zo   59 seconds ago   Up 58 seconds
>>>>>>
>>>>>> I am using mesos 0.20.1 and docker 1.2.0 on Ubuntu 14.04.
>>>>>>
>>>>>> So the container is running fine. It is a long running service i.e.
>>>>>> the docker run command will never return. Should I be providing some
>>>>>> option so that the docker executor passes the -d flag to the docker run
>>>>>> command? I guess I should start looking through the mesos source so I
>>>>>> can see how this works.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Andy.
>>>>>>
>>>>>> --
>>>>>> Andy Grove
>>>>>> VP Engineering
>>>>>> CodeFutures Corporation
>>>>>>
>>>>>> On Mon, Sep 29, 2014 at 5:49 PM, Vinod Kone <[email protected]> wrote:
>>>>>>
>>>>>>> Try increasing the executor registration timeout on the slave
>>>>>>> (--executor_registration_timeout) to give docker more time to do a pull
>>>>>>> of the image.
>>>>>>>
>>>>>>> On Mon, Sep 29, 2014 at 4:41 PM, Andy Grove <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've been working on a prototype Mesos framework to launch docker
>>>>>>>> containers. I'm getting as far as seeing my container start up but
>>>>>>>> after one minute it gets killed due to:
>>>>>>>>
>>>>>>>> Terminating executor default of framework
>>>>>>>> 20140929-155916-16777343-5050-2708-0004 because it did not register
>>>>>>>> within 1mins
>>>>>>>>
>>>>>>>> Here is the code I am using in my scheduler, which was based on one
>>>>>>>> of the examples:
>>>>>>>>
>>>>>>>> @Override
>>>>>>>> public void resourceOffers(SchedulerDriver schedulerDriver,
>>>>>>>>                            List<Protos.Offer> offers) {
>>>>>>>>     logger.info("resourceOffers() with {} offers", offers.size());
>>>>>>>>
>>>>>>>>     for (Protos.Offer offer : offers) {
>>>>>>>>
>>>>>>>>         List<Protos.TaskInfo> tasks = new ArrayList<Protos.TaskInfo>();
>>>>>>>>         if (launchedTasks < totalTasks) {
>>>>>>>>             Protos.TaskID taskId = Protos.TaskID.newBuilder()
>>>>>>>>                 .setValue(Integer.toString(launchedTasks++)).build();
>>>>>>>>
>>>>>>>>             logger.info("Launching task " + taskId.getValue());
>>>>>>>>
>>>>>>>>             // docker image info
>>>>>>>>             Protos.ContainerInfo.DockerInfo.Builder dockerInfoBuilder =
>>>>>>>>                 Protos.ContainerInfo.DockerInfo.newBuilder();
>>>>>>>>             dockerInfoBuilder.setImage("codefutures/dbshards_zookeeper");
>>>>>>>>
>>>>>>>>             // container info
>>>>>>>>             Protos.ContainerInfo.Builder containerInfoBuilder =
>>>>>>>>                 Protos.ContainerInfo.newBuilder();
>>>>>>>>             containerInfoBuilder.setType(Protos.ContainerInfo.Type.DOCKER);
>>>>>>>>             containerInfoBuilder.setDocker(dockerInfoBuilder.build());
>>>>>>>>
>>>>>>>>             // create executor for the container
>>>>>>>>             Protos.ExecutorInfo executor = Protos.ExecutorInfo.newBuilder()
>>>>>>>>                 .setExecutorId(Protos.ExecutorID.newBuilder().setValue("default"))
>>>>>>>>                 .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
>>>>>>>>                 .setContainer(containerInfoBuilder)
>>>>>>>>                 .setName("Test Executor (Docker)")
>>>>>>>>                 .setSource("docker_test")
>>>>>>>>                 .build();
>>>>>>>>
>>>>>>>>             // create task to run
>>>>>>>>             Protos.TaskInfo task = Protos.TaskInfo.newBuilder()
>>>>>>>>                 .setName("task " + taskId.getValue())
>>>>>>>>                 .setTaskId(taskId)
>>>>>>>>                 .setSlaveId(offer.getSlaveId())
>>>>>>>>                 .addResources(Protos.Resource.newBuilder()
>>>>>>>>                     .setName("cpus")
>>>>>>>>                     .setType(Protos.Value.Type.SCALAR)
>>>>>>>>                     .setScalar(Protos.Value.Scalar.newBuilder().setValue(1)))
>>>>>>>>                 .addResources(Protos.Resource.newBuilder()
>>>>>>>>                     .setName("mem")
>>>>>>>>                     .setType(Protos.Value.Type.SCALAR)
>>>>>>>>                     .setScalar(Protos.Value.Scalar.newBuilder().setValue(128)))
>>>>>>>>                 .setExecutor(Protos.ExecutorInfo.newBuilder(executor))
>>>>>>>>                 .build();
>>>>>>>>
>>>>>>>>             tasks.add(task);
>>>>>>>>         }
>>>>>>>>         Protos.Filters filters =
>>>>>>>>             Protos.Filters.newBuilder().setRefuseSeconds(1).build();
>>>>>>>>
>>>>>>>>         schedulerDriver.launchTasks(offer.getId(), tasks, filters);
>>>>>>>>     }
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> Am I missing some steps with this approach?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Andy.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Andy Grove
>>>>>>>> VP Engineering
>>>>>>>> CodeFutures Corporation
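
For anyone hitting the same symptom: the "did not register within" line in the slave log quoted in this thread is what distinguishes an executor registration timeout from a container that failed on its own. Below is a minimal sketch for picking that line out of a slave log, assuming raw log lines in the format shown in this thread. The class and method names are hypothetical and are not part of Mesos; only the Java standard library is used.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mesos: spots the registration-timeout
// message in a slave log line and extracts its fields.
class RegistrationTimeoutScan {
    // Pattern derived from the log line quoted in this thread; other Mesos
    // versions may word the message differently (assumption).
    private static final Pattern TIMEOUT = Pattern.compile(
        "Terminating executor (\\S+) of framework (\\S+) "
            + "because it did not register within ([\\d.]+\\w+)");

    /** Returns {executorId, frameworkId, timeout} if the line reports a timeout. */
    static Optional<String[]> parse(String logLine) {
        Matcher m = TIMEOUT.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {m.group(1), m.group(2), m.group(3)});
    }

    public static void main(String[] args) {
        String line = "I0930 10:15:17.331712 30730 slave.cpp:3010] Terminating executor default "
            + "of framework 20140930-101303-16777343-5050-30690-0000 because it did not "
            + "register within 1mins";
        parse(line).ifPresent(f -> System.out.println(
            "executor=" + f[0] + " framework=" + f[1] + " timeout=" + f[2]));
    }
}
```

If the line shows up while the container itself is still running, as it was here, the likely cause is the one Andy found: the task was launched with an ExecutorInfo, but nothing inside the container ever registered as a Mesos executor.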

