What is the last command you have docker running? If that command exits, Docker considers the container exited, and the slave will then tear down the executor.
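You can see the same behavior outside Mesos with plain `sh -c` (a sketch of the semantics, not your actual executor command): the process lives exactly as long as the command it runs, so a command that returns immediately means an immediately-exited container.

```shell
# Sketch: a container's lifetime is tied to its main command.
# A command chain that finishes right away would show up in the
# slave log as "Executor for container ... has exited" almost
# immediately, just like in your test:
sh -c 'echo testing123 && echo testing456'
echo "short-lived command exit status: $?"

# A command that stays in the foreground keeps the process (and so
# the container) alive for its whole duration -- the shape you want
# for a supervisor-style executor command:
sh -c 'echo starting && sleep 2 && echo still-running'
echo "long-lived command exit status: $?"
```

So the thing to check is whether `storm supervisor storm.mesos.MesosSupervisor` stays in the foreground inside your image, or forks and returns.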
-Jason

> On Apr 17, 2015, at 3:23 PM, Tyson Norris <tnor...@adobe.com> wrote:
>
> Hi -
> I am looking at revving the mesos-storm framework to be dockerized (and
> simpler). I'm using mesos 0.22.0-1.0.ubuntu1404. The mesos master and
> mesos slave are deployed in docker containers, in case it matters.
>
> I have the storm (nimbus) framework launching fine as a docker container,
> but launching tasks for a topology is having problems related to using a
> docker-based executor.
>
> For example:
>
>     TaskInfo task = TaskInfo.newBuilder()
>         .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
>         .setTaskId(taskId)
>         .setSlaveId(offer.getSlaveId())
>         .setExecutor(ExecutorInfo.newBuilder()
>             .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
>             .setData(ByteString.copyFromUtf8(executorDataStr))
>             .setContainer(ContainerInfo.newBuilder()
>                 .setType(ContainerInfo.Type.DOCKER)
>                 .setDocker(ContainerInfo.DockerInfo.newBuilder()
>                     .setImage("mesos-storm")))
>             .setCommand(CommandInfo.newBuilder()
>                 .setShell(true)
>                 .setValue("storm supervisor storm.mesos.MesosSupervisor"))
>         // rest is unchanged from the existing mesos-storm framework code
>
> The executor launches and exits quickly - see the log message: Executor
> for container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited
>
> It seems like mesos loses track of the executor? I understand there is a
> 1 min timeout on registering the executor, but the exit happens well
> before 1 minute.
>
> I tried a few alternate commands to experiment, and I can see in the
> stdout for the task that
>     "echo testing123 && echo testing456"
> correctly prints both testing123 and testing456.
>
> However:
>     "echo testing123a && sleep 10 && echo testing456a"
> prints only testing123a, presumably because the container is lost and
> destroyed before the sleep time is up.
>
> So it's like the container for the executor is only allowed to run for
> 0.5 seconds, then it is detected as exited, and the task is lost.
>
> Thanks for any advice.
>
> Tyson
>
> Slave logs look like:
>
> mesosslave_1 | I0417 19:07:27.461230 11 slave.cpp:1121] Got assigned task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-0000
> mesosslave_1 | I0417 19:07:27.461479 11 slave.cpp:1231] Launching task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-0000
> mesosslave_1 | I0417 19:07:27.463250 11 slave.cpp:4160] Launching executor insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1-0000 in work directory '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-0000/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
> mesosslave_1 | I0417 19:07:27.463444 11 slave.cpp:1378] Queuing task 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of framework '20150417-190611-2801799596-5050-1-0000
> mesosslave_1 | I0417 19:07:27.467200 7 docker.cpp:755] Starting container '6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and framework '20150417-190611-2801799596-5050-1-0000'
> mesosslave_1 | I0417 19:07:27.985935 7 docker.cpp:1333] Executor for container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
> mesosslave_1 | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying container '6539127f-9dbb-425b-86a8-845b748f0cd3'
> mesosslave_1 | I0417 19:07:27.986021 9 slave.cpp:3135] Monitoring executor 'insights-1-1429297638' of framework '20150417-190611-2801799596-5050-1-0000' in container '6539127f-9dbb-425b-86a8-845b748f0cd3'
> mesosslave_1 | I0417 19:07:27.986464 7 docker.cpp:1248] Running docker stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
> mesosslave_1 | I0417 19:07:28.286761 10 slave.cpp:3186] Executor 'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1-0000 has terminated with unknown status
> mesosslave_1 | I0417 19:07:28.288784 10 slave.cpp:2508] Handling status update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task mesos-slave1.service.consul-31000 of framework 20150417-190611-2801799596-5050-1-0000 from @0.0.0.0:0
> mesosslave_1 | W0417 19:07:28.289227 9 docker.cpp:841] Ignoring updating unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3
>
> Nimbus logs (framework) look like:
>
> 2015-04-17T19:07:28.302+0000 s.m.MesosNimbus [INFO] Received status update:
> task_id {
>   value: "mesos-slave1.service.consul-31000"
> }
> state: TASK_LOST
> message: "Container terminated"
> slave_id {
>   value: "20150417-190611-2801799596-5050-1-S0"
> }
> timestamp: 1.429297648286981E9
> source: SOURCE_SLAVE
> reason: REASON_EXECUTOR_TERMINATED
> 11: "\a\225\245\213\364\207B\342\252\241\242o\346\203N\327"