Try:

    until <something>; do
      echo "waiting for something to do something"
      sleep 5
    done

You can put this in a bash file and run that. If you have a Dockerfile, it would be easier to debug.

-Jason

> On Apr 17, 2015, at 4:24 PM, Tyson Norris <tnor...@adobe.com> wrote:
>
> Yes, agreed that the command should not exit - but the container is killed at
> around 0.5 s after launch regardless of whether the command terminates, which
> is why I’ve been experimenting using commands with varied exit times.
>
> For example, forget about the executor needing to register momentarily.
>
> Using the command:
> echo testing123c && sleep 0.1 && echo testing456c
> -> I see the expected output in stdout, and the container is destroyed (as
> expected), because the container exits quickly, and then is destroyed
>
> Using the command:
> echo testing123d && sleep 0.6 && echo testing456d
> -> I do NOT see the expected output in stdout (I only get testing123d),
> because the container is destroyed prematurely after ~0.5 seconds
>
> Using the “real” storm command, I get no output in stdout, probably because
> no output is generated within 0.5 seconds of launch - it is a bit of a pig to
> start up, so I’m currently just trying to execute some other commands for
> testing purposes.
>
> So I’m guessing this is a timeout issue, or else that the container is reaped
> inappropriately, or something else… looking through this code, I’m trying to
> figure out the steps taken during executor launch:
> https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715
>
> Thanks
> Tyson
>
>> On Apr 17, 2015, at 12:53 PM, Jason Giedymin <jason.giedy...@gmail.com>
>> wrote:
>>
>> What is the last command you have docker doing?
>>
>> If that command exits then the docker will begin to end the container.
>>
>> -Jason
>>
>>> On Apr 17, 2015, at 3:23 PM, Tyson Norris <tnor...@adobe.com> wrote:
>>>
>>> Hi -
>>> I am looking at revving the mesos-storm framework to be dockerized (and
>>> simpler).
>>> I’m using mesos 0.22.0-1.0.ubuntu1404
>>> mesos master + mesos slave are deployed in docker containers, in case it
>>> matters.
>>>
>>> I have the storm (nimbus) framework launching fine as a docker container,
>>> but launching tasks for a topology is having problems related to using a
>>> docker-based executor.
>>>
>>> For example:
>>>
>>> TaskInfo task = TaskInfo.newBuilder()
>>>     .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
>>>     .setTaskId(taskId)
>>>     .setSlaveId(offer.getSlaveId())
>>>     .setExecutor(ExecutorInfo.newBuilder()
>>>         .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
>>>         .setData(ByteString.copyFromUtf8(executorDataStr))
>>>         .setContainer(ContainerInfo.newBuilder()
>>>             .setType(ContainerInfo.Type.DOCKER)
>>>             .setDocker(ContainerInfo.DockerInfo.newBuilder()
>>>                 .setImage("mesos-storm")))
>>>         .setCommand(CommandInfo.newBuilder().setShell(true)
>>>             .setValue("storm supervisor storm.mesos.MesosSupervisor"))
>>> //rest is unchanged from existing mesos-storm framework code
>>>
>>> The executor launches and exits quickly - see the log msg: Executor for
>>> container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited
>>>
>>> It seems like mesos loses track of the executor? I understand there is a 1
>>> min timeout on registering the executor, but the exit happens well before 1
>>> minute.
>>>
>>> I tried a few alternate commands to experiment, and I can see in the stdout
>>> for the task that
>>> "echo testing123 && echo testing456"
>>> prints to stdout correctly, both testing123 and testing456
>>>
>>> however:
>>> "echo testing123a && sleep 10 && echo testing456a"
>>> prints only testing123a, presumably because the container is lost and
>>> destroyed before the sleep time is up.
>>>
>>> So it’s like the container for the executor is only allowed to run for .5
>>> seconds, then it is detected as exited, and the task is lost.
>>>
>>> Thanks for any advice.
>>>
>>> Tyson
>>>
>>> slave logs look like:
>>> mesosslave_1 | I0417 19:07:27.461230 11 slave.cpp:1121] Got assigned task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-0000
>>> mesosslave_1 | I0417 19:07:27.461479 11 slave.cpp:1231] Launching task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-0000
>>> mesosslave_1 | I0417 19:07:27.463250 11 slave.cpp:4160] Launching executor insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1-0000 in work directory '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-0000/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
>>> mesosslave_1 | I0417 19:07:27.463444 11 slave.cpp:1378] Queuing task 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of framework '20150417-190611-2801799596-5050-1-0000
>>> mesosslave_1 | I0417 19:07:27.467200 7 docker.cpp:755] Starting container '6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and framework '20150417-190611-2801799596-5050-1-0000'
>>> mesosslave_1 | I0417 19:07:27.985935 7 docker.cpp:1333] Executor for container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
>>> mesosslave_1 | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying container '6539127f-9dbb-425b-86a8-845b748f0cd3'
>>> mesosslave_1 | I0417 19:07:27.986021 9 slave.cpp:3135] Monitoring executor 'insights-1-1429297638' of framework '20150417-190611-2801799596-5050-1-0000' in container '6539127f-9dbb-425b-86a8-845b748f0cd3'
>>> mesosslave_1 | I0417 19:07:27.986464 7 docker.cpp:1248] Running docker stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
>>> mesosslave_1 | I0417 19:07:28.286761 10 slave.cpp:3186] Executor 'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1-0000 has terminated with unknown status
>>> mesosslave_1 | I0417 19:07:28.288784 10 slave.cpp:2508] Handling status update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task mesos-slave1.service.consul-31000 of framework 20150417-190611-2801799596-5050-1-0000 from @0.0.0.0:0
>>> mesosslave_1 | W0417 19:07:28.289227 9 docker.cpp:841] Ignoring updating unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3
>>>
>>> nimbus logs (framework) look like:
>>> 2015-04-17T19:07:28.302+0000 s.m.MesosNimbus [INFO] Received status update:
>>> task_id {
>>>   value: "mesos-slave1.service.consul-31000"
>>> }
>>> state: TASK_LOST
>>> message: "Container terminated"
>>> slave_id {
>>>   value: "20150417-190611-2801799596-5050-1-S0"
>>> }
>>> timestamp: 1.429297648286981E9
>>> source: SOURCE_SLAVE
>>> reason: REASON_EXECUTOR_TERMINATED
>>> 11: "\a\225\245\213\364\207B\342\252\241\242o\346\203N\327"
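
The timing experiment described in this thread (a command that finishes in 0.1 s prints everything, one that needs 0.6 s is cut off) can be approximated outside Mesos. The sketch below is only an illustration of that symptom, not of what the slave actually does: the helper name `run_with_deadline` is hypothetical, the 0.5 s deadline is taken from the observation in the thread, and it assumes GNU coreutils `timeout` plus a `sleep` that accepts fractional seconds.

```shell
#!/bin/sh
# Hypothetical stand-in for "the container is destroyed ~0.5 s after launch".
# Assumption: GNU coreutils `timeout` and fractional `sleep` are available.
run_with_deadline() {
  # Kill the command (SIGTERM) if it is still running after 0.5 seconds.
  timeout 0.5 sh -c "$1"
}

# Finishes within the deadline, so both lines reach stdout.
run_with_deadline 'echo testing123c && sleep 0.1 && echo testing456c'

# Still sleeping when the deadline hits, so only the first line reaches stdout.
run_with_deadline 'echo testing123d && sleep 0.6 && echo testing456d'
```

If the real executor shows the same cut-off pattern as the second call, that is consistent with something terminating the container from outside rather than with the command exiting on its own. It does not say which component does the terminating.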