Re: docker based executor

Jason Giedymin Fri, 17 Apr 2015 13:48:37 -0700

Try: 

until <something>; do
  echo "waiting for something to do something"
  sleep 5
done


You can put this in a bash file and run that.

If you have a dockerfile would be easier to debug.

-Jason

> On Apr 17, 2015, at 4:24 PM, Tyson Norris <tnor...@adobe.com> wrote:
> 
> Yes, agreed that the command should not exit - but the container is killed at 
> around 0.5 s after launch regardless of whether the command terminates, which 
> is why I’ve been experimenting using commands with varied exit times. 
> 
> For example, forget about the executor needing to register momentarily.
> 
> Using the command:
> echo testing123c && sleep 0.1 && echo testing456c
> -> I see the expected output in stdout, and the container is destroyed (as 
> expected), because the container exits quickly, and then is destroyed
> 
> Using the command:
> echo testing123d && sleep 0.6 && echo testing456d
> -> I do NOT see the expected output in stdout (I only get testing123d), 
> because the container is destroyed prematurely after ~0.5 seconds
> 
> Using the “real” storm command, I get no output in stdout, probably because 
> no output is generated within 0.5 seconds of launch - it is a bit of a pig to 
> startup, so I’m currently just trying to execute some other commands for 
> testing purposes.
> 
> So I’m guessing this is a timeout issue, or else that the container is reaped 
> inappropriately, or something else… looking through this code, I’m trying to 
> figure out the steps take during executor launch:
> https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715
> 
> Thanks
> Tyson
>   
> 
> 
> 
> 
>> On Apr 17, 2015, at 12:53 PM, Jason Giedymin <jason.giedy...@gmail.com> 
>> wrote:
>> 
>> What is the last command you have docker doing?
>> 
>> If that command exits then the docker will begin to end the container.
>> 
>> -Jason
>> 
>>> On Apr 17, 2015, at 3:23 PM, Tyson Norris <tnor...@adobe.com> wrote:
>>> 
>>> Hi -
>>> I am looking at revving the mesos-storm framework to be dockerized (and 
>>> simpler). 
>>> I’m using mesos 0.22.0-1.0.ubuntu1404
>>> mesos master + mesos slave are deployed in docker containers, in case it 
>>> matters. 
>>> 
>>> I have the storm (nimbus) framework launching fine as a docker container, 
>>> but launching tasks for a topology is having problems related to using a 
>>> docker-based executor.
>>> 
>>> For example. 
>>> 
>>> TaskInfo task = TaskInfo.newBuilder()
>>>   .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
>>>   .setTaskId(taskId)
>>>   .setSlaveId(offer.getSlaveId())
>>>   .setExecutor(ExecutorInfo.newBuilder()
>>>                   
>>> .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
>>>                   .setData(ByteString.copyFromUtf8(executorDataStr))
>>>                   .setContainer(ContainerInfo.newBuilder()
>>>                           .setType(ContainerInfo.Type.DOCKER)
>>>                           .setDocker(ContainerInfo.DockerInfo.newBuilder()
>>>                                           .setImage("mesos-storm”)))
>>>                   
>>> .setCommand(CommandInfo.newBuilder().setShell(true).setValue("storm 
>>> supervisor storm.mesos.MesosSupervisor"))
>>>       //rest is unchanged from existing mesos-storm framework code
>>> 
>>> The executor launches and exits quickly - see the log msg:  Executor for 
>>> container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited
>>> 
>>> It seems like mesos loses track of the executor? I understand there is a 1 
>>> min timeout on registering the executor, but the exit happens well before 1 
>>> minute.
>>> 
>>> I tried a few alternate commands to experiment, and I can see in the stdout 
>>> for the task that
>>> "echo testing123 && echo testing456” 
>>> prints to stdout correctly, both testing123 and testing456
>>> 
>>> however:
>>> "echo testing123a && sleep 10 && echo testing456a” 
>>> prints only testing123a, presumably because the container is lost and 
>>> destroyed before the sleep time is up.
>>> 
>>> So it’s like the container for the executor is only allowed to run for .5 
>>> seconds, then it is detected as exited, and the task is lost. 
>>> 
>>> Thanks for any advice.
>>> 
>>> Tyson
>>> 
>>> 
>>> 
>>> slave logs look like:
>>> mesosslave_1  | I0417 19:07:27.461230    11 slave.cpp:1121] Got assigned 
>>> task mesos-slave1.service.consul-31000 for framework 
>>> 20150417-190611-2801799596-5050-1-0000
>>> mesosslave_1  | I0417 19:07:27.461479    11 slave.cpp:1231] Launching task 
>>> mesos-slave1.service.consul-31000 for framework 
>>> 20150417-190611-2801799596-5050-1-0000
>>> mesosslave_1  | I0417 19:07:27.463250    11 slave.cpp:4160] Launching 
>>> executor insights-1-1429297638 of framework 
>>> 20150417-190611-2801799596-5050-1-0000 in work directory 
>>> '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-0000/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
>>> mesosslave_1  | I0417 19:07:27.463444    11 slave.cpp:1378] Queuing task 
>>> 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of 
>>> framework '20150417-190611-2801799596-5050-1-0000
>>> mesosslave_1  | I0417 19:07:27.467200     7 docker.cpp:755] Starting 
>>> container '6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 
>>> 'insights-1-1429297638' and framework 
>>> '20150417-190611-2801799596-5050-1-0000'
>>> mesosslave_1  | I0417 19:07:27.985935     7 docker.cpp:1333] Executor for 
>>> container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
>>> mesosslave_1  | I0417 19:07:27.986359     7 docker.cpp:1159] Destroying 
>>> container '6539127f-9dbb-425b-86a8-845b748f0cd3'
>>> mesosslave_1  | I0417 19:07:27.986021     9 slave.cpp:3135] Monitoring 
>>> executor 'insights-1-1429297638' of framework 
>>> '20150417-190611-2801799596-5050-1-0000' in container 
>>> '6539127f-9dbb-425b-86a8-845b748f0cd3'
>>> mesosslave_1  | I0417 19:07:27.986464     7 docker.cpp:1248] Running docker 
>>> stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
>>> mesosslave_1  | I0417 19:07:28.286761    10 slave.cpp:3186] Executor 
>>> 'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1-0000 
>>> has terminated with unknown status
>>> mesosslave_1  | I0417 19:07:28.288784    10 slave.cpp:2508] Handling status 
>>> update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task 
>>> mesos-slave1.service.consul-31000 of framework 
>>> 20150417-190611-2801799596-5050-1-0000 from @0.0.0.0:0
>>> mesosslave_1  | W0417 19:07:28.289227     9 docker.cpp:841] Ignoring 
>>> updating unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3
>>> 
>>> nimbus logs (framework) look like:
>>> 2015-04-17T19:07:28.302+0000 s.m.MesosNimbus [INFO] Received status update: 
>>> task_id {
>>> value: "mesos-slave1.service.consul-31000"
>>> }
>>> state: TASK_LOST
>>> message: "Container terminated"
>>> slave_id {
>>> value: "20150417-190611-2801799596-5050-1-S0"
>>> }
>>> timestamp: 1.429297648286981E9
>>> source: SOURCE_SLAVE
>>> reason: REASON_EXECUTOR_TERMINATED
>>> 11: "\a\225\245\213\364\207B\342\252\241\242o\346\203N\327"
>

Re: docker based executor

Reply via email to