You can reproduce this with just about any Dockerfile, I think - it seems like launching a custom executor that is a Docker container has some problem.

I just made a simple test with this Dockerfile:
--------------------------------------
#this is oracle java8 atop phusion baseimage
FROM opentable/baseimage-java8:latest

#mesos lib (not used here, but will be in our "real" executor, e.g. to register the executor etc.)
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E56151BF
RUN echo "deb http://repos.mesosphere.io/$(lsb_release -is | tr '[:upper:]' '[:lower:]') $(lsb_release -cs) main" | tee /etc/apt/sources.list.d/mesosphere.list
RUN cat /etc/apt/sources.list.d/mesosphere.list
RUN apt-get update && apt-get install -y \
    mesos

ADD script.sh /usr/bin/executor-script.sh

CMD executor-script.sh
--------------------------------------

and script.sh:
--------------------------------------
#!/bin/bash
until false; do
  echo "waiting for something to do something"
  sleep 0.2
done
--------------------------------------

And in my stdout I get exactly 2 lines:
waiting for something to do something
waiting for something to do something

Which is about how many lines a loop sleeping 0.2 seconds can output within 0.5 seconds… something is fishy about the 0.5 seconds, but I’m not sure where.

I’m not sure exactly what the difference is, but launching a Docker container as a task WITHOUT a custom executor works fine, and I’m not sure about launching a Docker container as a task that uses a non-Docker custom executor. The case I’m aiming for is a Docker-based custom executor that launches non-Docker tasks (in case that helps clarify the situation).
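
To make the distinction concrete, here is a rough sketch of where ContainerInfo goes in each case, using the Mesos Java protobufs. This is illustrative only, not the mesos-storm code: the image name "my-image", executor id "my-executor", the script paths, and the class name are placeholders, and resources/data are omitted for brevity.
--------------------------------------
import org.apache.mesos.Protos.*;

// Illustrative only: shows where ContainerInfo goes in each case.
final class LaunchSketch {

  // Case 1: Docker container as a task WITHOUT a custom executor.
  // ContainerInfo sits on the TaskInfo, and the command runs inside the image
  // via the default executor. This is the case that works.
  static TaskInfo dockerTask(TaskID taskId, Offer offer) {
    return TaskInfo.newBuilder()
        .setName("docker-task")
        .setTaskId(taskId)
        .setSlaveId(offer.getSlaveId())
        .setCommand(CommandInfo.newBuilder().setShell(true).setValue("./script.sh"))
        .setContainer(ContainerInfo.newBuilder()
            .setType(ContainerInfo.Type.DOCKER)
            .setDocker(ContainerInfo.DockerInfo.newBuilder().setImage("my-image")))
        .build();  // resources omitted for brevity
  }

  // Case 2: Docker-based CUSTOM executor (the failing case).
  // ContainerInfo sits on the ExecutorInfo, so the executor command itself runs
  // inside the image, and that executor is expected to launch the non-Docker tasks.
  static TaskInfo taskWithDockerExecutor(TaskID taskId, Offer offer) {
    return TaskInfo.newBuilder()
        .setName("executor-task")
        .setTaskId(taskId)
        .setSlaveId(offer.getSlaveId())
        .setExecutor(ExecutorInfo.newBuilder()
            .setExecutorId(ExecutorID.newBuilder().setValue("my-executor"))
            .setCommand(CommandInfo.newBuilder()
                .setShell(true).setValue("/usr/bin/executor-script.sh"))
            .setContainer(ContainerInfo.newBuilder()
                .setType(ContainerInfo.Type.DOCKER)
                .setDocker(ContainerInfo.DockerInfo.newBuilder().setImage("my-image"))))
        .build();  // resources omitted for brevity
  }
}
--------------------------------------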

Thanks
Tyson





On Apr 17, 2015, at 1:47 PM, Jason Giedymin <jason.giedy...@gmail.com> wrote:

Try:


until <something>; do
  echo "waiting for something to do something"
  sleep 5
done

You can put this in a bash file and run that.

If you have a Dockerfile, it would be easier to debug.

-Jason

On Apr 17, 2015, at 4:24 PM, Tyson Norris <tnor...@adobe.com> wrote:

Yes, agreed that the command should not exit - but the container is killed around 0.5 s after launch regardless of whether the command terminates, which is why I’ve been experimenting with commands that have varied exit times.

For example, set aside for a moment the fact that the executor needs to register.

Using the command:
echo testing123c && sleep 0.1 && echo testing456c
-> I see the expected output in stdout, and the container is destroyed (as expected) because the command exits quickly.

Using the command:
echo testing123d && sleep 0.6 && echo testing456d
-> I do NOT see the expected output in stdout (I only get testing123d), because 
the container is destroyed prematurely after ~0.5 seconds

Using the “real” storm command, I get no output in stdout, probably because no output is generated within 0.5 seconds of launch - it is a bit of a pig to start up, so I’m currently just trying to execute some other commands for testing purposes.

So I’m guessing this is a timeout issue, or else that the container is being reaped inappropriately, or something else… looking through this code, I’m trying to figure out the steps taken during executor launch:
https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715

Thanks
Tyson





On Apr 17, 2015, at 12:53 PM, Jason Giedymin <jason.giedy...@gmail.com> wrote:

What is the last command you have Docker running?

If that command exits, then Docker will begin shutting down the container.

-Jason

On Apr 17, 2015, at 3:23 PM, Tyson Norris <tnor...@adobe.com> wrote:

Hi -
I am looking at revving the mesos-storm framework to be dockerized (and 
simpler).
I’m using mesos 0.22.0-1.0.ubuntu1404
mesos master + mesos slave are deployed in docker containers, in case it 
matters.

I have the storm (nimbus) framework launching fine as a docker container, but 
launching tasks for a topology is having problems related to using a 
docker-based executor.

For example:

TaskInfo task = TaskInfo.newBuilder()
  .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
  .setTaskId(taskId)
  .setSlaveId(offer.getSlaveId())
  .setExecutor(ExecutorInfo.newBuilder()
      .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
      .setData(ByteString.copyFromUtf8(executorDataStr))
      .setContainer(ContainerInfo.newBuilder()
          .setType(ContainerInfo.Type.DOCKER)
          .setDocker(ContainerInfo.DockerInfo.newBuilder()
              .setImage("mesos-storm")))
      .setCommand(CommandInfo.newBuilder().setShell(true)
          .setValue("storm supervisor storm.mesos.MesosSupervisor"))
      //rest is unchanged from existing mesos-storm framework code

The executor launches and exits quickly - see the log msg:  Executor for 
container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited

It seems like Mesos loses track of the executor? I understand there is a one-minute timeout on registering the executor, but the exit happens well before one minute.
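
(For reference, “registering the executor” here means the process started inside the container creating a MesosExecutorDriver and calling run() before the slave’s registration timeout expires. Below is a minimal sketch of that using the org.apache.mesos Java API - this is not the actual storm supervisor code; the class name and the stubbed callbacks are just placeholders.)
--------------------------------------
import org.apache.mesos.Executor;
import org.apache.mesos.ExecutorDriver;
import org.apache.mesos.MesosExecutorDriver;
import org.apache.mesos.Protos.*;

// Minimal custom executor: all it has to do within the registration timeout is
// construct a MesosExecutorDriver and call run(); the callbacks are stubbed.
public class SketchExecutor implements Executor {
  public void registered(ExecutorDriver driver, ExecutorInfo executorInfo,
                         FrameworkInfo frameworkInfo, SlaveInfo slaveInfo) {
    System.out.println("registered on slave " + slaveInfo.getHostname());
  }
  public void reregistered(ExecutorDriver driver, SlaveInfo slaveInfo) {}
  public void disconnected(ExecutorDriver driver) {}
  public void launchTask(ExecutorDriver driver, TaskInfo task) {
    // A real executor would start the work here; this just acks the task as running.
    driver.sendStatusUpdate(TaskStatus.newBuilder()
        .setTaskId(task.getTaskId())
        .setState(TaskState.TASK_RUNNING)
        .build());
  }
  public void killTask(ExecutorDriver driver, TaskID taskId) {}
  public void frameworkMessage(ExecutorDriver driver, byte[] data) {}
  public void shutdown(ExecutorDriver driver) {}
  public void error(ExecutorDriver driver, String message) {}

  public static void main(String[] args) {
    // Blocks until the slave/driver stops or aborts the executor.
    new MesosExecutorDriver(new SketchExecutor()).run();
  }
}
--------------------------------------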

I tried a few alternate commands to experiment, and I can see in the stdout for 
the task that
"echo testing123 && echo testing456”
prints to stdout correctly, both testing123 and testing456

however:
"echo testing123a && sleep 10 && echo testing456a”
prints only testing123a, presumably because the container is lost and destroyed 
before the sleep time is up.

So it’s like the container for the executor is only allowed to run for 0.5 seconds, then it is detected as exited, and the task is lost.

Thanks for any advice.

Tyson



slave logs look like:
mesosslave_1  | I0417 19:07:27.461230    11 slave.cpp:1121] Got assigned task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-0000
mesosslave_1  | I0417 19:07:27.461479    11 slave.cpp:1231] Launching task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-0000
mesosslave_1  | I0417 19:07:27.463250    11 slave.cpp:4160] Launching executor 
insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1-0000 in 
work directory 
'/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-0000/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.463444    11 slave.cpp:1378] Queuing task 
'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of 
framework '20150417-190611-2801799596-5050-1-0000
mesosslave_1  | I0417 19:07:27.467200     7 docker.cpp:755] Starting container 
'6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and 
framework '20150417-190611-2801799596-5050-1-0000'
mesosslave_1  | I0417 19:07:27.985935     7 docker.cpp:1333] Executor for 
container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
mesosslave_1  | I0417 19:07:27.986359     7 docker.cpp:1159] Destroying 
container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.986021     9 slave.cpp:3135] Monitoring executor 
'insights-1-1429297638' of framework '20150417-190611-2801799596-5050-1-0000' 
in container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.986464     7 docker.cpp:1248] Running docker 
stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:28.286761    10 slave.cpp:3186] Executor 
'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1-0000 has 
terminated with unknown status
mesosslave_1  | I0417 19:07:28.288784    10 slave.cpp:2508] Handling status 
update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task 
mesos-slave1.service.consul-31000 of framework 
20150417-190611-2801799596-5050-1-0000 from @0.0.0.0:0
mesosslave_1  | W0417 19:07:28.289227     9 docker.cpp:841] Ignoring updating 
unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3

nimbus logs (framework) look like:
2015-04-17T19:07:28.302+0000 s.m.MesosNimbus [INFO] Received status update: 
task_id {
value: "mesos-slave1.service.consul-31000"
}
state: TASK_LOST
message: "Container terminated"
slave_id {
value: "20150417-190611-2801799596-5050-1-S0"
}
timestamp: 1.429297648286981E9
source: SOURCE_SLAVE
reason: REASON_EXECUTOR_TERMINATED
11: "\a\225\245\213\364\207B\342\252\241\242o\346\203N\327"




