What do any/all logs say? (syslog) -Jason
> On Apr 17, 2015, at 7:22 PM, Tyson Norris <tnor...@adobe.com> wrote: > > another interesting fact: > I can restart the docker container of my executor, and it runs great. > > In the test example below, notice the stdout appears to be growing as > expected after restarting the container. > > So something is killing my executor container (also indicated by the "Exited > (137) About a minute ago”), but I’m still not sure what. > > Thanks > Tyson > > > > tnorris-osx:insights tnorris$ docker ps -a | grep testexec > 5291fe29c9c2 testexecutor:latest > "/bin/sh -c executor About a minute ago Exited (137) > About a minute ago > > mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9 > tnorris-osx:insights tnorris$ docker start > mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9 > mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9 > tnorris-osx:insights tnorris$ docker logs > mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9 > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > waiting for something to do something > tnorris-osx:insights tnorris$ docker stop > mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9 > > >> On Apr 17, 2015, at 2:11 PM, Tyson Norris <tnor...@adobe.com> wrote: >> >> You can reproduce with most any dockerfile, I think - it seems like >> launching a customer executor that is a docker container has some problem. >> >> I just made a simple test with docker file: >> -------------------------------------- >> #this is oracle java8 atop phusion baseimage >> FROM opentable/baseimage-java8:latest >> >> >> #mesos lib (not used here, but will be in our “real” executor, e.g. to >> register the executor etc) >> RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E56151BF >> RUN echo "deb http://repos.mesosphere.io/$(lsb_release -is | tr '[:upper:]' >> '[:lower:]') $(lsb_release -cs) main" | tee >> /etc/apt/sources.list.d/mesosphere.list >> RUN cat /etc/apt/sources.list.d/mesosphere.list >> RUN apt-get update && apt-get install -y \ >> mesos >> >> ADD script.sh /usr/bin/executor-script.sh >> >> CMD executor-script.sh >> -------------------------------------- >> >> and script.sh: >> -------------------------------------- >> #!/bin/bash >> until false; do >> echo "waiting for something to do something" >> sleep 0.2 >> done >> -------------------------------------- >> >> And in my stdout I get exactly 2 lines: >> waiting for something to do something >> waiting for something to do something >> >> Which is how many lines can be output in within 0.5 seconds…something is >> fishy about the 0.5 seconds, but I’m not sure where. >> >> I’m not sure exactly the difference, but launching a docker container as a >> task WITHOUT a custom executor works fine, and I’m not sure about launching >> a docker container as a task that is using a non-docker custom executor. The >> case I’m trying for is using a docker customer executor, and launching >> non-docker tasks. (in case that helps clarify the situation). >> >> Thanks >> Tyson >> >> >> >> >> >>> On Apr 17, 2015, at 1:47 PM, Jason Giedymin <jason.giedy...@gmail.com> >>> wrote: >>> >>> Try: >>> >>> until <something>; do >>> echo "waiting for something to do something" >>> sleep 5 >>> done >>> >>> You can put this in a bash file and run that. >>> >>> If you have a dockerfile would be easier to debug. >>> >>> -Jason >>> >>> On Apr 17, 2015, at 4:24 PM, Tyson Norris <tnor...@adobe.com> wrote: >>> >>>> Yes, agreed that the command should not exit - but the container is killed >>>> at around 0.5 s after launch regardless of whether the command terminates, >>>> which is why I’ve been experimenting using commands with varied exit >>>> times. >>>> >>>> For example, forget about the executor needing to register momentarily. >>>> >>>> Using the command: >>>> echo testing123c && sleep 0.1 && echo testing456c >>>> -> I see the expected output in stdout, and the container is destroyed (as >>>> expected), because the container exits quickly, and then is destroyed >>>> >>>> Using the command: >>>> echo testing123d && sleep 0.6 && echo testing456d >>>> -> I do NOT see the expected output in stdout (I only get testing123d), >>>> because the container is destroyed prematurely after ~0.5 seconds >>>> >>>> Using the “real” storm command, I get no output in stdout, probably >>>> because no output is generated within 0.5 seconds of launch - it is a bit >>>> of a pig to startup, so I’m currently just trying to execute some other >>>> commands for testing purposes. >>>> >>>> So I’m guessing this is a timeout issue, or else that the container is >>>> reaped inappropriately, or something else… looking through this code, I’m >>>> trying to figure out the steps take during executor launch: >>>> https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715 >>>> >>>> Thanks >>>> Tyson >>>> >>>> >>>> >>>> >>>> >>>>> On Apr 17, 2015, at 12:53 PM, Jason Giedymin <jason.giedy...@gmail.com> >>>>> wrote: >>>>> >>>>> What is the last command you have docker doing? >>>>> >>>>> If that command exits then the docker will begin to end the container. >>>>> >>>>> -Jason >>>>> >>>>>> On Apr 17, 2015, at 3:23 PM, Tyson Norris <tnor...@adobe.com> wrote: >>>>>> >>>>>> Hi - >>>>>> I am looking at revving the mesos-storm framework to be dockerized (and >>>>>> simpler). >>>>>> I’m using mesos 0.22.0-1.0.ubuntu1404 >>>>>> mesos master + mesos slave are deployed in docker containers, in case it >>>>>> matters. >>>>>> >>>>>> I have the storm (nimbus) framework launching fine as a docker >>>>>> container, but launching tasks for a topology is having problems related >>>>>> to using a docker-based executor. >>>>>> >>>>>> For example. >>>>>> >>>>>> TaskInfo task = TaskInfo.newBuilder() >>>>>> .setName("worker " + slot.getNodeId() + ":" + slot.getPort()) >>>>>> .setTaskId(taskId) >>>>>> .setSlaveId(offer.getSlaveId()) >>>>>> .setExecutor(ExecutorInfo.newBuilder() >>>>>> >>>>>> .setExecutorId(ExecutorID.newBuilder().setValue(details.getId())) >>>>>> .setData(ByteString.copyFromUtf8(executorDataStr)) >>>>>> .setContainer(ContainerInfo.newBuilder() >>>>>> .setType(ContainerInfo.Type.DOCKER) >>>>>> >>>>>> .setDocker(ContainerInfo.DockerInfo.newBuilder() >>>>>> .setImage("mesos-storm”))) >>>>>> >>>>>> .setCommand(CommandInfo.newBuilder().setShell(true).setValue("storm >>>>>> supervisor storm.mesos.MesosSupervisor")) >>>>>> //rest is unchanged from existing mesos-storm framework code >>>>>> >>>>>> The executor launches and exits quickly - see the log msg: Executor for >>>>>> container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited >>>>>> >>>>>> It seems like mesos loses track of the executor? I understand there is a >>>>>> 1 min timeout on registering the executor, but the exit happens well >>>>>> before 1 minute. >>>>>> >>>>>> I tried a few alternate commands to experiment, and I can see in the >>>>>> stdout for the task that >>>>>> "echo testing123 && echo testing456” >>>>>> prints to stdout correctly, both testing123 and testing456 >>>>>> >>>>>> however: >>>>>> "echo testing123a && sleep 10 && echo testing456a” >>>>>> prints only testing123a, presumably because the container is lost and >>>>>> destroyed before the sleep time is up. >>>>>> >>>>>> So it’s like the container for the executor is only allowed to run for >>>>>> .5 seconds, then it is detected as exited, and the task is lost. >>>>>> >>>>>> Thanks for any advice. >>>>>> >>>>>> Tyson >>>>>> >>>>>> >>>>>> >>>>>> slave logs look like: >>>>>> mesosslave_1 | I0417 19:07:27.461230 11 slave.cpp:1121] Got assigned >>>>>> task mesos-slave1.service.consul-31000 for framework >>>>>> 20150417-190611-2801799596-5050-1-0000 >>>>>> mesosslave_1 | I0417 19:07:27.461479 11 slave.cpp:1231] Launching >>>>>> task mesos-slave1.service.consul-31000 for framework >>>>>> 20150417-190611-2801799596-5050-1-0000 >>>>>> mesosslave_1 | I0417 19:07:27.463250 11 slave.cpp:4160] Launching >>>>>> executor insights-1-1429297638 of framework >>>>>> 20150417-190611-2801799596-5050-1-0000 in work directory >>>>>> '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-0000/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3' >>>>>> mesosslave_1 | I0417 19:07:27.463444 11 slave.cpp:1378] Queuing task >>>>>> 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 >>>>>> of framework '20150417-190611-2801799596-5050-1-0000 >>>>>> mesosslave_1 | I0417 19:07:27.467200 7 docker.cpp:755] Starting >>>>>> container '6539127f-9dbb-425b-86a8-845b748f0cd3' for executor >>>>>> 'insights-1-1429297638' and framework >>>>>> '20150417-190611-2801799596-5050-1-0000' >>>>>> mesosslave_1 | I0417 19:07:27.985935 7 docker.cpp:1333] Executor >>>>>> for container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited >>>>>> mesosslave_1 | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying >>>>>> container '6539127f-9dbb-425b-86a8-845b748f0cd3' >>>>>> mesosslave_1 | I0417 19:07:27.986021 9 slave.cpp:3135] Monitoring >>>>>> executor 'insights-1-1429297638' of framework >>>>>> '20150417-190611-2801799596-5050-1-0000' in container >>>>>> '6539127f-9dbb-425b-86a8-845b748f0cd3' >>>>>> mesosslave_1 | I0417 19:07:27.986464 7 docker.cpp:1248] Running >>>>>> docker stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3' >>>>>> mesosslave_1 | I0417 19:07:28.286761 10 slave.cpp:3186] Executor >>>>>> 'insights-1-1429297638' of framework >>>>>> 20150417-190611-2801799596-5050-1-0000 has terminated with unknown status >>>>>> mesosslave_1 | I0417 19:07:28.288784 10 slave.cpp:2508] Handling >>>>>> status update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for >>>>>> task mesos-slave1.service.consul-31000 of framework >>>>>> 20150417-190611-2801799596-5050-1-0000 from @0.0.0.0:0 >>>>>> mesosslave_1 | W0417 19:07:28.289227 9 docker.cpp:841] Ignoring >>>>>> updating unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3 >>>>>> >>>>>> nimbus logs (framework) look like: >>>>>> 2015-04-17T19:07:28.302+0000 s.m.MesosNimbus [INFO] Received status >>>>>> update: task_id { >>>>>> value: "mesos-slave1.service.consul-31000" >>>>>> } >>>>>> state: TASK_LOST >>>>>> message: "Container terminated" >>>>>> slave_id { >>>>>> value: "20150417-190611-2801799596-5050-1-S0" >>>>>> } >>>>>> timestamp: 1.429297648286981E9 >>>>>> source: SOURCE_SLAVE >>>>>> reason: REASON_EXECUTOR_TERMINATED >>>>>> 11: "\a\225\245\213\364\207B\342\252\241\242o\346\203N\327" >