What do any/all logs say? (syslog)

-Jason

> On Apr 17, 2015, at 7:22 PM, Tyson Norris <tnor...@adobe.com> wrote:
> 
> another interesting fact:
> I can restart the docker container of my executor, and it runs great. 
> 
> In the test example below, notice the stdout appears to be growing as 
> expected after restarting the container.
> 
> So something is killing my executor container (also indicated by the "Exited 
> (137) About a minute ago”), but I’m still not sure what.
> 
> Thanks
> Tyson
> 
> 
> 
> tnorris-osx:insights tnorris$ docker ps -a | grep testexec
> 5291fe29c9c2        testexecutor:latest                                       
>                 "/bin/sh -c executor   About a minute ago   Exited (137) 
> About a minute ago                                                            
>                                                                
> mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9   
> tnorris-osx:insights tnorris$ docker start 
> mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
> mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
> tnorris-osx:insights tnorris$ docker logs 
> mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> tnorris-osx:insights tnorris$ docker stop 
> mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
> 
> 
>> On Apr 17, 2015, at 2:11 PM, Tyson Norris <tnor...@adobe.com> wrote:
>> 
>> You can reproduce with most any dockerfile, I think - it seems like 
>> launching a customer executor that is a docker container has some problem. 
>> 
>> I just made a simple test with docker file:
>> --------------------------------------
>> #this is oracle java8 atop phusion baseimage
>> FROM opentable/baseimage-java8:latest
>> 
>> 
>> #mesos lib (not used here, but will be in our “real” executor, e.g. to 
>> register the executor etc)
>> RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E56151BF
>> RUN echo "deb http://repos.mesosphere.io/$(lsb_release -is | tr '[:upper:]' 
>> '[:lower:]') $(lsb_release -cs) main" | tee 
>> /etc/apt/sources.list.d/mesosphere.list
>> RUN cat /etc/apt/sources.list.d/mesosphere.list
>> RUN apt-get update && apt-get install -y \
>>     mesos
>> 
>> ADD script.sh /usr/bin/executor-script.sh
>> 
>> CMD executor-script.sh
>> --------------------------------------
>> 
>> and script.sh:
>> --------------------------------------
>> #!/bin/bash
>> until false; do
>>   echo "waiting for something to do something"
>>   sleep 0.2
>> done
>> --------------------------------------
>> 
>> And in my stdout I get exactly 2 lines:
>> waiting for something to do something
>> waiting for something to do something
>> 
>> Which is how many lines can be output in within 0.5 seconds…something is 
>> fishy about the 0.5 seconds, but I’m not sure where.
>> 
>> I’m not sure exactly the difference, but launching a docker container as a 
>> task WITHOUT a custom executor works fine, and I’m not sure about launching 
>> a docker container as a task that is using a non-docker custom executor. The 
>> case I’m trying for is using a docker customer executor, and launching 
>> non-docker tasks. (in case that helps clarify the situation).
>> 
>> Thanks
>> Tyson
>> 
>> 
>> 
>> 
>> 
>>> On Apr 17, 2015, at 1:47 PM, Jason Giedymin <jason.giedy...@gmail.com> 
>>> wrote:
>>> 
>>> Try: 
>>> 
>>> until <something>; do
>>>   echo "waiting for something to do something"
>>>   sleep 5
>>> done
>>> 
>>> You can put this in a bash file and run that.
>>> 
>>> If you have a dockerfile would be easier to debug.
>>> 
>>> -Jason
>>> 
>>> On Apr 17, 2015, at 4:24 PM, Tyson Norris <tnor...@adobe.com> wrote:
>>> 
>>>> Yes, agreed that the command should not exit - but the container is killed 
>>>> at around 0.5 s after launch regardless of whether the command terminates, 
>>>> which is why I’ve been experimenting using commands with varied exit 
>>>> times. 
>>>> 
>>>> For example, forget about the executor needing to register momentarily.
>>>> 
>>>> Using the command:
>>>> echo testing123c && sleep 0.1 && echo testing456c
>>>> -> I see the expected output in stdout, and the container is destroyed (as 
>>>> expected), because the container exits quickly, and then is destroyed
>>>> 
>>>> Using the command:
>>>> echo testing123d && sleep 0.6 && echo testing456d
>>>> -> I do NOT see the expected output in stdout (I only get testing123d), 
>>>> because the container is destroyed prematurely after ~0.5 seconds
>>>> 
>>>> Using the “real” storm command, I get no output in stdout, probably 
>>>> because no output is generated within 0.5 seconds of launch - it is a bit 
>>>> of a pig to startup, so I’m currently just trying to execute some other 
>>>> commands for testing purposes.
>>>> 
>>>> So I’m guessing this is a timeout issue, or else that the container is 
>>>> reaped inappropriately, or something else… looking through this code, I’m 
>>>> trying to figure out the steps take during executor launch:
>>>> https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715
>>>> 
>>>> Thanks
>>>> Tyson
>>>>   
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Apr 17, 2015, at 12:53 PM, Jason Giedymin <jason.giedy...@gmail.com> 
>>>>> wrote:
>>>>> 
>>>>> What is the last command you have docker doing?
>>>>> 
>>>>> If that command exits then the docker will begin to end the container.
>>>>> 
>>>>> -Jason
>>>>> 
>>>>>> On Apr 17, 2015, at 3:23 PM, Tyson Norris <tnor...@adobe.com> wrote:
>>>>>> 
>>>>>> Hi -
>>>>>> I am looking at revving the mesos-storm framework to be dockerized (and 
>>>>>> simpler). 
>>>>>> I’m using mesos 0.22.0-1.0.ubuntu1404
>>>>>> mesos master + mesos slave are deployed in docker containers, in case it 
>>>>>> matters. 
>>>>>> 
>>>>>> I have the storm (nimbus) framework launching fine as a docker 
>>>>>> container, but launching tasks for a topology is having problems related 
>>>>>> to using a docker-based executor.
>>>>>> 
>>>>>> For example. 
>>>>>> 
>>>>>> TaskInfo task = TaskInfo.newBuilder()
>>>>>>   .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
>>>>>>   .setTaskId(taskId)
>>>>>>   .setSlaveId(offer.getSlaveId())
>>>>>>   .setExecutor(ExecutorInfo.newBuilder()
>>>>>>                   
>>>>>> .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
>>>>>>                   .setData(ByteString.copyFromUtf8(executorDataStr))
>>>>>>                   .setContainer(ContainerInfo.newBuilder()
>>>>>>                           .setType(ContainerInfo.Type.DOCKER)
>>>>>>                           
>>>>>> .setDocker(ContainerInfo.DockerInfo.newBuilder()
>>>>>>                                           .setImage("mesos-storm”)))
>>>>>>                   
>>>>>> .setCommand(CommandInfo.newBuilder().setShell(true).setValue("storm 
>>>>>> supervisor storm.mesos.MesosSupervisor"))
>>>>>>       //rest is unchanged from existing mesos-storm framework code
>>>>>> 
>>>>>> The executor launches and exits quickly - see the log msg:  Executor for 
>>>>>> container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited
>>>>>> 
>>>>>> It seems like mesos loses track of the executor? I understand there is a 
>>>>>> 1 min timeout on registering the executor, but the exit happens well 
>>>>>> before 1 minute.
>>>>>> 
>>>>>> I tried a few alternate commands to experiment, and I can see in the 
>>>>>> stdout for the task that
>>>>>> "echo testing123 && echo testing456” 
>>>>>> prints to stdout correctly, both testing123 and testing456
>>>>>> 
>>>>>> however:
>>>>>> "echo testing123a && sleep 10 && echo testing456a” 
>>>>>> prints only testing123a, presumably because the container is lost and 
>>>>>> destroyed before the sleep time is up.
>>>>>> 
>>>>>> So it’s like the container for the executor is only allowed to run for 
>>>>>> .5 seconds, then it is detected as exited, and the task is lost. 
>>>>>> 
>>>>>> Thanks for any advice.
>>>>>> 
>>>>>> Tyson
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> slave logs look like:
>>>>>> mesosslave_1  | I0417 19:07:27.461230    11 slave.cpp:1121] Got assigned 
>>>>>> task mesos-slave1.service.consul-31000 for framework 
>>>>>> 20150417-190611-2801799596-5050-1-0000
>>>>>> mesosslave_1  | I0417 19:07:27.461479    11 slave.cpp:1231] Launching 
>>>>>> task mesos-slave1.service.consul-31000 for framework 
>>>>>> 20150417-190611-2801799596-5050-1-0000
>>>>>> mesosslave_1  | I0417 19:07:27.463250    11 slave.cpp:4160] Launching 
>>>>>> executor insights-1-1429297638 of framework 
>>>>>> 20150417-190611-2801799596-5050-1-0000 in work directory 
>>>>>> '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-0000/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
>>>>>> mesosslave_1  | I0417 19:07:27.463444    11 slave.cpp:1378] Queuing task 
>>>>>> 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 
>>>>>> of framework '20150417-190611-2801799596-5050-1-0000
>>>>>> mesosslave_1  | I0417 19:07:27.467200     7 docker.cpp:755] Starting 
>>>>>> container '6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 
>>>>>> 'insights-1-1429297638' and framework 
>>>>>> '20150417-190611-2801799596-5050-1-0000'
>>>>>> mesosslave_1  | I0417 19:07:27.985935     7 docker.cpp:1333] Executor 
>>>>>> for container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
>>>>>> mesosslave_1  | I0417 19:07:27.986359     7 docker.cpp:1159] Destroying 
>>>>>> container '6539127f-9dbb-425b-86a8-845b748f0cd3'
>>>>>> mesosslave_1  | I0417 19:07:27.986021     9 slave.cpp:3135] Monitoring 
>>>>>> executor 'insights-1-1429297638' of framework 
>>>>>> '20150417-190611-2801799596-5050-1-0000' in container 
>>>>>> '6539127f-9dbb-425b-86a8-845b748f0cd3'
>>>>>> mesosslave_1  | I0417 19:07:27.986464     7 docker.cpp:1248] Running 
>>>>>> docker stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
>>>>>> mesosslave_1  | I0417 19:07:28.286761    10 slave.cpp:3186] Executor 
>>>>>> 'insights-1-1429297638' of framework 
>>>>>> 20150417-190611-2801799596-5050-1-0000 has terminated with unknown status
>>>>>> mesosslave_1  | I0417 19:07:28.288784    10 slave.cpp:2508] Handling 
>>>>>> status update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for 
>>>>>> task mesos-slave1.service.consul-31000 of framework 
>>>>>> 20150417-190611-2801799596-5050-1-0000 from @0.0.0.0:0
>>>>>> mesosslave_1  | W0417 19:07:28.289227     9 docker.cpp:841] Ignoring 
>>>>>> updating unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3
>>>>>> 
>>>>>> nimbus logs (framework) look like:
>>>>>> 2015-04-17T19:07:28.302+0000 s.m.MesosNimbus [INFO] Received status 
>>>>>> update: task_id {
>>>>>> value: "mesos-slave1.service.consul-31000"
>>>>>> }
>>>>>> state: TASK_LOST
>>>>>> message: "Container terminated"
>>>>>> slave_id {
>>>>>> value: "20150417-190611-2801799596-5050-1-S0"
>>>>>> }
>>>>>> timestamp: 1.429297648286981E9
>>>>>> source: SOURCE_SLAVE
>>>>>> reason: REASON_EXECUTOR_TERMINATED
>>>>>> 11: "\a\225\245\213\364\207B\342\252\241\242o\346\203N\327"
> 

Reply via email to