Re: Google Borg paper
At Google there are always two ways to do everything: the deprecated one and the one that's not quite ready yet. I'm sure Borg is alive and well (but deprecated) and Omega has been deployed (but ain't quite ready yet). They were already working on it in 2010, and I'm sure they're still at it. Will confirm as soon as I find out more.

On Apr 16, 2015 9:08 PM, Christos Kozyrakis kozyr...@gmail.com wrote:

Maxime, to the best of my knowledge Borg is still doing just fine at Google. It may have been enhanced by the Omega effort, but it has not been replaced. Nevertheless, I will let any Googlers on the list go into details.

Christos

On Thu, Apr 16, 2015 at 4:19 PM, Maxime Brugidou maxime.brugi...@gmail.com wrote:

Hi,
Not sure if everyone noticed, but Google just published a paper about the Borg architecture. I guess it's been replaced by Omega now internally at Google (if anyone from Google can confirm?). It might be of interest for Mesos :)
http://research.google.com/pubs/pub43438.html
Best,
Maxime

-- Christos
Shenzhen MUG First Meetup !
Our first meetup has been announced! Please check it out at http://www.meetup.com/Shenzhen-Mesos-User-Group/events/221879815/ . We will have about three topics, and Shenzhen MUG T-shirts :P

--
Zhipeng (Howard) Huang
Standard Engineer, IT Standard Patent / IT Product Line
Huawei Technologies Co., Ltd
Email: huangzhip...@huawei.com
Office: Huawei Industrial Base, Longgang, Shenzhen
(Previous) Research Assistant, Mobile Ad-Hoc Network Lab, Calit2, University of California, Irvine
Email: zhipe...@uci.edu
Office: Calit2 Building Room 2402
OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado
docker based executor
Hi - I am looking at revving the mesos-storm framework to be dockerized (and simpler). I'm using mesos 0.22.0-1.0.ubuntu1404; the mesos master and mesos slave are deployed in docker containers, in case it matters.

I have the storm (nimbus) framework launching fine as a docker container, but launching tasks for a topology is having problems related to using a docker-based executor. For example:

TaskInfo task = TaskInfo.newBuilder()
    .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
    .setTaskId(taskId)
    .setSlaveId(offer.getSlaveId())
    .setExecutor(ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
        .setData(ByteString.copyFromUtf8(executorDataStr))
        .setContainer(ContainerInfo.newBuilder()
            .setType(ContainerInfo.Type.DOCKER)
            .setDocker(ContainerInfo.DockerInfo.newBuilder()
                .setImage("mesos-storm")))
        .setCommand(CommandInfo.newBuilder()
            .setShell(true)
            .setValue("storm supervisor storm.mesos.MesosSupervisor")))
    // rest is unchanged from existing mesos-storm framework code

The executor launches and exits quickly - see the log msg:

Executor for container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited

It seems like mesos loses track of the executor? I understand there is a 1 min timeout on registering the executor, but the exit happens well before 1 minute.

I tried a few alternate commands to experiment, and I can see in the stdout for the task that

echo testing123
echo testing456

prints both testing123 and testing456 to stdout correctly; however:

echo testing123a
sleep 10
echo testing456a

prints only testing123a, presumably because the container is lost and destroyed before the sleep time is up. So it's like the container for the executor is only allowed to run for 0.5 seconds, then it is detected as exited, and the task is lost.

Thanks for any advice.
Tyson

slave logs look like:

mesosslave_1 | I0417 19:07:27.461230 11 slave.cpp:1121] Got assigned task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-
mesosslave_1 | I0417 19:07:27.461479 11 slave.cpp:1231] Launching task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-
mesosslave_1 | I0417 19:07:27.463250 11 slave.cpp:4160] Launching executor insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1- in work directory '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1 | I0417 19:07:27.463444 11 slave.cpp:1378] Queuing task 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of framework '20150417-190611-2801799596-5050-1-
mesosslave_1 | I0417 19:07:27.467200 7 docker.cpp:755] Starting container '6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and framework '20150417-190611-2801799596-5050-1-'
mesosslave_1 | I0417 19:07:27.985935 7 docker.cpp:1333] Executor for container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
mesosslave_1 | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1 | I0417 19:07:27.986021 9 slave.cpp:3135] Monitoring executor 'insights-1-1429297638' of framework '20150417-190611-2801799596-5050-1-' in container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1 | I0417 19:07:27.986464 7 docker.cpp:1248] Running docker stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1 | I0417 19:07:28.286761 10 slave.cpp:3186] Executor 'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1- has terminated with unknown status
mesosslave_1 | I0417 19:07:28.288784 10 slave.cpp:2508] Handling status update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task mesos-slave1.service.consul-31000 of framework 20150417-190611-2801799596-5050-1- from @0.0.0.0:0
mesosslave_1 | W0417 19:07:28.289227 9 docker.cpp:841] Ignoring updating unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3

nimbus logs (framework) look like:

2015-04-17T19:07:28.302+0000 s.m.MesosNimbus [INFO] Received status update: task_id { value: mesos-slave1.service.consul-31000 } state: TASK_LOST message: Container terminated slave_id { value: 20150417-190611-2801799596-5050-1-S0 } timestamp: 1.429297648286981E9 source: SOURCE_SLAVE reason: REASON_EXECUTOR_TERMINATED 11: \a\225\245\213\364\207B\342\252\241\242o\346\203N\327
Re: docker based executor
Yes, agreed that the command should not exit - but the container is killed at around 0.5 s after launch regardless of whether the command terminates, which is why I've been experimenting with commands that have varied exit times. For example, forget about the executor needing to register momentarily.

Using the command:

echo testing123c
sleep 0.1
echo testing456c

I see the expected output in stdout, and the container is destroyed (as expected), because the container exits quickly and then is destroyed.

Using the command:

echo testing123d
sleep 0.6
echo testing456d

I do NOT see the expected output in stdout (I only get testing123d), because the container is destroyed prematurely after ~0.5 seconds.

Using the "real" storm command, I get no output in stdout, probably because no output is generated within 0.5 seconds of launch - it is a bit of a pig to start up, so I'm currently just trying to execute some other commands for testing purposes.

So I'm guessing this is a timeout issue, or else that the container is reaped inappropriately, or something else... Looking through this code, I'm trying to figure out the steps taken during executor launch:
https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715

Thanks
Tyson

On Apr 17, 2015, at 12:53 PM, Jason Giedymin jason.giedy...@gmail.com wrote:

What is the last command you have docker doing? If that command exits then docker will begin to end the container.

-Jason

On Apr 17, 2015, at 3:23 PM, Tyson Norris tnor...@adobe.com wrote:
[...]
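A minimal sketch of how such a probe command could be substituted into the CommandInfo from the earlier snippet, assuming the same Mesos 0.22 Java bindings; the multi-line value is only the throwaway test described above, not the real storm command:

import org.apache.mesos.Protos.CommandInfo;

// Throwaway probe (not the real "storm supervisor ..." value): used only to observe how long
// the executor's docker container is allowed to run before it is reaped.
CommandInfo probe = CommandInfo.newBuilder()
    .setShell(true)                      // run the value via a shell inside the container
    .setValue("echo testing123d\n"
            + "sleep 0.6\n"
            + "echo testing456d")
    .build();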
Re: docker based executor
What is the last command you have docker doing? If that command exits then docker will begin to end the container.

-Jason

On Apr 17, 2015, at 3:23 PM, Tyson Norris tnor...@adobe.com wrote:
[...]
Re: docker based executor
Try:

until something; do
  echo waiting for something to do something
  sleep 5
done

You can put this in a bash file and run that. If you have a Dockerfile it would be easier to debug.

-Jason

On Apr 17, 2015, at 4:24 PM, Tyson Norris tnor...@adobe.com wrote:
[...]
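One way to read Jason's "put this in a bash file" suggestion, sketched against the same Java bindings used earlier in the thread; the script URL, file name, and readiness check are purely hypothetical placeholders:

import org.apache.mesos.Protos.CommandInfo;

// Hypothetical debugging aid: have the Mesos fetcher download a keep-alive script into the
// sandbox and run it as the executor command, so the container's main process does not exit
// while you inspect what happens inside the container.
CommandInfo debugCommand = CommandInfo.newBuilder()
    .addUris(CommandInfo.URI.newBuilder()
        .setValue("http://example.com/keepalive.sh"))   // placeholder URL
    .setShell(true)
    .setValue("sh keepalive.sh")
    .build();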
Re: docker based executor
hey Tyson,

I've also worked a bit on improving/simplifying the mesos-storm framework -- I spent the recent Mesosphere hackathon working with tnachen of Mesosphere on this. Nothing deliverable quite yet. We didn't look at dockerization at all; the hacking we did was around these goals:

* Avoiding the greedy hoarding of Offers done by the mesos-storm framework (ditching RotatingMap, and only hoarding Offers when there are topologies that need storm worker slots).
* Allowing the Mesos UI to distinguish the topologies, by having the Mesos tasks be dedicated to a topology.
* Adding usable logging in MesosNimbus. (Some of this work should be usable by other Mesos frameworks, since I'm pretty-printing the Mesos protobuf objects in 1-line JSON instead of the bazillion-line protobuf toString() pseudo-JSON output. Would be nice to create a library out of it.)

Would you like to participate in an offline thread about mesos-storm refactoring?

Thanks!
- Erik

On Fri, Apr 17, 2015 at 12:23 PM, Tyson Norris tnor...@adobe.com wrote:
[...]
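Erik's 1-line JSON printer isn't included in the thread; as a rough stand-in (an assumption on my part, not his code), protobuf's stock TextFormat can already collapse a Mesos message such as an Offer onto a single log line, albeit in protobuf text format rather than JSON:

import com.google.protobuf.TextFormat;
import org.apache.mesos.Protos.Offer;

// Stand-in for the single-line pretty-printing described above: render a Mesos protobuf
// (an Offer here) on one line instead of the multi-line toString() output.
final class ProtoLog {
    static String oneLine(Offer offer) {
        return TextFormat.shortDebugString(offer);
    }
}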
Re: docker based executor
Hi Erik -

Yes, these sound like good changes - I am currently focused on just trying to strip things down to be simpler for building versions etc. Specifically I've been working on:

- don't distribute config via an embedded http server, just send the settings via command args, e.g. -c mesos.master.url=zk://zk1.service.consul:2181/mesos -c storm.zookeeper.servers=[\"zk1.service.consul\"]
- use docker to ease framework + executor distribution (instead of repacking a storm tarball?)
- a single container that has the storm installation plus an overlayed lib dir with mesos-storm.jar, run just like the storm script: docker run mesos-storm supervisor storm.mesos.MesosSupervisor (use the same container for the supervisor executor and the nimbus framework container)

Currently I'm stuck on this problem of the executor container dying without any indication why. I only know that it runs whatever container I specify for the executor for approximately half a second, and then it dies. I've tried different containers, and different variants of shell true/false, etc. I haven't been able to find any examples of running a container as an executor, so while it seems like it would make things simpler, it's not that way yet.

I will be happy to participate in refactoring, feel free to email me off-list.

Thanks
Tyson

On Apr 17, 2015, at 9:18 PM, Erik Weathers eweath...@groupon.com wrote:
[...]
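A sketch of the command-argument approach described above, assuming the same CommandInfo builder shown earlier in the thread; the master URL and ZooKeeper host are just the placeholder values from Tyson's example, and the escaping is illustrative only:

import org.apache.mesos.Protos.CommandInfo;

// Hypothetical sketch: pass storm config overrides as -c arguments on the supervisor command
// line instead of serving a generated config from an embedded HTTP server. The backslash-escaped
// quotes are consumed by the shell so the list value reaches storm intact.
CommandInfo supervisorCommand = CommandInfo.newBuilder()
    .setShell(true)
    .setValue("storm supervisor storm.mesos.MesosSupervisor"
            + " -c mesos.master.url=zk://zk1.service.consul:2181/mesos"
            + " -c storm.zookeeper.servers=[\\\"zk1.service.consul\\\"]")
    .build();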
Re: Shenzhen MUG First Meetup !
Does Huawei use Mesos?

On Fri, Apr 17, 2015 at 9:49 PM, Zhipeng Huang zhipengh...@gmail.com wrote:
[...]

--
Best Regards,
Haosdent Huang