hey Tyson,

I've also worked a bit on improving & simplifying the mesos-storm framework
-- spent the recent Mesosphere hackathon working with tnachen of Mesosphere
on this.  Nothing deliverable quite yet.

We didn't look at dockerization at all; the hacking we did was aimed at these
goals:
* Avoiding the greedy hoarding of Offers done by the mesos-storm framework
(ditching RotatingMap, and only hoarding Offers when there are topologies
that need storm worker slots) -- see the first sketch after this list.
* Allowing the Mesos UI to distinguish the topologies, by having the Mesos
tasks be dedicated to a topology.
* Adding usable logging in MesosNimbus. (Some of this work should be usable
by other Mesos frameworks, since I'm pretty-printing the Mesos protobuf
objects as 1-line JSON instead of the bazillion-line protobuf toString()
pseudo-JSON output.  Would be nice to create a library out of it -- see the
second sketch after this list.)
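
To give a flavor of the first two points, the rough shape is something like
the sketch below. This is not the hackathon code -- TopologyNeed,
pendingTopologies() and the TaskID format are placeholder names used here
purely for illustration -- but declineOffer() is the standard SchedulerDriver
call for handing an Offer straight back to Mesos, and embedding the topology
id in the TaskID is what lets the Mesos UI group tasks per topology.

import java.util.Collections;
import java.util.List;
import org.apache.mesos.Protos.Offer;
import org.apache.mesos.Protos.TaskID;
import org.apache.mesos.SchedulerDriver;

class OfferHandlingSketch {
  // Placeholder for "a topology that is short on worker slots".
  static class TopologyNeed {
    String topologyId;
    int slotsNeeded;
  }

  // Placeholder for MesosNimbus's real bookkeeping of pending topologies.
  List<TopologyNeed> pendingTopologies() {
    return Collections.emptyList();
  }

  void resourceOffers(SchedulerDriver driver, List<Offer> offers) {
    for (Offer offer : offers) {
      List<TopologyNeed> needs = pendingTopologies();
      if (needs.isEmpty()) {
        // Nothing needs worker slots: give the Offer straight back to Mesos
        // instead of stashing it in a RotatingMap.
        driver.declineOffer(offer.getId());
        continue;
      }
      for (TopologyNeed need : needs) {
        // One Mesos task per worker slot, with the topology id embedded in
        // the TaskID so the Mesos UI can tell topologies apart.
        TaskID taskId = TaskID.newBuilder()
            .setValue(need.topologyId + "|" + offer.getHostname())
            .build();
        // ... build a TaskInfo per slot and launchTasks() against this Offer.
      }
    }
  }
}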
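
And for the logging point, the core of the pretty-printing is just walking a
Message's set fields via protobuf reflection and emitting them on one line.
A minimal sketch (again, not the actual code -- escaping, bytes fields and
enums are glossed over):

import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.Message;
import java.util.List;
import java.util.Map;

final class ProtoOneLiner {
  // Render any protobuf Message as a single line of JSON-ish text, e.g. for
  // logging an Offer or TaskStatus without the multi-line toString() output.
  static String toJson(Message msg) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    for (Map.Entry<FieldDescriptor, Object> e : msg.getAllFields().entrySet()) {
      if (!first) sb.append(",");
      first = false;
      sb.append("\"").append(e.getKey().getName()).append("\":")
        .append(render(e.getValue()));
    }
    return sb.append("}").toString();
  }

  private static String render(Object value) {
    if (value instanceof Message) return toJson((Message) value);
    if (value instanceof List) {  // repeated field
      StringBuilder sb = new StringBuilder("[");
      List<?> items = (List<?>) value;
      for (int i = 0; i < items.size(); i++) {
        if (i > 0) sb.append(",");
        sb.append(render(items.get(i)));
      }
      return sb.append("]").toString();
    }
    if (value instanceof Number || value instanceof Boolean) return value.toString();
    return "\"" + value.toString().replace("\"", "\\\"") + "\"";
  }
}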

Would you like to participate in an offline thread about mesos-storm refactoring?

Thanks!

- Erik

On Fri, Apr 17, 2015 at 12:23 PM, Tyson Norris <tnor...@adobe.com> wrote:

> Hi -
> I am looking at revving the mesos-storm framework to be dockerized (and
> simpler).
> I’m using mesos 0.22.0-1.0.ubuntu1404
> mesos master + mesos slave are deployed in docker containers, in case it
> matters.
>
> I have the storm (nimbus) framework launching fine as a docker container,
> but launching tasks for a topology is running into problems related to
> using a docker-based executor.
>
> For example:
>
> TaskInfo task = TaskInfo.newBuilder()
>     .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
>     .setTaskId(taskId)
>     .setSlaveId(offer.getSlaveId())
>     .setExecutor(ExecutorInfo.newBuilder()
>         .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
>         .setData(ByteString.copyFromUtf8(executorDataStr))
>         .setContainer(ContainerInfo.newBuilder()
>             .setType(ContainerInfo.Type.DOCKER)
>             .setDocker(ContainerInfo.DockerInfo.newBuilder()
>                 .setImage("mesos-storm")))
>         .setCommand(CommandInfo.newBuilder()
>             .setShell(true)
>             .setValue("storm supervisor storm.mesos.MesosSupervisor"))
>         // rest is unchanged from existing mesos-storm framework code
>
> The executor launches and exits quickly - see the log msg:  Executor for
> container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited
>
> It seems like Mesos loses track of the executor? I understand there is a
> 1-minute timeout on registering the executor, but the exit happens well
> before 1 minute.
>
> I tried a few alternate commands to experiment, and I can see in the
> task's stdout that
> "echo testing123 && echo testing456"
> correctly prints both testing123 and testing456
>
> However:
> "echo testing123a && sleep 10 && echo testing456a"
> prints only testing123a, presumably because the container is lost and
> destroyed before the sleep time is up.
>
> So it’s like the container for the executor is only allowed to run for 0.5
> seconds, then it is detected as exited, and the task is lost.
>
> Thanks for any advice.
>
> Tyson
>
>
>
> slave logs look like:
> mesosslave_1  | I0417 19:07:27.461230    11 slave.cpp:1121] Got assigned
> task mesos-slave1.service.consul-31000 for framework
> 20150417-190611-2801799596-5050-1-0000
> mesosslave_1  | I0417 19:07:27.461479    11 slave.cpp:1231] Launching task
> mesos-slave1.service.consul-31000 for framework
> 20150417-190611-2801799596-5050-1-0000
> mesosslave_1  | I0417 19:07:27.463250    11 slave.cpp:4160] Launching
> executor insights-1-1429297638 of framework
> 20150417-190611-2801799596-5050-1-0000 in work directory
> '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-0000/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
> mesosslave_1  | I0417 19:07:27.463444    11 slave.cpp:1378] Queuing task
> 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of
> framework '20150417-190611-2801799596-5050-1-0000
> mesosslave_1  | I0417 19:07:27.467200     7 docker.cpp:755] Starting
> container '6539127f-9dbb-425b-86a8-845b748f0cd3' for executor
> 'insights-1-1429297638' and framework
> '20150417-190611-2801799596-5050-1-0000'
> mesosslave_1  | I0417 19:07:27.985935     7 docker.cpp:1333] Executor for
> container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
> mesosslave_1  | I0417 19:07:27.986359     7 docker.cpp:1159] Destroying
> container '6539127f-9dbb-425b-86a8-845b748f0cd3'
> mesosslave_1  | I0417 19:07:27.986021     9 slave.cpp:3135] Monitoring
> executor 'insights-1-1429297638' of framework
> '20150417-190611-2801799596-5050-1-0000' in container
> '6539127f-9dbb-425b-86a8-845b748f0cd3'
> mesosslave_1  | I0417 19:07:27.986464     7 docker.cpp:1248] Running
> docker stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
> mesosslave_1  | I0417 19:07:28.286761    10 slave.cpp:3186] Executor
> 'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1-0000
> has terminated with unknown status
> mesosslave_1  | I0417 19:07:28.288784    10 slave.cpp:2508] Handling
> status update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for
> task mesos-slave1.service.consul-31000 of framework
> 20150417-190611-2801799596-5050-1-0000 from @0.0.0.0:0
> mesosslave_1  | W0417 19:07:28.289227     9 docker.cpp:841] Ignoring
> updating unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3
>
> nimbus logs (framework) look like:
> 2015-04-17T19:07:28.302+0000 s.m.MesosNimbus [INFO] Received status
> update: task_id {
>   value: "mesos-slave1.service.consul-31000"
> }
> state: TASK_LOST
> message: "Container terminated"
> slave_id {
>   value: "20150417-190611-2801799596-5050-1-S0"
> }
> timestamp: 1.429297648286981E9
> source: SOURCE_SLAVE
> reason: REASON_EXECUTOR_TERMINATED
> 11: "\a\225\245\213\364\207B\342\252\241\242o\346\203N\327"
>
>
>
>
