Hi Erik -
Yes, these sound like good changes - right now I'm focused on just trying to 
strip things down so that building versions etc. is simpler.

Specifically I’ve been working on:
- don't distribute config via an embedded http server; instead send the settings via 
command args, e.g. -c mesos.master.url=zk://zk1.service.consul:2181/mesos -c 
storm.zookeeper.servers=[\"zk1.service.consul\"] (see the sketch after this list)
- use docker to ease framework+executor distribution (instead of repacking a 
storm tarball): a single container that has the storm installation + an overlaid lib 
dir with mesos-storm.jar, run just like the storm script: docker run mesos-storm 
supervisor storm.mesos.MesosSupervisor (using the same container for the supervisor 
executor and the nimbus framework container)
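
To make the command-args idea concrete, here's a rough, untested sketch (the class 
and method names are made up for illustration, and the values are just the examples 
above) of building that launch command and dropping it into the executor's CommandInfo:

// Untested sketch: build "storm supervisor storm.mesos.MesosSupervisor -c k=v ..."
// from a map of config overrides, instead of serving storm.yaml over http.
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.mesos.Protos.CommandInfo;

public class SupervisorCommandSketch {
    static CommandInfo buildSupervisorCommand(Map<String, String> overrides) {
        StringBuilder cmd = new StringBuilder("storm supervisor storm.mesos.MesosSupervisor");
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            // NB: values like ["zk1.service.consul"] would still need shell escaping in practice
            cmd.append(" -c ").append(e.getKey()).append("=").append(e.getValue());
        }
        return CommandInfo.newBuilder()
                .setShell(true)
                .setValue(cmd.toString())
                .build();
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new LinkedHashMap<String, String>();
        overrides.put("mesos.master.url", "zk://zk1.service.consul:2181/mesos");
        overrides.put("storm.zookeeper.servers", "[\"zk1.service.consul\"]");
        System.out.println(buildSupervisorCommand(overrides).getValue());
    }
}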

Currently I'm stuck on this problem of the executor container dying without any 
indication why. All I know is that whatever container I specify for the executor 
runs for approximately half a second and then dies. I've tried different containers, 
and different variants of shell true/false, etc. I haven't been able to find 
any examples of running a container as an executor, so while it seems like it 
would make things simpler, it's not that way yet.

I'd be happy to participate in the refactoring - feel free to email me off-list.

Thanks
Tyson


On Apr 17, 2015, at 9:18 PM, Erik Weathers 
<eweath...@groupon.com> wrote:

hey Tyson,

I've also worked a bit on improving & simplifying the mesos-storm framework -- 
spent the recent Mesosphere hackathon working with tnachen of Mesosphere on 
this.  Nothing deliverable quite yet.

We didn't look at dockerization at all, the hacking we did was around these 
goals:
* Avoiding the greedy hoarding of Offers done by the mesos-storm framework 
(ditching RotatingMap, and only hoarding Offers when there are topologies that 
need storm worker slots).
* Allowing the Mesos UI to distinguish the topologies, by having the Mesos 
tasks be dedicated to a topology.
* Adding usable logging in MesosNimbus. (Some of this work should be usable by 
other Mesos frameworks, since I'm pretty-printing the Mesos protobuf objects in 
1-line JSON instead of the bazillion-line protobuf toString() pseudo-JSON output.  
Would be nice to create a library out of it; there's a rough sketch of the idea below.)
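
To illustrate the 1-line JSON idea (this isn't the actual hackathon code, and it 
assumes a protobuf build that ships protobuf-java-util's JsonFormat, which may not 
match the protobuf bundled with Mesos 0.22):

// Illustrative only - not the hackathon code. Requires protobuf-java-util's
// JsonFormat; older bundled protobuf versions would need a different JSON printer.
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.MessageOrBuilder;
import com.google.protobuf.util.JsonFormat;

public class ProtoLogSketch {
    // Render any protobuf message (Offer, TaskStatus, ...) as a single JSON line,
    // falling back to a flattened toString() if JSON conversion isn't possible.
    public static String oneLine(MessageOrBuilder message) {
        try {
            return JsonFormat.printer().omittingInsignificantWhitespace().print(message);
        } catch (InvalidProtocolBufferException e) {
            return message.toString().replace('\n', ' ');
        }
    }
}

Usage in MesosNimbus would then look something like 
LOG.info("Received status update: " + ProtoLogSketch.oneLine(status));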

Would you like to participate in an offline thread about mesos-storm refactoring?

Thanks!

- Erik

On Fri, Apr 17, 2015 at 12:23 PM, Tyson Norris 
<tnor...@adobe.com> wrote:
Hi -
I am looking at revving the mesos-storm framework to be dockerized (and 
simpler).
I’m using mesos 0.22.0-1.0.ubuntu1404
mesos master + mesos slave are deployed in docker containers, in case it 
matters.

I have the storm (nimbus) framework launching fine as a docker container, but 
launching tasks for a topology runs into problems related to the docker-based 
executor.

For example:

TaskInfo task = TaskInfo.newBuilder()
    .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
    .setTaskId(taskId)
    .setSlaveId(offer.getSlaveId())
    .setExecutor(ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
        .setData(ByteString.copyFromUtf8(executorDataStr))
        .setContainer(ContainerInfo.newBuilder()
            .setType(ContainerInfo.Type.DOCKER)
            .setDocker(ContainerInfo.DockerInfo.newBuilder()
                .setImage("mesos-storm")))
        .setCommand(CommandInfo.newBuilder()
            .setShell(true)
            .setValue("storm supervisor storm.mesos.MesosSupervisor"))
        // rest is unchanged from the existing mesos-storm framework code

The executor launches and exits quickly - see the log msg:  Executor for 
container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited

It seems like Mesos loses track of the executor? I understand there is a 1-minute 
timeout on registering the executor, but the exit happens well before that.

I tried a few alternate commands to experiment. In the stdout for the task,
"echo testing123 && echo testing456"
prints both testing123 and testing456 correctly.

However,
"echo testing123a && sleep 10 && echo testing456a"
prints only testing123a, presumably because the container is lost and destroyed 
before the sleep time is up.
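
For reference, those experiments just swap the value in the CommandInfo from the 
snippet above, roughly like this (variable names made up, not the exact code):

// Variant where both lines print to the task's stdout:
CommandInfo echoOnly = CommandInfo.newBuilder()
    .setShell(true)
    .setValue("echo testing123 && echo testing456")
    .build();

// Variant where only the first echo prints before the container is destroyed:
CommandInfo echoSleepEcho = CommandInfo.newBuilder()
    .setShell(true)
    .setValue("echo testing123a && sleep 10 && echo testing456a")
    .build();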

So it’s like the container for the executor is only allowed to run for .5 
seconds, then it is detected as exited, and the task is lost.

Thanks for any advice.

Tyson



slave logs look like:
mesosslave_1  | I0417 19:07:27.461230    11 slave.cpp:1121] Got assigned task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-0000
mesosslave_1  | I0417 19:07:27.461479    11 slave.cpp:1231] Launching task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-0000
mesosslave_1  | I0417 19:07:27.463250    11 slave.cpp:4160] Launching executor 
insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1-0000 in 
work directory 
'/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-0000/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.463444    11 slave.cpp:1378] Queuing task 
'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of 
framework '20150417-190611-2801799596-5050-1-0000
mesosslave_1  | I0417 19:07:27.467200     7 docker.cpp:755] Starting container 
'6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and 
framework '20150417-190611-2801799596-5050-1-0000'
mesosslave_1  | I0417 19:07:27.985935     7 docker.cpp:1333] Executor for 
container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
mesosslave_1  | I0417 19:07:27.986359     7 docker.cpp:1159] Destroying 
container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.986021     9 slave.cpp:3135] Monitoring executor 
'insights-1-1429297638' of framework '20150417-190611-2801799596-5050-1-0000' 
in container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.986464     7 docker.cpp:1248] Running docker 
stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:28.286761    10 slave.cpp:3186] Executor 
'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1-0000 has 
terminated with unknown status
mesosslave_1  | I0417 19:07:28.288784    10 slave.cpp:2508] Handling status 
update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task 
mesos-slave1.service.consul-31000 of framework 
20150417-190611-2801799596-5050-1-0000 from @0.0.0.0:0
mesosslave_1  | W0417 19:07:28.289227     9 docker.cpp:841] Ignoring updating 
unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3

nimbus logs (framework) look like:
2015-04-17T19:07:28.302+0000 s.m.MesosNimbus [INFO] Received status update: 
task_id {
  value: "mesos-slave1.service.consul-31000"
}
state: TASK_LOST
message: "Container terminated"
slave_id {
  value: "20150417-190611-2801799596-5050-1-S0"
}
timestamp: 1.429297648286981E9
source: SOURCE_SLAVE
reason: REASON_EXECUTOR_TERMINATED
11: "\a\225\245\213\364\207B\342\252\241\242o\346\203N\327"




