Re: Google Borg paper

2015-04-17 Thread Marco Massenzio
At Google there are always two ways to do everything: the deprecated one and the
one that's not quite ready yet.

I'm sure Borg is alive and well (but deprecated) and Omega has been
deployed (but ain't quite ready yet)

They were already working on it in 2010, I'm sure they're still at it.

Will confirm as soon as I find out more.
On Apr 16, 2015 9:08 PM, Christos Kozyrakis kozyr...@gmail.com wrote:

 Maxime,
 to the best of my knowledge Borg is still doing just fine at Google. It
 may have been enhanced by the Omega effort but it has not been replaced.
 Nevertheless, I will let any Googlers on the list go into details.
 Christos

 On Thu, Apr 16, 2015 at 4:19 PM, Maxime Brugidou 
 maxime.brugi...@gmail.com wrote:

 Hi,

 Not sure if everyone noticed but Google just published a paper about the
 Borg architecture. I guess it's been replaced by Omega now internally at
 Google (if anyone from Google can confirm?)

 It might be of interest for Mesos :)

 http://research.google.com/pubs/pub43438.html

 Best,
 Maxime




 --
 Christos



Shenzhen MUG First Meetup !

2015-04-17 Thread Zhipeng Huang
Our first meetup has been announced! Please check it out at
http://www.meetup.com/Shenzhen-Mesos-User-Group/events/221879815/ . We will
have about three topics, and Shenzhen MUG T-shirts :P

-- 
Zhipeng (Howard) Huang

Standard Engineer
IT Standard & Patent/IT Product Line
Huawei Technologies Co., Ltd
Email: huangzhip...@huawei.com
Office: Huawei Industrial Base, Longgang, Shenzhen

(Previous)
Research Assistant
Mobile Ad-Hoc Network Lab, Calit2
University of California, Irvine
Email: zhipe...@uci.edu
Office: Calit2 Building Room 2402

OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado


docker based executor

2015-04-17 Thread Tyson Norris
Hi -
I am looking at revving the mesos-storm framework to be dockerized (and 
simpler). 
I’m using mesos 0.22.0-1.0.ubuntu1404
mesos master + mesos slave are deployed in docker containers, in case it 
matters. 

I have the storm (nimbus) framework launching fine as a docker container, but 
launching tasks for a topology is having problems related to using a 
docker-based executor.

For example. 

TaskInfo task = TaskInfo.newBuilder()
    .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
    .setTaskId(taskId)
    .setSlaveId(offer.getSlaveId())
    .setExecutor(ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
        .setData(ByteString.copyFromUtf8(executorDataStr))
        .setContainer(ContainerInfo.newBuilder()
            .setType(ContainerInfo.Type.DOCKER)
            .setDocker(ContainerInfo.DockerInfo.newBuilder()
                .setImage("mesos-storm")))
        .setCommand(CommandInfo.newBuilder().setShell(true)
            .setValue("storm supervisor storm.mesos.MesosSupervisor")))
//rest is unchanged from existing mesos-storm framework code

The executor launches and exits quickly - see the log msg: "Executor for
container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited"

It seems like mesos loses track of the executor? I understand there is a 1 min 
timeout on registering the executor, but the exit happens well before 1 minute.
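
For reference, here is a minimal sketch of what "registering the executor" means
with the org.apache.mesos Java API. This is hypothetical illustration code, not
the storm MesosSupervisor: the point is only that whatever process the container
starts has to build a MesosExecutorDriver and keep it running so it can register
with the slave within that timeout.

import org.apache.mesos.Executor;
import org.apache.mesos.ExecutorDriver;
import org.apache.mesos.MesosExecutorDriver;
import org.apache.mesos.Protos.*;

public class MinimalExecutor implements Executor {
    public void registered(ExecutorDriver driver, ExecutorInfo executorInfo,
                           FrameworkInfo frameworkInfo, SlaveInfo slaveInfo) {
        System.out.println("Registered executor " + executorInfo.getExecutorId().getValue());
    }
    public void reregistered(ExecutorDriver driver, SlaveInfo slaveInfo) {}
    public void disconnected(ExecutorDriver driver) {}
    public void launchTask(ExecutorDriver driver, TaskInfo task) {
        // Report the task as running; a real executor would start its work here.
        driver.sendStatusUpdate(TaskStatus.newBuilder()
            .setTaskId(task.getTaskId())
            .setState(TaskState.TASK_RUNNING)
            .build());
    }
    public void killTask(ExecutorDriver driver, TaskID taskId) {}
    public void frameworkMessage(ExecutorDriver driver, byte[] data) {}
    public void shutdown(ExecutorDriver driver) {}
    public void error(ExecutorDriver driver, String message) {}

    public static void main(String[] args) {
        // run() blocks until the driver is stopped, so the container's
        // foreground process stays alive while the executor is registered.
        MesosExecutorDriver driver = new MesosExecutorDriver(new MinimalExecutor());
        System.exit(driver.run() == Status.DRIVER_STOPPED ? 0 : 1);
    }
}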

I tried a few alternate commands to experiment, and I can see in the stdout for 
the task that
"echo testing123 && echo testing456"
prints to stdout correctly, both testing123 and testing456

however:
"echo testing123a && sleep 10 && echo testing456a"
prints only testing123a, presumably because the container is lost and destroyed 
before the sleep time is up.

So it’s like the container for the executor is only allowed to run for .5 
seconds, then it is detected as exited, and the task is lost. 

Thanks for any advice.

Tyson



slave logs look like:
mesosslave_1  | I0417 19:07:27.461230 11 slave.cpp:1121] Got assigned task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.461479 11 slave.cpp:1231] Launching task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.463250 11 slave.cpp:4160] Launching executor 
insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1- in 
work directory 
'/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.463444 11 slave.cpp:1378] Queuing task 
'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of 
framework '20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.467200 7 docker.cpp:755] Starting container 
'6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and 
framework '20150417-190611-2801799596-5050-1-'
mesosslave_1  | I0417 19:07:27.985935 7 docker.cpp:1333] Executor for 
container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
mesosslave_1  | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying 
container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.986021 9 slave.cpp:3135] Monitoring executor 
'insights-1-1429297638' of framework '20150417-190611-2801799596-5050-1-' 
in container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.986464 7 docker.cpp:1248] Running docker 
stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:28.286761 10 slave.cpp:3186] Executor 
'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1- has 
terminated with unknown status
mesosslave_1  | I0417 19:07:28.288784 10 slave.cpp:2508] Handling status 
update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task 
mesos-slave1.service.consul-31000 of framework 
20150417-190611-2801799596-5050-1- from @0.0.0.0:0
mesosslave_1  | W0417 19:07:28.289227 9 docker.cpp:841] Ignoring updating 
unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3

nimbus logs (framework) look like:
2015-04-17T19:07:28.302+ s.m.MesosNimbus [INFO] Received status update: 
task_id {
  value: mesos-slave1.service.consul-31000
}
state: TASK_LOST
message: Container terminated
slave_id {
  value: 20150417-190611-2801799596-5050-1-S0
}
timestamp: 1.429297648286981E9
source: SOURCE_SLAVE
reason: REASON_EXECUTOR_TERMINATED
11: \a\225\245\213\364\207B\342\252\241\242o\346\203N\327





Re: docker based executor

2015-04-17 Thread Tyson Norris
Yes, agreed that the command should not exit - but the container is killed at 
around 0.5 s after launch regardless of whether the command terminates, which 
is why I’ve been experimenting using commands with varied exit times.

For example, forget about the executor needing to register momentarily.

Using the command:
echo testing123c && sleep 0.1 && echo testing456c
- I see the expected output in stdout, and the container is destroyed (as 
expected), because the container exits quickly, and then is destroyed

Using the command:
echo testing123d && sleep 0.6 && echo testing456d
- I do NOT see the expected output in stdout (I only get testing123d), because 
the container is destroyed prematurely after ~0.5 seconds

Using the “real” storm command, I get no output in stdout, probably because no 
output is generated within 0.5 seconds of launch - it is a bit of a pig to 
startup, so I’m currently just trying to execute some other commands for 
testing purposes.

So I’m guessing this is a timeout issue, or else that the container is reaped 
inappropriately, or something else… looking through this code, I’m trying to 
figure out the steps taken during executor launch:
https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715

Thanks
Tyson





On Apr 17, 2015, at 12:53 PM, Jason Giedymin jason.giedy...@gmail.com wrote:

What is the last command you have docker doing?

If that command exits then the docker will begin to end the container.

-Jason

On Apr 17, 2015, at 3:23 PM, Tyson Norris tnor...@adobe.com wrote:

Hi -
I am looking at revving the mesos-storm framework to be dockerized (and 
simpler).
I’m using mesos 0.22.0-1.0.ubuntu1404
mesos master + mesos slave are deployed in docker containers, in case it 
matters.

I have the storm (nimbus) framework launching fine as a docker container, but 
launching tasks for a topology is having problems related to using a 
docker-based executor.

For example.

TaskInfo task = TaskInfo.newBuilder()
    .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
    .setTaskId(taskId)
    .setSlaveId(offer.getSlaveId())
    .setExecutor(ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
        .setData(ByteString.copyFromUtf8(executorDataStr))
        .setContainer(ContainerInfo.newBuilder()
            .setType(ContainerInfo.Type.DOCKER)
            .setDocker(ContainerInfo.DockerInfo.newBuilder()
                .setImage("mesos-storm")))
        .setCommand(CommandInfo.newBuilder().setShell(true)
            .setValue("storm supervisor storm.mesos.MesosSupervisor")))
  //rest is unchanged from existing mesos-storm framework code

The executor launches and exits quickly - see the log msg:  Executor for 
container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited

It seems like mesos loses track of the executor? I understand there is a 1 min 
timeout on registering the executor, but the exit happens well before 1 minute.

I tried a few alternate commands to experiment, and I can see in the stdout for 
the task that
"echo testing123 && echo testing456"
prints to stdout correctly, both testing123 and testing456

however:
"echo testing123a && sleep 10 && echo testing456a"
prints only testing123a, presumably because the container is lost and destroyed 
before the sleep time is up.

So it’s like the container for the executor is only allowed to run for .5 
seconds, then it is detected as exited, and the task is lost.

Thanks for any advice.

Tyson



slave logs look like:
mesosslave_1  | I0417 19:07:27.46123011 slave.cpp:1121] Got assigned task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46147911 slave.cpp:1231] Launching task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46325011 slave.cpp:4160] Launching executor 
insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1- in 
work directory 
'/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.46344411 slave.cpp:1378] Queuing task 
'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of 
framework '20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.467200 7 docker.cpp:755] Starting container 
'6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and 
framework '20150417-190611-2801799596-5050-1-'
mesosslave_1  | I0417 19:07:27.985935 7 docker.cpp:1333] Executor for 
container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
mesosslave_1  | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying

Re: docker based executor

2015-04-17 Thread Jason Giedymin
What is the last command you have docker doing?

If that command exits then the docker will begin to end the container.
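
(Illustration of that point in terms of the TaskInfo snippet from this thread,
with hypothetical command strings: whatever CommandInfo value the docker
containerizer runs becomes the container's foreground process, so it has to
keep running for as long as the executor should live.)

import org.apache.mesos.Protos.CommandInfo;

public class CommandExamples {
    public static void main(String[] args) {
        // Exits immediately, so Docker sees the container exit right away.
        CommandInfo shortLived = CommandInfo.newBuilder()
            .setShell(true)
            .setValue("echo testing123 && echo testing456")
            .build();

        // Stays in the foreground (the storm supervisor from the thread,
        // exec'd so it replaces the shell), so the container keeps running.
        CommandInfo longLived = CommandInfo.newBuilder()
            .setShell(true)
            .setValue("exec storm supervisor storm.mesos.MesosSupervisor")
            .build();

        System.out.println(shortLived.getValue());
        System.out.println(longLived.getValue());
    }
}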

-Jason

 On Apr 17, 2015, at 3:23 PM, Tyson Norris tnor...@adobe.com wrote:
 
 Hi -
 I am looking at revving the mesos-storm framework to be dockerized (and 
 simpler). 
 I’m using mesos 0.22.0-1.0.ubuntu1404
 mesos master + mesos slave are deployed in docker containers, in case it 
 matters. 
 
 I have the storm (nimbus) framework launching fine as a docker container, but 
 launching tasks for a topology is having problems related to using a 
 docker-based executor.
 
 For example. 
 
 TaskInfo task = TaskInfo.newBuilder()
     .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
     .setTaskId(taskId)
     .setSlaveId(offer.getSlaveId())
     .setExecutor(ExecutorInfo.newBuilder()
         .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
         .setData(ByteString.copyFromUtf8(executorDataStr))
         .setContainer(ContainerInfo.newBuilder()
             .setType(ContainerInfo.Type.DOCKER)
             .setDocker(ContainerInfo.DockerInfo.newBuilder()
                 .setImage("mesos-storm")))
         .setCommand(CommandInfo.newBuilder().setShell(true)
             .setValue("storm supervisor storm.mesos.MesosSupervisor")))
//rest is unchanged from existing mesos-storm framework code
 
 The executor launches and exits quickly - see the log msg:  Executor for 
 container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited
 
 It seems like mesos loses track of the executor? I understand there is a 1 
 min timeout on registering the executor, but the exit happens well before 1 
 minute.
 
 I tried a few alternate commands to experiment, and I can see in the stdout 
 for the task that
 "echo testing123 && echo testing456"
 prints to stdout correctly, both testing123 and testing456
 
 however:
 "echo testing123a && sleep 10 && echo testing456a"
 prints only testing123a, presumably because the container is lost and 
 destroyed before the sleep time is up.
 
 So it’s like the container for the executor is only allowed to run for .5 
 seconds, then it is detected as exited, and the task is lost. 
 
 Thanks for any advice.
 
 Tyson
 
 
 
 slave logs look like:
 mesosslave_1  | I0417 19:07:27.46123011 slave.cpp:1121] Got assigned task 
 mesos-slave1.service.consul-31000 for framework 
 20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.46147911 slave.cpp:1231] Launching task 
 mesos-slave1.service.consul-31000 for framework 
 20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.46325011 slave.cpp:4160] Launching 
 executor insights-1-1429297638 of framework 
 20150417-190611-2801799596-5050-1- in work directory 
 '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.46344411 slave.cpp:1378] Queuing task 
 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of 
 framework '20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.467200 7 docker.cpp:755] Starting 
 container '6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 
 'insights-1-1429297638' and framework '20150417-190611-2801799596-5050-1-'
 mesosslave_1  | I0417 19:07:27.985935 7 docker.cpp:1333] Executor for 
 container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
 mesosslave_1  | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying 
 container '6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.986021 9 slave.cpp:3135] Monitoring 
 executor 'insights-1-1429297638' of framework 
 '20150417-190611-2801799596-5050-1-' in container 
 '6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.986464 7 docker.cpp:1248] Running docker 
 stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:28.28676110 slave.cpp:3186] Executor 
 'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1- 
 has terminated with unknown status
 mesosslave_1  | I0417 19:07:28.28878410 slave.cpp:2508] Handling status 
 update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task 
 mesos-slave1.service.consul-31000 of framework 
 20150417-190611-2801799596-5050-1- from @0.0.0.0:0
 mesosslave_1  | W0417 19:07:28.289227 9 docker.cpp:841] Ignoring updating 
 unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3
 
 nimbus logs (framework) look like:
 2015-04-17T19:07:28.302+ s.m.MesosNimbus [INFO] Received status update: 
 task_id {
  value: mesos-slave1.service.consul-31000
 }
 state: TASK_LOST
 message: Container terminated
 slave_id {
  value: 20150417-190611-2801799596-5050-1-S0
 }
 timestamp: 1.429297648286981E9
 source: SOURCE_SLAVE
 reason

Re: docker based executor

2015-04-17 Thread Jason Giedymin
Try: 

until something; do
  echo waiting for something to do something
  sleep 5
done

You can put this in a bash file and run that.

If you have a Dockerfile, it would be easier to debug.

-Jason

 On Apr 17, 2015, at 4:24 PM, Tyson Norris tnor...@adobe.com wrote:
 
 Yes, agreed that the command should not exit - but the container is killed at 
 around 0.5 s after launch regardless of whether the command terminates, which 
 is why I’ve been experimenting using commands with varied exit times. 
 
 For example, forget about the executor needing to register momentarily.
 
 Using the command:
 echo testing123c && sleep 0.1 && echo testing456c
 - I see the expected output in stdout, and the container is destroyed (as 
 expected), because the container exits quickly, and then is destroyed
 
 Using the command:
 echo testing123d && sleep 0.6 && echo testing456d
 - I do NOT see the expected output in stdout (I only get testing123d), 
 because the container is destroyed prematurely after ~0.5 seconds
 
 Using the “real” storm command, I get no output in stdout, probably because 
 no output is generated within 0.5 seconds of launch - it is a bit of a pig to 
 startup, so I’m currently just trying to execute some other commands for 
 testing purposes.
 
 So I’m guessing this is a timeout issue, or else that the container is reaped 
 inappropriately, or something else… looking through this code, I’m trying to 
 figure out the steps taken during executor launch:
 https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715
 
 Thanks
 Tyson
   
 
 
 
 
 On Apr 17, 2015, at 12:53 PM, Jason Giedymin jason.giedy...@gmail.com 
 wrote:
 
 What is the last command you have docker doing?
 
 If that command exits then the docker will begin to end the container.
 
 -Jason
 
 On Apr 17, 2015, at 3:23 PM, Tyson Norris tnor...@adobe.com wrote:
 
 Hi -
 I am looking at revving the mesos-storm framework to be dockerized (and 
 simpler). 
 I’m using mesos 0.22.0-1.0.ubuntu1404
 mesos master + mesos slave are deployed in docker containers, in case it 
 matters. 
 
 I have the storm (nimbus) framework launching fine as a docker container, 
 but launching tasks for a topology is having problems related to using a 
 docker-based executor.
 
 For example. 
 
 TaskInfo task = TaskInfo.newBuilder()
     .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
     .setTaskId(taskId)
     .setSlaveId(offer.getSlaveId())
     .setExecutor(ExecutorInfo.newBuilder()
         .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
         .setData(ByteString.copyFromUtf8(executorDataStr))
         .setContainer(ContainerInfo.newBuilder()
             .setType(ContainerInfo.Type.DOCKER)
             .setDocker(ContainerInfo.DockerInfo.newBuilder()
                 .setImage("mesos-storm")))
         .setCommand(CommandInfo.newBuilder().setShell(true)
             .setValue("storm supervisor storm.mesos.MesosSupervisor")))
   //rest is unchanged from existing mesos-storm framework code
 
 The executor launches and exits quickly - see the log msg:  Executor for 
 container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited
 
 It seems like mesos loses track of the executor? I understand there is a 1 
 min timeout on registering the executor, but the exit happens well before 1 
 minute.
 
 I tried a few alternate commands to experiment, and I can see in the stdout 
 for the task that
 "echo testing123 && echo testing456"
 prints to stdout correctly, both testing123 and testing456
 
 however:
 "echo testing123a && sleep 10 && echo testing456a"
 prints only testing123a, presumably because the container is lost and 
 destroyed before the sleep time is up.
 
 So it’s like the container for the executor is only allowed to run for .5 
 seconds, then it is detected as exited, and the task is lost. 
 
 Thanks for any advice.
 
 Tyson
 
 
 
 slave logs look like:
 mesosslave_1  | I0417 19:07:27.46123011 slave.cpp:1121] Got assigned 
 task mesos-slave1.service.consul-31000 for framework 
 20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.46147911 slave.cpp:1231] Launching task 
 mesos-slave1.service.consul-31000 for framework 
 20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.46325011 slave.cpp:4160] Launching 
 executor insights-1-1429297638 of framework 
 20150417-190611-2801799596-5050-1- in work directory 
 '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.46344411 slave.cpp:1378] Queuing task 
 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of 
 framework '20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.467200 7 docker.cpp:755] Starting 
 container

Re: docker based executor

2015-04-17 Thread Tyson Norris
 '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited

It seems like mesos loses track of the executor? I understand there is a 1 min 
timeout on registering the executor, but the exit happens well before 1 minute.

I tried a few alternate commands to experiment, and I can see in the stdout for 
the task that
"echo testing123 && echo testing456"
prints to stdout correctly, both testing123 and testing456

however:
"echo testing123a && sleep 10 && echo testing456a"
prints only testing123a, presumably because the container is lost and destroyed 
before the sleep time is up.

So it’s like the container for the executor is only allowed to run for .5 
seconds, then it is detected as exited, and the task is lost.

Thanks for any advice.

Tyson



slave logs look like:
mesosslave_1  | I0417 19:07:27.46123011 slave.cpp:1121] Got assigned task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46147911 slave.cpp:1231] Launching task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46325011 slave.cpp:4160] Launching executor 
insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1- in 
work directory 
'/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.46344411 slave.cpp:1378] Queuing task 
'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of 
framework '20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.467200 7 docker.cpp:755] Starting container 
'6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and 
framework '20150417-190611-2801799596-5050-1-'
mesosslave_1  | I0417 19:07:27.985935 7 docker.cpp:1333] Executor for 
container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
mesosslave_1  | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying 
container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.986021 9 slave.cpp:3135] Monitoring executor 
'insights-1-1429297638' of framework '20150417-190611-2801799596-5050-1-' 
in container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.986464 7 docker.cpp:1248] Running docker 
stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:28.28676110 slave.cpp:3186] Executor 
'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1- has 
terminated with unknown status
mesosslave_1  | I0417 19:07:28.28878410 slave.cpp:2508] Handling status 
update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task 
mesos-slave1.service.consul-31000 of framework 
20150417-190611-2801799596-5050-1- from @0.0.0.0:0
mesosslave_1  | W0417 19:07:28.289227 9 docker.cpp:841] Ignoring updating 
unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3

nimbus logs (framework) look like:
2015-04-17T19:07:28.302+ s.m.MesosNimbus [INFO] Received status update: 
task_id {
value: mesos-slave1.service.consul-31000
}
state: TASK_LOST
message: Container terminated
slave_id {
value: 20150417-190611-2801799596-5050-1-S0
}
timestamp: 1.429297648286981E9
source: SOURCE_SLAVE
reason: REASON_EXECUTOR_TERMINATED
11: \a\225\245\213\364\207B\342\252\241\242o\346\203N\327







Re: docker based executor

2015-04-17 Thread Erik Weathers
hey Tyson,

I've also worked a bit on improving & simplifying the mesos-storm framework
-- spent the recent Mesosphere hackathon working with tnachen of Mesosphere
on this.  Nothing deliverable quite yet.

We didn't look at dockerization at all, the hacking we did was around these
goals:
* Avoiding the greedy hoarding of Offers done by the mesos-storm framework
(ditching RotatingMap, and only hoarding Offers when there are topologies
that need storm worker slots).
* Allowing the Mesos UI to distinguish the topologies, by having the Mesos
tasks be dedicated to a topology.
* Adding usable logging in MesosNimbus. (Some of this work should be usable
by other Mesos frameworks, since I'm pretty-printing the Mesos protobuf
objects in 1-line JSON instead of bazillion line protobuf toString()
pseudo-JSON output.  Would be nice to create a library out of it.)
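
(A rough sketch of the one-line JSON idea, not the actual MesosNimbus change;
it assumes a protobuf version that ships the protobuf-java-util JsonFormat
class.)

import com.google.protobuf.MessageOrBuilder;
import com.google.protobuf.util.JsonFormat;

public final class ProtoLog {
    private static final JsonFormat.Printer PRINTER =
        JsonFormat.printer().omittingInsignificantWhitespace();

    // Render any Mesos protobuf (Offer, TaskInfo, TaskStatus, ...) as a single
    // line of JSON instead of the multi-line toString() text format.
    public static String toJson(MessageOrBuilder message) {
        try {
            return PRINTER.print(message);
        } catch (Exception e) {
            // Fall back to the default text format if JSON conversion fails.
            return message.toString();
        }
    }
}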

Would you like to participate in an offline thread on mesos-storm refactoring?

Thanks!

- Erik

On Fri, Apr 17, 2015 at 12:23 PM, Tyson Norris tnor...@adobe.com wrote:

 Hi -
 I am looking at revving the mesos-storm framework to be dockerized (and
 simpler).
 I’m using mesos 0.22.0-1.0.ubuntu1404
 mesos master + mesos slave are deployed in docker containers, in case it
 matters.

 I have the storm (nimbus) framework launching fine as a docker container,
 but launching tasks for a topology is having problems related to using a
 docker-based executor.

 For example.

 TaskInfo task = TaskInfo.newBuilder()
     .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
     .setTaskId(taskId)
     .setSlaveId(offer.getSlaveId())
     .setExecutor(ExecutorInfo.newBuilder()
         .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
         .setData(ByteString.copyFromUtf8(executorDataStr))
         .setContainer(ContainerInfo.newBuilder()
             .setType(ContainerInfo.Type.DOCKER)
             .setDocker(ContainerInfo.DockerInfo.newBuilder()
                 .setImage("mesos-storm")))
         .setCommand(CommandInfo.newBuilder().setShell(true)
             .setValue("storm supervisor storm.mesos.MesosSupervisor")))
     //rest is unchanged from existing mesos-storm framework code

 The executor launches and exits quickly - see the log msg:  Executor for
 container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited

 It seems like mesos loses track of the executor? I understand there is a 1
 min timeout on registering the executor, but the exit happens well before 1
 minute.

 I tried a few alternate commands to experiment, and I can see in the
 stdout for the task that
 "echo testing123 && echo testing456"
 prints to stdout correctly, both testing123 and testing456

 however:
 "echo testing123a && sleep 10 && echo testing456a"
 prints only testing123a, presumably because the container is lost and
 destroyed before the sleep time is up.

 So it’s like the container for the executor is only allowed to run for .5
 seconds, then it is detected as exited, and the task is lost.

 Thanks for any advice.

 Tyson



 slave logs look like:
 mesosslave_1  | I0417 19:07:27.46123011 slave.cpp:1121] Got assigned
 task mesos-slave1.service.consul-31000 for framework
 20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.46147911 slave.cpp:1231] Launching task
 mesos-slave1.service.consul-31000 for framework
 20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.46325011 slave.cpp:4160] Launching
 executor insights-1-1429297638 of framework
 20150417-190611-2801799596-5050-1- in work directory
 '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.46344411 slave.cpp:1378] Queuing task
 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of
 framework '20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.467200 7 docker.cpp:755] Starting
 container '6539127f-9dbb-425b-86a8-845b748f0cd3' for executor
 'insights-1-1429297638' and framework
 '20150417-190611-2801799596-5050-1-'
 mesosslave_1  | I0417 19:07:27.985935 7 docker.cpp:1333] Executor for
 container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
 mesosslave_1  | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying
 container '6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.986021 9 slave.cpp:3135] Monitoring
 executor 'insights-1-1429297638' of framework
 '20150417-190611-2801799596-5050-1-' in container
 '6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.986464 7 docker.cpp:1248] Running
 docker stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:28.28676110 slave.cpp:3186] Executor
 'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1-
 has terminated with unknown status
 mesosslave_1  | I0417 19:07

Re: docker based executor

2015-04-17 Thread Tyson Norris
:

What is the last command you have docker doing?

If that command exits then the docker will begin to end the container.

-Jason

On Apr 17, 2015, at 3:23 PM, Tyson Norris tnor...@adobe.com wrote:

Hi -
I am looking at revving the mesos-storm framework to be dockerized (and 
simpler).
I’m using mesos 0.22.0-1.0.ubuntu1404
mesos master + mesos slave are deployed in docker containers, in case it 
matters.

I have the storm (nimbus) framework launching fine as a docker container, but 
launching tasks for a topology is having problems related to using a 
docker-based executor.

For example.

TaskInfo task = TaskInfo.newBuilder()
    .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
    .setTaskId(taskId)
    .setSlaveId(offer.getSlaveId())
    .setExecutor(ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
        .setData(ByteString.copyFromUtf8(executorDataStr))
        .setContainer(ContainerInfo.newBuilder()
            .setType(ContainerInfo.Type.DOCKER)
            .setDocker(ContainerInfo.DockerInfo.newBuilder()
                .setImage("mesos-storm")))
        .setCommand(CommandInfo.newBuilder().setShell(true)
            .setValue("storm supervisor storm.mesos.MesosSupervisor")))
  //rest is unchanged from existing mesos-storm framework code

The executor launches and exits quickly - see the log msg:  Executor for 
container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited

It seems like mesos loses track of the executor? I understand there is a 1 min 
timeout on registering the executor, but the exit happens well before 1 minute.

I tried a few alternate commands to experiment, and I can see in the stdout for 
the task that
"echo testing123 && echo testing456"
prints to stdout correctly, both testing123 and testing456

however:
"echo testing123a && sleep 10 && echo testing456a"
prints only testing123a, presumably because the container is lost and destroyed 
before the sleep time is up.

So it’s like the container for the executor is only allowed to run for .5 
seconds, then it is detected as exited, and the task is lost.

Thanks for any advice.

Tyson



slave logs look like:
mesosslave_1  | I0417 19:07:27.46123011 slave.cpp:1121] Got assigned task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46147911 slave.cpp:1231] Launching task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46325011 slave.cpp:4160] Launching executor 
insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1- in 
work directory 
'/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.46344411 slave.cpp:1378] Queuing task 
'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of 
framework '20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.467200 7 docker.cpp:755] Starting container 
'6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and 
framework '20150417-190611-2801799596-5050-1-'
mesosslave_1  | I0417 19:07:27.985935 7 docker.cpp:1333] Executor for 
container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
mesosslave_1  | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying 
container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.986021 9 slave.cpp:3135] Monitoring executor 
'insights-1-1429297638' of framework '20150417-190611-2801799596-5050-1-' 
in container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.986464 7 docker.cpp:1248] Running docker 
stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:28.28676110 slave.cpp:3186] Executor 
'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1- has 
terminated with unknown status
mesosslave_1  | I0417 19:07:28.28878410 slave.cpp:2508] Handling status 
update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task 
mesos-slave1.service.consul-31000 of framework 
20150417-190611-2801799596-5050-1- from @0.0.0.0:0
mesosslave_1  | W0417 19:07:28.289227 9 docker.cpp:841] Ignoring updating 
unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3

nimbus logs (framework) look like:
2015-04-17T19:07:28.302+ s.m.MesosNimbus [INFO] Received status update: 
task_id {
value: mesos-slave1.service.consul-31000
}
state: TASK_LOST
message: Container terminated
slave_id {
value: 20150417-190611-2801799596-5050-1-S0
}
timestamp: 1.429297648286981E9
source: SOURCE_SLAVE
reason: REASON_EXECUTOR_TERMINATED
11: \a\225\245\213\364\207B\342\252\241\242o\346\203N\327









Re: docker based executor

2015-04-17 Thread Tyson Norris
Hi Erik -
Yes these sound like good changes - I am currently focused on just trying to 
strip things down to be simpler for building versions etc.

Specifically I’ve been working on:
- don’t distribute config via an embedded http server, just send the settings via
command args, e.g. -c mesos.master.url=zk://zk1.service.consul:2181/mesos -c
storm.zookeeper.servers=[\"zk1.service.consul\"] (a rough sketch of this follows
below)
- use docker to ease framework+executor distribution (instead of repacking a
storm tarball?) - a single container that has the storm installation + an
overlayed lib dir with mesos-storm.jar, run it just like the storm script:
docker run mesos-storm supervisor storm.mesos.MesosSupervisor (use the same
container for the supervisor executor + the nimbus framework container)
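
A rough sketch of the command-args approach from the first bullet above
(hypothetical values copied from this email, not working framework code; the
exact placement of storm's -c options isn't verified here):

import org.apache.mesos.Protos.CommandInfo;

public class SupervisorCommand {
    public static void main(String[] args) {
        // Settings passed as -c arguments instead of being served over HTTP.
        // The escaped quotes keep the list value intact when run via sh -c.
        String cmd = "storm supervisor storm.mesos.MesosSupervisor"
            + " -c mesos.master.url=zk://zk1.service.consul:2181/mesos"
            + " -c storm.zookeeper.servers=[\\\"zk1.service.consul\\\"]";

        // Roughly the same arguments passed via `docker run mesos-storm ...`,
        // so one image can serve both the nimbus framework container and the
        // supervisor executor container.
        CommandInfo command = CommandInfo.newBuilder()
            .setShell(true)
            .setValue(cmd)
            .build();

        System.out.println(command.getValue());
    }
}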

Currently I’m stuck on this problem of the executor container dying without any
indication why. I only know that it runs whatever container I specify for the
executor for approximately half a second, and then it dies. I’ve tried different
containers, and different variants of shell true/false, etc. I haven’t been able
to find any examples of running a container as an executor, so while it seems
like it would make things simpler, it’s not that way yet.

I will be happy to participate in refactoring, feel free to email me offlist.

Thanks
Tyson


On Apr 17, 2015, at 9:18 PM, Erik Weathers eweath...@groupon.com wrote:

hey Tyson,

I've also worked a bit on improving & simplifying the mesos-storm framework -- 
spent the recent Mesosphere hackathon working with tnachen of Mesosphere on 
this.  Nothing deliverable quite yet.

We didn't look at dockerization at all, the hacking we did was around these 
goals:
* Avoiding the greedy hoarding of Offers done by the mesos-storm framework 
(ditching RotatingMap, and only hoarding Offers when there are topologies that 
need storm worker slots).
* Allowing the Mesos UI to distinguish the topologies, by having the Mesos 
tasks be dedicated to a topology.
* Adding usable logging in MesosNimbus. (Some of this work should be usable by 
other Mesos frameworks, since I'm pretty-printing the Mesos protobuf objects in 
1-line JSON instead of bazillion line protobuf toString() pseudo-JSON output.  
Would be nice to create a library out of it.)

Would you like to participate in an offline thread on mesos-storm refactoring?

Thanks!

- Erik

On Fri, Apr 17, 2015 at 12:23 PM, Tyson Norris tnor...@adobe.com wrote:
Hi -
I am looking at revving the mesos-storm framework to be dockerized (and 
simpler).
I’m using mesos 0.22.0-1.0.ubuntu1404
mesos master + mesos slave are deployed in docker containers, in case it 
matters.

I have the storm (nimbus) framework launching fine as a docker container, but 
launching tasks for a topology is having problems related to using a 
docker-based executor.

For example.

TaskInfo task = TaskInfo.newBuilder()
    .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
    .setTaskId(taskId)
    .setSlaveId(offer.getSlaveId())
    .setExecutor(ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
        .setData(ByteString.copyFromUtf8(executorDataStr))
        .setContainer(ContainerInfo.newBuilder()
            .setType(ContainerInfo.Type.DOCKER)
            .setDocker(ContainerInfo.DockerInfo.newBuilder()
                .setImage("mesos-storm")))
        .setCommand(CommandInfo.newBuilder().setShell(true)
            .setValue("storm supervisor storm.mesos.MesosSupervisor")))
//rest is unchanged from existing mesos-storm framework code

The executor launches and exits quickly - see the log msg:  Executor for 
container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited

It seems like mesos loses track of the executor? I understand there is a 1 min 
timeout on registering the executor, but the exit happens well before 1 minute.

I tried a few alternate commands to experiment, and I can see in the stdout for 
the task that
"echo testing123 && echo testing456"
prints to stdout correctly, both testing123 and testing456

however:
"echo testing123a && sleep 10 && echo testing456a"
prints only testing123a, presumably because the container is lost and destroyed 
before the sleep time is up.

So it’s like the container for the executor is only allowed to run for .5 
seconds, then it is detected as exited, and the task is lost.

Thanks for any advice.

Tyson



slave logs look like:
mesosslave_1  | I0417 19:07:27.46123011 slave.cpp:1121] Got assigned task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46147911 slave.cpp:1231] Launching task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46325011 slave.cpp:4160] Launching executor 
insights-1-1429297638 of framework 20150417-190611

Re: docker based executor

2015-04-17 Thread Jason Giedymin
 prematurely after ~0.5 seconds
 
 Using the “real” storm command, I get no output in stdout, probably 
 because no output is generated within 0.5 seconds of launch - it is a bit 
 of a pig to startup, so I’m currently just trying to execute some other 
 commands for testing purposes.
 
 So I’m guessing this is a timeout issue, or else that the container is 
 reaped inappropriately, or something else… looking through this code, I’m 
 trying to figure out the steps taken during executor launch:
 https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715
 
 Thanks
 Tyson
   
 
 
 
 
 On Apr 17, 2015, at 12:53 PM, Jason Giedymin jason.giedy...@gmail.com 
 wrote:
 
 What is the last command you have docker doing?
 
 If that command exits then the docker will begin to end the container.
 
 -Jason
 
 On Apr 17, 2015, at 3:23 PM, Tyson Norris tnor...@adobe.com wrote:
 
 Hi -
 I am looking at revving the mesos-storm framework to be dockerized (and 
 simpler). 
 I’m using mesos 0.22.0-1.0.ubuntu1404
 mesos master + mesos slave are deployed in docker containers, in case it 
 matters. 
 
 I have the storm (nimbus) framework launching fine as a docker 
 container, but launching tasks for a topology is having problems related 
 to using a docker-based executor.
 
 For example. 
 
 TaskInfo task = TaskInfo.newBuilder()
     .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
     .setTaskId(taskId)
     .setSlaveId(offer.getSlaveId())
     .setExecutor(ExecutorInfo.newBuilder()
         .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
         .setData(ByteString.copyFromUtf8(executorDataStr))
         .setContainer(ContainerInfo.newBuilder()
             .setType(ContainerInfo.Type.DOCKER)
             .setDocker(ContainerInfo.DockerInfo.newBuilder()
                 .setImage("mesos-storm")))
         .setCommand(CommandInfo.newBuilder().setShell(true)
             .setValue("storm supervisor storm.mesos.MesosSupervisor")))
   //rest is unchanged from existing mesos-storm framework code
 
 The executor launches and exits quickly - see the log msg:  Executor for 
 container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited
 
 It seems like mesos loses track of the executor? I understand there is a 
 1 min timeout on registering the executor, but the exit happens well 
 before 1 minute.
 
 I tried a few alternate commands to experiment, and I can see in the 
 stdout for the task that
 "echo testing123 && echo testing456"
 prints to stdout correctly, both testing123 and testing456
 
 however:
 "echo testing123a && sleep 10 && echo testing456a"
 prints only testing123a, presumably because the container is lost and 
 destroyed before the sleep time is up.
 
 So it’s like the container for the executor is only allowed to run for 
 .5 seconds, then it is detected as exited, and the task is lost. 
 
 Thanks for any advice.
 
 Tyson
 
 
 
 slave logs look like:
 mesosslave_1  | I0417 19:07:27.46123011 slave.cpp:1121] Got assigned 
 task mesos-slave1.service.consul-31000 for framework 
 20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.46147911 slave.cpp:1231] Launching 
 task mesos-slave1.service.consul-31000 for framework 
 20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.46325011 slave.cpp:4160] Launching 
 executor insights-1-1429297638 of framework 
 20150417-190611-2801799596-5050-1- in work directory 
 '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.46344411 slave.cpp:1378] Queuing task 
 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 
 of framework '20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.467200 7 docker.cpp:755] Starting 
 container '6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 
 'insights-1-1429297638' and framework 
 '20150417-190611-2801799596-5050-1-'
 mesosslave_1  | I0417 19:07:27.985935 7 docker.cpp:1333] Executor 
 for container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
 mesosslave_1  | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying 
 container '6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.986021 9 slave.cpp:3135] Monitoring 
 executor 'insights-1-1429297638' of framework 
 '20150417-190611-2801799596-5050-1-' in container 
 '6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.986464 7 docker.cpp:1248] Running 
 docker stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:28.28676110 slave.cpp:3186] Executor 
 'insights-1-1429297638' of framework 
 20150417-190611-2801799596-5050-1- has terminated with unknown status
 mesosslave_1  | I0417 19:07:28.288784

Re: Shenzhen MUG First Meetup !

2015-04-17 Thread haosdent
Does Huawei use Mesos?

On Fri, Apr 17, 2015 at 9:49 PM, Zhipeng Huang zhipengh...@gmail.com
wrote:

 Our first meetup has been announced! Please check it out at
 http://www.meetup.com/Shenzhen-Mesos-User-Group/events/221879815/ . We
 will have about three topics, and Shenzhen MUG T-shirts :P

 --
 Zhipeng (Howard) Huang

 Standard Engineer
 IT Standard & Patent/IT Product Line
 Huawei Technologies Co., Ltd
 Email: huangzhip...@huawei.com
 Office: Huawei Industrial Base, Longgang, Shenzhen

 (Previous)
 Research Assistant
 Mobile Ad-Hoc Network Lab, Calit2
 University of California, Irvine
 Email: zhipe...@uci.edu
 Office: Calit2 Building Room 2402

 OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado




-- 
Best Regards,
Haosdent Huang