Re: mesos slave in docker container

2015-06-19 Thread Tyson Norris
ugh. Thanks!

I knew this was an issue, and completely ignored the fact that someone changed 
the name…

Thanks - works fine now.

Tyson

On Jun 19, 2015, at 3:39 PM, Brian Devins 
badev...@gmail.com wrote:

You can't name the container mesos-slave. The slave currently treats all 
containers prefixed with 'mesos-' as ones it is supposed to administer, so it is 
killing itself off since it doesn't match a task that should be running.
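For example, a minimal sketch of the workaround (only the container name changes; any name that doesn't start with the 'mesos-' prefix should do, and 'slave1' here is purely illustrative):

 docker run --rm -it \
  --name slave1 \
  ... (rest of the docker run command quoted below unchanged)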

On Fri, Jun 19, 2015 at 6:35 PM, Tyson Norris 
tnor...@adobe.com wrote:
Hi -
Sorry for the delay, just getting back to this.

Below is the command and stdout I get.

I tried specifying just the mesos containerizer, as this person mentioned: 
https://github.com/mesosphere/coreos-setup/issues/5
and had similar results - works fine with the mesos containerizer, but not docker.

Also, I am only seeing this fail on RHEL 7 with the docker containerizer; it works 
fine on ubuntu 14 with the docker containerizer.

Thanks
Tyson


[root@phx-8 ~]# docker run --rm -it  \
 --name mesos-slave \
 --net host \
 --pid host \
 --privileged \
 --env MESOS_CONTAINERIZERS=docker \
 --env MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins \
 --env MESOS_HOSTNAME=192.168.8.8 \
 --env MESOS_IP=192.168.8.8 \
 --env MESOS_LOG_DIR=/var/log/mesos \
 --env MESOS_LOGGING_LEVEL=INFO \
 --env MESOS_MASTER=zk://zk1.service.consul:2181,zk2.service.consul:2181,zk3.service.consul:2181/mesos \
 --env SERVICE_5051_NAME=mesos-slave \
 --env MESOS_DOCKER_MESOS_IMAGE=docker.corp.adobe.com/tnorris/mesosslave:0.22.1-1.0.ubuntu1404 \
 --env GLOG_v=1 \
 --volume /var/run/docker.sock:/var/run/docker.sock \
 --volume /sys:/sys:ro \
 -p 0.0.0.0:5051:5051 \
 --entrypoint mesos-slave \
 
 docker.corp.adobe.com/tnorris/mesosslave:0.22.1-1.0.ubuntu1404
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0619 22:32:37.081535  8362 process.cpp:961] libprocess is initialized on 
192.168.8.8:5051 for 8 cpus
I0619 22:32:37.160727  8362 logging.cpp:172] INFO level logging started!
I0619 22:32:37.161571  8362 logging.cpp:177] Logging to /var/log/mesos
I0619 22:32:37.161609  8362 main.cpp:156] Build: 2015-05-05 06:15:50 by root
I0619 22:32:37.161629  8362 main.cpp:158] Version: 0.22.1
I0619 22:32:37.161639  8362 main.cpp:161] Git tag: 0.22.1
I0619 22:32:37.161650  8362 main.cpp:165] Git SHA: 
d6309f92a7f9af3ab61a878403e3d9c284ea87e0
2015-06-19 22:32:40,196:8362(0x7f38e66b0700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
2015-06-19 22:32:40,196:8362(0x7f38e66b0700):ZOO_INFO@log_env@716: Client 
environment:host.name=phx-8.corp.adobe.com
2015-06-19 22:32:40,196:8362(0x7f38e66b0700):ZOO_INFO@log_env@723: Client 
environment:os.name=Linux
2015-06-19 22:32:40,196:8362(0x7f38e66b0700):ZOO_INFO@log_env@724: Client 
environment:os.arch=3.10.0-123.el7.x86_64
2015-06-19 22:32:40,196:8362(0x7f38e66b0700):ZOO_INFO@log_env@725: Client 
environment:os.version=#1 SMP Mon May 5 11:16:57 EDT 2014
I0619 22:32:40.196564  8362 main.cpp:200] Starting Mesos slave
I0619 22:32:40.203459  8362 slave.cpp:174] Slave started on 
1)@192.168.8.8:5051
I0619 22:32:40.205621  8362 slave.cpp:322] Slave resources: cpus(*):4; 
mem(*):14864; disk(*):4975; ports(*):[31000-32000]
I0619 22:32:40.206074  8362 slave.cpp:351] Slave hostname: 192.168.8.8
I0619 22:32:40.206116  8362 slave.cpp:352] Slave checkpoint: true
2015-06-19 22:32:40,208:8362(0x7f38e66b0700):ZOO_INFO@log_env@733: Client 
environment:user.name=(null)
2015-06-19 22:32:40,208:8362(0x7f38e66b0700):ZOO_INFO@log_env@741: Client 
environment:user.home=/root
2015-06-19 22:32:40,208:8362(0x7f38e66b0700):ZOO_INFO@log_env@753: Client 
environment:user.dir=/
2015-06-19 22:32:40,208:8362(0x7f38e66b0700):ZOO_INFO@zookeeper_init@786: 
Initiating client connection, 
host=zk1.service.consul:2181,zk2.service.consul:2181,zk3.service.consul:2181 
sessionTimeout=1 watcher=0x7f38ea110a60 sessionId=0 sessionPasswd=null 
context=0x7f38d4000ea0 flags=0
I0619 22:32:40.208984  8367 state.cpp:35] Recovering state from 
'/tmp/mesos/meta'
I0619 22:32:40.209102  8367 slave.cpp:600] Successfully attached file 
'/var/log/mesos/mesos-slave.INFO'
I0619 22:32:40.209174  8367 status_update_manager.cpp:197] Recovering status 
update manager
I0619 22:32:40.230962  8367 docker.cpp:423] Recovering Docker containers
I0619 22:32:40.231061  8367 docker.cpp:697] Running docker ps -a
2015-06-19 22:32:40,252:8362(0x7f38e1a4b700):ZOO_INFO@check_events@1703: 
initiated connection to server [192.168.8.3:2181]
2015-06-19 22:32:40,256:8362

mesos slave in docker container

2015-06-13 Thread Tyson Norris
Hi - 
We are running the mesos slave (0.22.0-1.0.ubuntu1404) in a docker container with 
the docker containerizer, without problems, on an ubuntu 14.04 docker host (with 
the lxc-docker pkg etc. added). 

Running the same slave container on RHEL 7.0 docker host, the container exits 
almost immediately after starting with:
I0613 07:18:15.161931  5303 slave.cpp:3808] Finished recovery
I0613 07:18:15.162677  5303 slave.cpp:647] New master detected at 
master@192.168.8.5:5050
I0613 07:18:15.162753  5301 status_update_manager.cpp:171] Pausing sending 
status updates
I0613 07:18:15.163051  5303 slave.cpp:672] No credentials provided. Attempting 
to register without authentication
I0613 07:18:15.163734  5303 slave.cpp:683] Detecting new master
W0613 07:18:15.163734  5293 logging.cpp:81] RAW: Received signal SIGTERM from 
process 1166 of user 0; exiting


If I do not enable the docker containerizer, the slave container runs fine. 

Other containers that bind mount /var/run/docker.sock also run fine. 

Debug docker logs are below. 

One difference between the ubuntu docker host and the RHEL docker host is that the 
ubuntu host uses the aufs storage driver while RHEL uses devicemapper, and selinux is 
enabled in RHEL but not ubuntu.
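For what it's worth, a quick way to confirm those host-level differences on each box (standard docker/RHEL tooling, shown only as a sketch):

 docker info | grep -i 'storage driver'   # aufs on the ubuntu host, devicemapper on RHEL
 getenforce                               # Enforcing / Permissive / Disabled on the RHEL host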

Thanks for any advice!
Tyson




Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=info msg=POST 
/v1.18/containers/9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd/start
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=info msg=+job 
start(9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd)
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=debug 
msg=activateDeviceIfNeeded(9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd)
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=debug 
msg=libdevmapper(6): ioctl/libdm-iface.c:1750 (4) dm info 
docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd
  OF   [16384] (*1)
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=debug 
msg=libdevmapper(6): ioctl/libdm-iface.c:1750 (4) dm create 
docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd
  OF   [16384] (*1)
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=debug 
msg=libdevmapper(6): libdm-common.c:1348 (4) 
docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd:
 Stacking NODE_ADD (253,9) 0:0 0600 [verify_udev]
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=debug 
msg=libdevmapper(6): ioctl/libdm-iface.c:1750 (4) dm reload 
docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd
  OF   [16384] (*1)
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=debug 
msg=libdevmapper(6): ioctl/libdm-iface.c:1750 (4) dm resume 
docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd
  OF   [16384] (*1)
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=debug 
msg=libdevmapper(6): libdm-common.c:1348 (4) 
docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd:
 Processing NODE_ADD (253,9) 0:0 0600 [verify_udev]
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=debug 
msg=libdevmapper(6): libdm-common.c:983 (4) Created 
/dev/mapper/docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd
Jun 13 07:28:26 phx-8 kernel: EXT4-fs (dm-9): mounted filesystem with ordered 
data mode. Opts: discard
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=info msg=+job 
log(start, 9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd, 
docker.corp.adobe.com/tnorris/mesosslave:0.22.1-1.0.ubuntu1404)
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=info msg=-job 
log(start, 9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd, 
docker.corp.adobe.com/tnorris/mesosslave:0.22.1-1.0.ubuntu1404) = OK (0)
Jun 13 07:28:26 phx-8 systemd-udevd: conflicting device node 
'/dev/mapper/docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd'
 found, link to '/dev/dm-9' will not be created
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=debug 
msg=Calling GET /containers/{name:.*}/json
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=info msg=GET 
/containers/9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd/json
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=info msg=+job 
container_inspect(9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd)
Jun 13 07:28:26 phx-8 systemd: Starting docker container 
9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd.
Jun 13 07:28:26 phx-8 systemd: Started docker container 
9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd.
Jun 13 07:28:26 phx-8 docker: time=2015-06-13T07:28:26Z level=info msg=-job 
start(9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd) = OK 

Re: docker based executor

2015-04-19 Thread Tyson Norris
Ah, after reading some info at 
https://tnachen.wordpress.com/2014/08/19/docker-in-mesos-0-20/
I see that I should probably be setting my slave container to run with 
--net=host - with that it is working now.

Are the changes for https://issues.apache.org/jira/browse/MESOS-2183 going to 
allow slave+executor to run with --net=bridge?

Thanks!
Tyson

On Apr 18, 2015, at 10:43 PM, Tim Chen 
t...@mesosphere.io wrote:

Hi Tyson,

Glad you figured it out - sorry, I didn't realize you were running the mesos slave in a 
docker container (which surely complicates things).

I have a series of patches pending to be merged that will also make 
recovering tasks work when relaunching mesos-slave in a docker container. Currently, even 
with --pid=host, when your slave dies your tasks are not able to recover when it 
restarts.

Tim

On Sat, Apr 18, 2015 at 10:32 PM, Tyson Norris 
tnor...@adobe.com wrote:
Yes, this was the problem - sorry for the noise.

For the record, running mesos-slave in a container requires the "--pid=host" option, 
as mentioned in MESOS-2183.
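For reference, a bare docker run invocation with that flag looks roughly like this (a sketch; the image name, zk host, and master URL here are placeholders, not the real values):

 docker run -d \
  --net host \
  --pid host \
  --privileged \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  --entrypoint mesos-slave \
  your-mesos-slave-image \
  --master=zk://your-zk:2181/mesos \
  --containerizers=docker,mesos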

Now if docker-compose would just get released with support for setting the pid 
flag, life would be easy...

Thanks
Tyson

On Apr 18, 2015, at 9:48 PM, Tyson Norris 
tnor...@adobe.com wrote:

I think I may be running into this: 
https://issues.apache.org/jira/browse/MESOS-2183

I’m trying to get docker-compose to launch slave with --pid=host, but having a 
few separate problems with that.

I will update this thread when I’m able to test that.

Thanks
Tyson

On Apr 18, 2015, at 1:14 PM, Tyson Norris 
tnor...@adobe.com wrote:

Hi Tim - Actually, rereading your email: "For a test image like this you want 
to set the CommandInfo with a ContainerInfo holding the docker image instead." - 
it sounds like you are suggesting running the container as a task command? But 
part of what I’m doing is trying to provide a custom executor, so I think what 
I had before is appropriate - eventually I want to make the tasks launch the same 
way (similar to the existing mesos-storm framework), but I am trying to launch the 
executor as a container instead of a script command, which I think should be 
possible.

So maybe you can comment on using a container within an ExecutorInfo as below?
Docs here: 
https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L267
suggest that ContainerInfo and CommandInfo should be provided - I am using 
setShell(false) to avoid changing the entry point, which already uses the 
default "/bin/sh -c".


Thanks
Tyson


On Apr 18, 2015, at 1:03 PM, Tyson Norris 
tnor...@adobe.com wrote:

Hi Tim -
I am using my own framework - a modified version of mesos-storm, attempting to 
use docker containers instead of

TaskInfo is like:
  TaskInfo task = TaskInfo.newBuilder()
      .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
      .setTaskId(taskId)
      .setSlaveId(offer.getSlaveId())
      .setExecutor(ExecutorInfo.newBuilder()
          .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
          .setData(ByteString.copyFromUtf8(executorDataStr))
          .setCommand(CommandInfo.newBuilder()
              .setShell(false)
          )
          .setContainer(ContainerInfo.newBuilder()
              .setType(ContainerInfo.Type.DOCKER)
              .setDocker(ContainerInfo.DockerInfo.newBuilder()
                  .setImage("testexecutor")
              )
          )
      )

I understand this test image will be expected to fail - I expect it to fail by 
registration timeout, and not by simply dying, though. I’m only using a test 
image because I see the same behavior with my actual image that properly 
handles the mesos <-> executor registration protocol.

I will try moving the Container inside the Command, and see if it survives 
longer.

I see now at 
https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L675
it mentions "Either ExecutorInfo or CommandInfo should be set"

Thanks
Tyson


On Apr 18, 2015, at 12:38 PM, Tim Chen 
t...@mesosphere.io wrote:

That does seem odd - how did you run this via mesos? Are you using your own 
framework or through another framework like Marathon?

And what does the TaskInfo look like?

Also note that if you're just testing a container, you don't want to set the 
ExecutorInfo with a command, as Executors in Mesos are expected to communicate 
back to the Mesos slave and implement the protocol between mesos and executor. For 
a test image like this you want to set the CommandInfo with a ContainerInfo 
holding the docker image instead.
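For reference, a rough sketch of what that would look like, in the same builder style used elsewhere in this thread (taskId, offer, slot and the "testexecutor" image mirror the snippets quoted in the thread; treat this as an assumption, not the framework's actual code):

  TaskInfo task = TaskInfo.newBuilder()
      .setName("test-" + slot.getPort())
      .setTaskId(taskId)
      .setSlaveId(offer.getSlaveId())
      // no ExecutorInfo: the built-in command executor runs the task
      .setCommand(CommandInfo.newBuilder()
          .setShell(false))                 // keep the image's own entrypoint
      .setContainer(ContainerInfo.newBuilder()
          .setType(ContainerInfo.Type.DOCKER)
          .setDocker(ContainerInfo.DockerInfo.newBuilder()
              .setImage("testexecutor")))
      // resources etc. omitted here, unchanged from the existing framework code
      .build();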

Re: docker based executor

2015-04-18 Thread Tyson Norris
Hi Tim -
Yes, I mentioned below when using a script like:
--
#!/bin/bash
until false; do
  echo waiting for something to do something
  sleep 0.2
done
--

In my sandbox stdout I get exactly 2 lines:
waiting for something to do something
waiting for something to do something

Running this container any other way, e.g. docker run --rm -it testexecutor, 
the output is an endless stream of "waiting for something to do something".

So something is stopping the container, as opposed to the container just 
exiting; at least that’s how it looks - I only get the container to stop when 
it is launched as an executor.

Also, based on the docker logs, something is calling the /container/id/stop 
endpoint, *before* the /container/id/logs endpoint - so the stop is arriving 
before the logs are tailed, which also seems incorrect, and suggests that there 
is some code explicitly stopping the container, instead of the container 
exiting itself.

Thanks
Tyson



On Apr 18, 2015, at 3:33 AM, Tim Chen 
t...@mesosphere.io wrote:

Hi Tyson,

The error message you saw in the logs about the executor exited actually just 
means the executor process has exited.

Since you're launching a custom executor with MesosSupervisor, it seems like 
MesosSupervisor simply exited without reporting any task status.

Can you look at what's the actual logs of the container? They can be found in 
the sandbox stdout and stderr logs.

Tim

On Fri, Apr 17, 2015 at 11:16 PM, Tyson Norris 
tnor...@adobe.com wrote:
The sequence I see in the docker.log when my executor is launched is something 
like:
GET /containers/id/json
POST /containers/id/wait
POST /containers/id/stop
GET /containers/id/logs

So I’m wondering if the slave is calling docker-stop out of order in 
slave/containerizer/docker.cpp.
I only see it being called in recover and destroy, and I don’t see logs 
indicating either of those happening, but I may be missing something else.
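For what it's worth, the same Docker remote API sequence can be replayed by hand against the socket to watch that ordering independently of the slave (a sketch only; assumes curl 7.40+ with --unix-socket support, and <id> is a real container id):

 curl --unix-socket /var/run/docker.sock http://localhost/containers/<id>/json
 curl --unix-socket /var/run/docker.sock -X POST http://localhost/containers/<id>/wait
 curl --unix-socket /var/run/docker.sock -X POST 'http://localhost/containers/<id>/stop?t=0'
 curl --unix-socket /var/run/docker.sock 'http://localhost/containers/<id>/logs?stdout=1&stderr=1'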

Tyson

On Apr 17, 2015, at 9:42 PM, Tyson Norris 
tnor...@adobe.com wrote:

mesos master INFO log says:
I0418 04:26:31.573763 6 master.cpp:3755] Sending 1 offers to framework 
20150411-165219-771756460-5050-1- (marathon) at 
scheduler-8b8d994e-5881-4687-81eb-5b3694c66342@172.17.1.34:44364
I0418 04:26:31.580003 9 master.cpp:2268] Processing ACCEPT call for offers: 
[ 20150418-041001-553718188-5050-1-O165 ] on slave 
20150418-041001-553718188-5050-1-S0 at 
slave(1)@172.17.1.35:5051 
(mesos-slave1.service.consul) for framework 
20150411-165219-771756460-5050-1- (marathon) at 
scheduler-8b8d994e-5881-4687-81eb-5b3694c66342@172.17.1.34:44364
I0418 04:26:31.580369 9 hierarchical.hpp:648] Recovered cpus(*):6; 
mem(*):3862; disk(*):13483; ports(*):[31001-32000] (total allocatable: 
cpus(*):6; mem(*):3862; disk(*):13483; ports(*):[31001-32000]) on slave 
20150418-041001-553718188-5050-1-S0 from framework 
20150411-165219-771756460-5050-1-
I0418 04:26:32.480036 12 master.cpp:3388] Executor insights-1-1429330829 of 
framework 20150418-041001-553718188-5050-1-0001 on slave 
20150418-041001-553718188-5050-1-S0 at 
slave(1)@172.17.1.35:5051 
(mesos-slave1.service.consul) terminated with signal Unknown signal 127

mesos slave  INFO log says:
I0418 04:26:31.390650 8 slave.cpp:1231] Launching task 
mesos-slave1.service.consul-31000 for framework 
20150418-041001-553718188-5050-1-0001
I0418 04:26:31.392432 8 slave.cpp:4160] Launching executor 
insights-1-1429330829 of framework 20150418-041001-553718188-5050-1-0001 in 
work directory '/tmp/mesos/slaves/20150418-041001-553718188-5050-
1-S0/frameworks/20150418-041001-553718188-5050-1-0001/executors/insights-1-1429330829/runs/3cc411b0-c2e0-41ae-80c2-f0306371da5a'
I0418 04:26:31.392587 8 slave.cpp:1378] Queuing task 
'mesos-slave1.service.consul-31000' for executor insights-1-1429330829 of 
framework '20150418-041001-553718188-5050-1-0001
I0418 04:26:31.397415 7 docker.cpp:755] Starting container 
'3cc411b0-c2e0-41ae-80c2-f0306371da5a' for executor 'insights-1-1429330829' and 
framework '20150418-041001-553718188-5050-1-0001'
I0418 04:26:31.397835 7 fetcher.cpp:238] Fetching URIs using command 
'/usr/libexec/mesos/mesos-fetcher'
I0418 04:26:32.177479 11 docker.cpp:1333] Executor for container 
'3cc411b0-c2e0-41ae-80c2-f0306371da5a' has exited
I0418 04:26:32.177817 11 docker.cpp:1159] Destroying container 
'3cc411b0-c2e0-41ae-80c2-f0306371da5a'
I0418 04:26:32.177999 11 docker.cpp:1248] Running docker stop on container 
'3cc411b0-c2e0-41ae-80c2-f0306371da5a'
I0418 04:26:32.177620 6 slave.cpp:3135] Monitoring executor 
'insights-1-1429330829' of framework '20150418-041001-553718188-5050-1-0001' in 
container '3cc411b0-c2e0

Re: docker based executor

2015-04-18 Thread Tyson Norris
Hi Tim -
I am using my own framework - a modified version of mesos-storm, attempting to 
use docker containers instead of

TaskInfo is like:
  TaskInfo task = TaskInfo.newBuilder()
      .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
      .setTaskId(taskId)
      .setSlaveId(offer.getSlaveId())
      .setExecutor(ExecutorInfo.newBuilder()
          .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
          .setData(ByteString.copyFromUtf8(executorDataStr))
          .setCommand(CommandInfo.newBuilder()
              .setShell(false)
          )
          .setContainer(ContainerInfo.newBuilder()
              .setType(ContainerInfo.Type.DOCKER)
              .setDocker(ContainerInfo.DockerInfo.newBuilder()
                  .setImage("testexecutor")
              )
          )
      )

I understand this test image will be expected to fail - I expect it to fail by 
registration timeout, and not by simply dying, though. I’m only using a test 
image because I see the same behavior with my actual image that properly 
handles the mesos <-> executor registration protocol.

I will try moving the Container inside the Command, and see if it survives 
longer.

I see now at 
https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L675
it mentions "Either ExecutorInfo or CommandInfo should be set"

Thanks
Tyson


On Apr 18, 2015, at 12:38 PM, Tim Chen 
t...@mesosphere.io wrote:

That does seem odd - how did you run this via mesos? Are you using your own 
framework or through another framework like Marathon?

And what does the TaskInfo look like?

Also note that if you're just testing a container, you don't want to set the 
ExecutorInfo with a command as Executors in Mesos are expected to communicate 
back to Mesos slave and implement the protocol between mesos and executor. For 
a test image like this you want to set the CommandInfo with a ContainerInfo 
holding the docker image instead.

Tim

On Sat, Apr 18, 2015 at 12:17 PM, Tyson Norris 
tnor...@adobe.com wrote:
Hi Tim -
Yes, I mentioned below when using a script like:
--
#!/bin/bash
until false; do
  echo waiting for something to do something
  sleep 0.2
done
--

In my sandbox stdout I get exactly 2 lines:
waiting for something to do something
waiting for something to do something

Running this container any other way, e.g. docker run --rm -it testexecutor, 
the output is an endless stream of "waiting for something to do something".

So something is stopping the container, as opposed to the container just 
exiting; at least that’s how it looks - I only get the container to stop when 
it is launched as an executor.

Also, based on the docker logs, something is calling the /container/id/stop 
endpoint, *before* the /container/id/logs endpoint - so the stop is arriving 
before the logs are tailed, which also seems incorrect, and suggests that there 
is some code explicitly stopping the container, instead of the container 
exiting itself.

Thanks
Tyson



On Apr 18, 2015, at 3:33 AM, Tim Chen 
t...@mesosphere.io wrote:

Hi Tyson,

The error message you saw in the logs about the executor exited actually just 
means the executor process has exited.

Since you're launching a custom executor with MesosSupervisor, it seems like 
MesosSupervisor simply exited without reporting any task status.

Can you look at what's the actual logs of the container? They can be found in 
the sandbox stdout and stderr logs.

Tim

On Fri, Apr 17, 2015 at 11:16 PM, Tyson Norris 
tnor...@adobe.com wrote:
The sequence I see in the docker.log when my executor is launched is something 
like:
GET /containers/id/json
POST /containers/id/wait
POST /containers/id/stop
GET /containers/id/logs

So I’m wondering if the slave is calling docker-stop out of order in 
slave/containerizer/docker.cpp.
I only see it being called in recover and destroy, and I don’t see logs 
indicating either of those happening, but I may be missing something else.

Tyson

On Apr 17, 2015, at 9:42 PM, Tyson Norris 
tnor...@adobe.com wrote:

mesos master INFO log says:
I0418 04:26:31.573763 6 master.cpp:3755] Sending 1 offers to framework 
20150411-165219-771756460-5050-1- (marathon) at 
scheduler-8b8d994e-5881-4687-81eb-5b3694c66342@172.17.1.34:44364
I0418 04:26:31.580003 9 master.cpp:2268] Processing ACCEPT call for offers: 
[ 20150418-041001-553718188-5050-1-O165 ] on slave 
20150418-041001

Re: docker based executor

2015-04-18 Thread Tyson Norris
Hi Tim - Actually, rereading your email: "For a test image like this you want 
to set the CommandInfo with a ContainerInfo holding the docker image instead." - 
it sounds like you are suggesting running the container as a task command? But 
part of what I’m doing is trying to provide a custom executor, so I think what 
I had before is appropriate - eventually I want to make the tasks launch the same 
way (similar to the existing mesos-storm framework), but I am trying to launch the 
executor as a container instead of a script command, which I think should be 
possible.

So maybe you can comment on using a container within an ExecutorInfo as below?
Docs here: 
https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L267
suggest that ContainerInfo and CommandInfo should be provided - I am using 
setShell(false) to avoid changing the entry point, which already uses the 
default "/bin/sh -c".


Thanks
Tyson


On Apr 18, 2015, at 1:03 PM, Tyson Norris 
tnor...@adobe.com wrote:

Hi Tim -
I am using my own framework - a modified version of mesos-storm, attempting to 
use docker containers instead of

TaskInfo is like:
  TaskInfo task = TaskInfo.newBuilder()
      .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
      .setTaskId(taskId)
      .setSlaveId(offer.getSlaveId())
      .setExecutor(ExecutorInfo.newBuilder()
          .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
          .setData(ByteString.copyFromUtf8(executorDataStr))
          .setCommand(CommandInfo.newBuilder()
              .setShell(false)
          )
          .setContainer(ContainerInfo.newBuilder()
              .setType(ContainerInfo.Type.DOCKER)
              .setDocker(ContainerInfo.DockerInfo.newBuilder()
                  .setImage("testexecutor")
              )
          )
      )

I understand this test image will be expected to fail - I expect it to fail by 
registration timeout, and not by simply dying, though. I’m only using a test 
image because I see the same behavior with my actual image that properly 
handles the mesos <-> executor registration protocol.

I will try moving the Container inside the Command, and see if it survives 
longer.

I see now at 
https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L675
it mentions "Either ExecutorInfo or CommandInfo should be set"

Thanks
Tyson


On Apr 18, 2015, at 12:38 PM, Tim Chen 
t...@mesosphere.io wrote:

That does seem odd - how did you run this via mesos? Are you using your own 
framework or through another framework like Marathon?

And what does the TaskInfo look like?

Also note that if you're just testing a container, you don't want to set the 
ExecutorInfo with a command as Executors in Mesos are expected to communicate 
back to Mesos slave and implement the protocol between mesos and executor. For 
a test image like this you want to set the CommandInfo with a ContainerInfo 
holding the docker image instead.

Tim

On Sat, Apr 18, 2015 at 12:17 PM, Tyson Norris 
tnor...@adobe.com wrote:
Hi Tim -
Yes, I mentioned below when using a script like:
--
#!/bin/bash
until false; do
  echo waiting for something to do something
  sleep 0.2
done
--

In my sandbox stdout I get exactly 2 lines:
waiting for something to do something
waiting for something to do something

Running this container any other way, e.g. docker run --rm -it testexecutor, 
the output is an endless stream of "waiting for something to do something".

So something is stopping the container, as opposed to the container just 
exiting; at least that’s how it looks - I only get the container to stop when 
it is launched as an executor.

Also, based on the docker logs, something is calling the /container/id/stop 
endpoint, *before* the /container/id/logs endpoint - so the stop is arriving 
before the logs are tailed, which also seems incorrect, and suggests that there 
is some code explicitly stopping the container, instead of the container 
exiting itself.

Thanks
Tyson



On Apr 18, 2015, at 3:33 AM, Tim Chen 
t...@mesosphere.io wrote:

Hi Tyson,

The error message you saw in the logs about the executor exited actually just 
means the executor process has exited.

Since you're launching a custom executor with MesosSupervisor, it seems like 
MesosSupervisor simply exited without reporting any task status.

Can you look at what's the actual logs of the container? They can be found in 
the sandbox stdout and stderr logs.

Tim

On Fri, Apr 17, 2015 at 11:16 PM, Tyson Norris

Re: docker based executor

2015-04-18 Thread Tyson Norris
I think I may be running into this: 
https://issues.apache.org/jira/browse/MESOS-2183

I’m trying to get docker-compose to launch slave with --pid=host, but having a 
few separate problems with that.

I will update this thread when I’m able to test that.

Thanks
Tyson

On Apr 18, 2015, at 1:14 PM, Tyson Norris 
tnor...@adobe.com wrote:

Hi Tim - Actually, rereading your email: "For a test image like this you want 
to set the CommandInfo with a ContainerInfo holding the docker image instead." - 
it sounds like you are suggesting running the container as a task command? But 
part of what I’m doing is trying to provide a custom executor, so I think what 
I had before is appropriate - eventually I want to make the tasks launch the same 
way (similar to the existing mesos-storm framework), but I am trying to launch the 
executor as a container instead of a script command, which I think should be 
possible.

So maybe you can comment on using a container within an ExecutorInfo as below?
Docs here: 
https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L267
suggest that ContainerInfo and CommandInfo should be provided - I am using 
setShell(false) to avoid changing the entry point, which already uses the 
default "/bin/sh -c".


Thanks
Tyson


On Apr 18, 2015, at 1:03 PM, Tyson Norris 
tnor...@adobe.com wrote:

Hi Tim -
I am using my own framework - a modified version of mesos-storm, attempting to 
use docker containers instead of

TaskInfo is like:
  TaskInfo task = TaskInfo.newBuilder()
      .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
      .setTaskId(taskId)
      .setSlaveId(offer.getSlaveId())
      .setExecutor(ExecutorInfo.newBuilder()
          .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
          .setData(ByteString.copyFromUtf8(executorDataStr))
          .setCommand(CommandInfo.newBuilder()
              .setShell(false)
          )
          .setContainer(ContainerInfo.newBuilder()
              .setType(ContainerInfo.Type.DOCKER)
              .setDocker(ContainerInfo.DockerInfo.newBuilder()
                  .setImage("testexecutor")
              )
          )
      )

I understand this test image will be expected to fail - I expect it to fail by 
registration timeout, and not by simply dying, though. I’m only using a test 
image because I see the same behavior with my actual image that properly 
handles the mesos <-> executor registration protocol.

I will try moving the Container inside the Command, and see if it survives 
longer.

I see now at 
https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L675
it mentions "Either ExecutorInfo or CommandInfo should be set"

Thanks
Tyson


On Apr 18, 2015, at 12:38 PM, Tim Chen 
t...@mesosphere.io wrote:

That does seem odd - how did you run this via mesos? Are you using your own 
framework or through another framework like Marathon?

And what does the TaskInfo look like?

Also note that if you're just testing a container, you don't want to set the 
ExecutorInfo with a command as Executors in Mesos are expected to communicate 
back to Mesos slave and implement the protocol between mesos and executor. For 
a test image like this you want to set the CommandInfo with a ContainerInfo 
holding the docker image instead.

Tim

On Sat, Apr 18, 2015 at 12:17 PM, Tyson Norris 
tnor...@adobe.com wrote:
Hi Tim -
Yes, I mentioned below when using a script like:
--
#!/bin/bash
until false; do
  echo waiting for something to do something
  sleep 0.2
done
--

In my sandbox stdout I get exactly 2 lines:
waiting for something to do something
waiting for something to do something

Running this container any other way, e.g. docker run --rm -it testexecutor, 
the output is an endless stream of "waiting for something to do something".

So something is stopping the container, as opposed to the container just 
exiting; at least that’s how it looks - I only get the container to stop when 
it is launched as an executor.

Also, based on the docker logs, something is calling the /container/id/stop 
endpoint, *before* the /container/id/logs endpoint - so the stop is arriving 
before the logs are tailed, which also seems incorrect, and suggests that there 
is some code explicitly stopping the container, instead of the container 
exiting itself.

Thanks
Tyson



On Apr 18, 2015, at 3:33 AM, Tim Chen 
t...@mesosphere.io wrote:

Hi Tyson,

The error message you saw in the logs about the executor exited

docker based executor

2015-04-17 Thread Tyson Norris
Hi -
I am looking at revving the mesos-storm framework to be dockerized (and 
simpler). 
I’m using mesos 0.22.0-1.0.ubuntu1404
mesos master + mesos slave are deployed in docker containers, in case it 
matters. 

I have the storm (nimbus) framework launching fine as a docker container, but 
launching tasks for a topology is having problems related to using a 
docker-based executor.

For example. 

TaskInfo task = TaskInfo.newBuilder()
    .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
    .setTaskId(taskId)
    .setSlaveId(offer.getSlaveId())
    .setExecutor(ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
        .setData(ByteString.copyFromUtf8(executorDataStr))
        .setContainer(ContainerInfo.newBuilder()
            .setType(ContainerInfo.Type.DOCKER)
            .setDocker(ContainerInfo.DockerInfo.newBuilder()
                .setImage("mesos-storm")))
        .setCommand(CommandInfo.newBuilder().setShell(true)
            .setValue("storm supervisor storm.mesos.MesosSupervisor")))
    //rest is unchanged from existing mesos-storm framework code

The executor launches and exits quickly - see the log msg: "Executor for 
container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited"

It seems like mesos loses track of the executor? I understand there is a 1 min 
timeout on registering the executor, but the exit happens well before 1 minute.
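For reference, that window is the slave's --executor_registration_timeout flag (1mins by default) and can be raised while debugging, e.g. (a sketch):

 mesos-slave --executor_registration_timeout=5mins ...
 # or, when the slave itself runs in docker, via the environment as earlier in this thread:
 --env MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins

It shouldn't matter here, though, since the container dies well inside the default.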

I tried a few alternate commands to experiment, and I can see in the stdout for 
the task that
"echo testing123 && echo testing456" 
prints to stdout correctly, both testing123 and testing456

however:
"echo testing123a && sleep 10 && echo testing456a" 
prints only testing123a, presumably because the container is lost and destroyed 
before the sleep time is up.

So it’s like the container for the executor is only allowed to run for 0.5 
seconds, then it is detected as exited, and the task is lost. 

Thanks for any advice.

Tyson



slave logs look like:
mesosslave_1  | I0417 19:07:27.46123011 slave.cpp:1121] Got assigned task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46147911 slave.cpp:1231] Launching task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46325011 slave.cpp:4160] Launching executor 
insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1- in 
work directory 
'/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.46344411 slave.cpp:1378] Queuing task 
'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of 
framework '20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.467200 7 docker.cpp:755] Starting container 
'6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and 
framework '20150417-190611-2801799596-5050-1-'
mesosslave_1  | I0417 19:07:27.985935 7 docker.cpp:1333] Executor for 
container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
mesosslave_1  | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying 
container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.986021 9 slave.cpp:3135] Monitoring executor 
'insights-1-1429297638' of framework '20150417-190611-2801799596-5050-1-' 
in container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.986464 7 docker.cpp:1248] Running docker 
stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:28.28676110 slave.cpp:3186] Executor 
'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1- has 
terminated with unknown status
mesosslave_1  | I0417 19:07:28.28878410 slave.cpp:2508] Handling status 
update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task 
mesos-slave1.service.consul-31000 of framework 
20150417-190611-2801799596-5050-1- from @0.0.0.0:0
mesosslave_1  | W0417 19:07:28.289227 9 docker.cpp:841] Ignoring updating 
unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3

nimbus logs (framework) look like:
2015-04-17T19:07:28.302+ s.m.MesosNimbus [INFO] Received status update: 
task_id {
  value: mesos-slave1.service.consul-31000
}
state: TASK_LOST
message: Container terminated
slave_id {
  value: 20150417-190611-2801799596-5050-1-S0
}
timestamp: 1.429297648286981E9
source: SOURCE_SLAVE
reason: REASON_EXECUTOR_TERMINATED
11: \a\225\245\213\364\207B\342\252\241\242o\346\203N\327





Re: docker based executor

2015-04-17 Thread Tyson Norris
Yes, agreed that the command should not exit - but the container is killed at 
around 0.5 s after launch regardless of whether the command terminates, which 
is why I’ve been experimenting using commands with varied exit times.

For example, forget about the executor needing to register momentarily.

Using the command:
echo testing123c && sleep 0.1 && echo testing456c
- I see the expected output in stdout, and the container is destroyed (as 
expected), because the container exits quickly, and then is destroyed

Using the command:
echo testing123d && sleep 0.6 && echo testing456d
- I do NOT see the expected output in stdout (I only get testing123d), because 
the container is destroyed prematurely after ~0.5 seconds

Using the “real” storm command, I get no output in stdout, probably because no 
output is generated within 0.5 seconds of launch - it is a bit of a pig to 
startup, so I’m currently just trying to execute some other commands for 
testing purposes.

So I’m guessing this is a timeout issue, or else that the container is reaped 
inappropriately, or something else… looking through this code, I’m trying to 
figure out the steps taken during executor launch:
https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715

Thanks
Tyson





On Apr 17, 2015, at 12:53 PM, Jason Giedymin 
jason.giedy...@gmail.com wrote:

What is the last command you have docker doing?

If that command exits, then docker will begin to end the container.

-Jason

On Apr 17, 2015, at 3:23 PM, Tyson Norris 
tnor...@adobe.com wrote:

Hi -
I am looking at revving the mesos-storm framework to be dockerized (and 
simpler).
I’m using mesos 0.22.0-1.0.ubuntu1404
mesos master + mesos slave are deployed in docker containers, in case it 
matters.

I have the storm (nimbus) framework launching fine as a docker container, but 
launching tasks for a topology is having problems related to using a 
docker-based executor.

For example.

TaskInfo task = TaskInfo.newBuilder()
    .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
    .setTaskId(taskId)
    .setSlaveId(offer.getSlaveId())
    .setExecutor(ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
        .setData(ByteString.copyFromUtf8(executorDataStr))
        .setContainer(ContainerInfo.newBuilder()
            .setType(ContainerInfo.Type.DOCKER)
            .setDocker(ContainerInfo.DockerInfo.newBuilder()
                .setImage("mesos-storm")))
        .setCommand(CommandInfo.newBuilder().setShell(true)
            .setValue("storm supervisor storm.mesos.MesosSupervisor")))
    //rest is unchanged from existing mesos-storm framework code

The executor launches and exits quickly - see the log msg:  Executor for 
container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited

It seems like mesos loses track of the executor? I understand there is a 1 min 
timeout on registering the executor, but the exit happens well before 1 minute.

I tried a few alternate commands to experiment, and I can see in the stdout for 
the task that
"echo testing123 && echo testing456"
prints to stdout correctly, both testing123 and testing456

however:
"echo testing123a && sleep 10 && echo testing456a"
prints only testing123a, presumably because the container is lost and destroyed 
before the sleep time is up.

So it’s like the container for the executor is only allowed to run for .5 
seconds, then it is detected as exited, and the task is lost.

Thanks for any advice.

Tyson



slave logs look like:
mesosslave_1  | I0417 19:07:27.46123011 slave.cpp:1121] Got assigned task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46147911 slave.cpp:1231] Launching task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46325011 slave.cpp:4160] Launching executor 
insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1- in 
work directory 
'/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
mesosslave_1  | I0417 19:07:27.46344411 slave.cpp:1378] Queuing task 
'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of 
framework '20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.467200 7 docker.cpp:755] Starting container 
'6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and 
framework '20150417-190611-2801799596-5050-1-'
mesosslave_1  | I0417 19:07:27.985935 7 docker.cpp:1333] Executor for 
container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
mesosslave_1  | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying

Re: docker based executor

2015-04-17 Thread Tyson Norris
You can reproduce with most any dockerfile, I think - it seems like launching a 
custom executor that is a docker container has some problem.

I just made a simple test with docker file:
--
#this is oracle java8 atop phusion baseimage
FROM opentable/baseimage-java8:latest


#mesos lib (not used here, but will be in our “real” executor, e.g. to register 
the executor etc)
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E56151BF
RUN echo "deb http://repos.mesosphere.io/$(lsb_release -is | tr '[:upper:]' '[:lower:]') $(lsb_release -cs) main" | tee /etc/apt/sources.list.d/mesosphere.list
RUN cat /etc/apt/sources.list.d/mesosphere.list
RUN apt-get update && apt-get install -y \
    mesos

ADD script.sh /usr/bin/executor-script.sh

CMD executor-script.sh
--

and script.sh:
--
#!/bin/bash
until false; do
  echo waiting for something to do something
  sleep 0.2
done
--

And in my stdout I get exactly 2 lines:
waiting for something to do something
waiting for something to do something

Which is how many lines can be output within 0.5 seconds… something is fishy 
about the 0.5 seconds, but I’m not sure where.

I’m not sure exactly what the difference is, but launching a docker container as a task 
WITHOUT a custom executor works fine, and I’m not sure about launching a docker 
container as a task that is using a non-docker custom executor. The case I’m 
trying for is using a docker custom executor and launching non-docker tasks 
(in case that helps clarify the situation).

Thanks
Tyson





On Apr 17, 2015, at 1:47 PM, Jason Giedymin 
jason.giedy...@gmail.com wrote:

Try:


until something; do
  echo waiting for something to do something
  sleep 5
done

You can put this in a bash file and run that.

If you have a dockerfile, it would be easier to debug.

-Jason

On Apr 17, 2015, at 4:24 PM, Tyson Norris 
tnor...@adobe.com wrote:

Yes, agreed that the command should not exit - but the container is killed at 
around 0.5 s after launch regardless of whether the command terminates, which 
is why I’ve been experimenting using commands with varied exit times.

For example, forget about the executor needing to register momentarily.

Using the command:
echo testing123c && sleep 0.1 && echo testing456c
- I see the expected output in stdout, and the container is destroyed (as 
expected), because the container exits quickly, and then is destroyed

Using the command:
echo testing123d && sleep 0.6 && echo testing456d
- I do NOT see the expected output in stdout (I only get testing123d), because 
the container is destroyed prematurely after ~0.5 seconds

Using the “real” storm command, I get no output in stdout, probably because no 
output is generated within 0.5 seconds of launch - it is a bit of a pig to 
startup, so I’m currently just trying to execute some other commands for 
testing purposes.

So I’m guessing this is a timeout issue, or else that the container is reaped 
inappropriately, or something else… looking through this code, I’m trying to 
figure out the steps taken during executor launch:
https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715

Thanks
Tyson





On Apr 17, 2015, at 12:53 PM, Jason Giedymin 
jason.giedy...@gmail.com wrote:

What is the last command you have docker doing?

If that command exits then the docker will begin to end the container.

-Jason

On Apr 17, 2015, at 3:23 PM, Tyson Norris 
tnor...@adobe.com wrote:

Hi -
I am looking at revving the mesos-storm framework to be dockerized (and 
simpler).
I’m using mesos 0.22.0-1.0.ubuntu1404
mesos master + mesos slave are deployed in docker containers, in case it 
matters.

I have the storm (nimbus) framework launching fine as a docker container, but 
launching tasks for a topology is having problems related to using a 
docker-based executor.

For example.

TaskInfo task = TaskInfo.newBuilder()
    .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
    .setTaskId(taskId)
    .setSlaveId(offer.getSlaveId())
    .setExecutor(ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
        .setData(ByteString.copyFromUtf8(executorDataStr))
        .setContainer(ContainerInfo.newBuilder()
            .setType(ContainerInfo.Type.DOCKER)
            .setDocker(ContainerInfo.DockerInfo.newBuilder()
                .setImage("mesos-storm")))
        .setCommand(CommandInfo.newBuilder().setShell(true)
            .setValue("storm supervisor storm.mesos.MesosSupervisor")))
    //rest is unchanged from existing mesos-storm framework code

The executor launches and exits quickly - see the log msg:  Executor for 
container

Re: docker based executor

2015-04-17 Thread Tyson Norris
, 
4e8320cb2a8e4ede5fb5ae386866addfe008c0035397fe44b84f401e959f96f4, 
testexecutor:latest)
time=2015-04-18T04:26:31Z level=info msg=-job log(start, 
4e8320cb2a8e4ede5fb5ae386866addfe008c0035397fe44b84f401e959f96f4, 
testexecutor:latest) = OK (0)
time=2015-04-18T04:26:31Z level=debug msg=Calling GET 
/containers/{name:.*}/json
time=2015-04-18T04:26:31Z level=info msg=GET 
/containers/4e8320cb2a8e4ede5fb5ae386866addfe008c0035397fe44b84f401e959f96f4/json
time=2015-04-18T04:26:31Z level=info msg=+job 
container_inspect(4e8320cb2a8e4ede5fb5ae386866addfe008c0035397fe44b84f401e959f96f4)
time=2015-04-18T04:26:32Z level=info msg=-job 
start(4e8320cb2a8e4ede5fb5ae386866addfe008c0035397fe44b84f401e959f96f4) = OK 
(0)
time=2015-04-18T04:26:32Z level=info msg=-job 
container_inspect(4e8320cb2a8e4ede5fb5ae386866addfe008c0035397fe44b84f401e959f96f4)
 = OK (0)
time=2015-04-18T04:26:32Z level=debug msg=Calling GET 
/containers/{name:.*}/json
time=2015-04-18T04:26:32Z level=info msg=GET 
/v1.18/containers/mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a/json
time=2015-04-18T04:26:32Z level=info msg=+job 
container_inspect(mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a)
time=2015-04-18T04:26:32Z level=info msg=-job 
container_inspect(mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a) = OK (0)
time=2015-04-18T04:26:32Z level=debug msg=Calling GET 
/containers/{name:.*}/json
time=2015-04-18T04:26:32Z level=info msg=GET 
/v1.18/containers/mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a/json
time=2015-04-18T04:26:32Z level=info msg=+job 
container_inspect(mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a)
time=2015-04-18T04:26:32Z level=info msg=-job 
container_inspect(mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a) = OK (0)
time=2015-04-18T04:26:32Z level=debug msg=Calling POST 
/containers/{name:.*}/wait
time=2015-04-18T04:26:32Z level=info msg=POST 
/v1.18/containers/mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a/wait
time=2015-04-18T04:26:32Z level=info msg=+job 
wait(mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a)
time=2015-04-18T04:26:32Z level=debug msg=Calling GET 
/containers/{name:.*}/logs
time=2015-04-18T04:26:32Z level=info msg=GET 
/v1.18/containers/mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a/logs?follow=1stderr=1stdout=1tail=all
time=2015-04-18T04:26:32Z level=info msg=+job 
container_inspect(mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a)
time=2015-04-18T04:26:32Z level=info msg=-job 
container_inspect(mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a) = OK (0)
time=2015-04-18T04:26:32Z level=info msg=+job 
logs(mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a)
time=2015-04-18T04:26:32Z level=debug msg=Calling POST 
/containers/{name:.*}/stop
time=2015-04-18T04:26:32Z level=info msg=POST 
/v1.18/containers/mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a/stop?t=0
time=2015-04-18T04:26:32Z level=info msg=+job 
stop(mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a)
time=2015-04-18T04:26:32Z level=debug msg=Sending 15 to 
4e8320cb2a8e4ede5fb5ae386866addfe008c0035397fe44b84f401e959f96f4
time=2015-04-18T04:26:32Z level=info msg=Container 
4e8320cb2a8e4ede5fb5ae386866addfe008c0035397fe44b84f401e959f96f4 failed to exit 
within 0 seconds of SIGTERM - using the force
time=2015-04-18T04:26:32Z level=debug msg=Sending 9 to 
4e8320cb2a8e4ede5fb5ae386866addfe008c0035397fe44b84f401e959f96f4
time=2015-04-18T04:26:32Z level=info msg=+job log(die, 
4e8320cb2a8e4ede5fb5ae386866addfe008c0035397fe44b84f401e959f96f4, 
testexecutor:latest)
time=2015-04-18T04:26:32Z level=info msg=-job log(die, 
4e8320cb2a8e4ede5fb5ae386866addfe008c0035397fe44b84f401e959f96f4, 
testexecutor:latest) = OK (0)
time=2015-04-18T04:26:32Z level=info msg=-job 
logs(mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a) = OK (0)
time=2015-04-18T04:26:32Z level=info msg=-job 
wait(mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a) = OK (0)
time=2015-04-18T04:26:32Z level=info msg=+job log(stop, 
4e8320cb2a8e4ede5fb5ae386866addfe008c0035397fe44b84f401e959f96f4, 
testexecutor:latest)
time=2015-04-18T04:26:32Z level=info msg=-job log(stop, 
4e8320cb2a8e4ede5fb5ae386866addfe008c0035397fe44b84f401e959f96f4, 
testexecutor:latest) = OK (0)
time=2015-04-18T04:26:32Z level=info msg=-job 
stop(mesos-3cc411b0-c2e0-41ae-80c2-f0306371da5a) = OK (0)”


I don’t see a syslog for the master/slave containers

Thanks
Tyson




On Apr 17, 2015, at 7:07 PM, Jason Giedymin 
jason.giedy...@gmail.com wrote:

What do any/all logs say? (syslog)

-Jason

On Apr 17, 2015, at 7:22 PM, Tyson Norris 
tnor...@adobe.com wrote:

another interesting fact:
I can restart the docker container of my executor, and it runs great.

In the test example below, notice the stdout appears to be growing as expected 
after restarting the container.

So something is killing my executor container (also indicated by the "Exited 
(137) About a minute ago"), but I’m still not sure what.

Thanks
Tyson



tnorris-osx:insights tnorris$ docker ps -a | grep testexec
5291fe29c9c2   testexecutor:latest   /bin/sh -c executor   About

Re: docker based executor

2015-04-17 Thread Tyson Norris
Hi Erik -
Yes these sound like good changes - I am currently focused on just trying to 
strip things down to be simpler for building versions etc.

Specifically I’ve been working on:
- don’t distribute config via an embedded http server, just send the settings via 
command args, e.g. -c mesos.master.url=zk://zk1.service.consul:2181/mesos -c 
storm.zookeeper.servers=[\"zk1.service.consul\"]
- use docker to ease framework+executor distribution (instead of repacking a 
storm tarball?) - a single container that has the storm installation + an overlayed lib 
dir with mesos-storm.jar, run it just like the storm script: docker run mesos-storm 
supervisor storm.mesos.MesosSupervisor (use the same container for the supervisor 
executor + the nimbus framework container) - see the sketch after this list
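A sketch of what that ends up looking like on the command line (the hosts reuse names from this thread, and the exact storm CLI arguments are assumptions, not a tested invocation):

 # nimbus / framework
 docker run mesos-storm nimbus \
   -c mesos.master.url=zk://zk1.service.consul:2181/mesos \
   -c storm.zookeeper.servers=[\"zk1.service.consul\"]

 # supervisor executor, same image
 docker run mesos-storm supervisor storm.mesos.MesosSupervisor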

Currently I am stuck on this problem of the executor container dying without any 
indication why. I only know that it runs whatever container I specify for the 
executor for approximately half a second, and then it dies. I have tried different containers 
and different variants of shell true/false, etc. I haven’t been able to find 
any examples of running a container as an executor, so while it seems like it 
would make things simpler, it’s not that way yet.

I will be happy to participate in refactoring, feel free to email me offlist.

Thanks
Tyson


On Apr 17, 2015, at 9:18 PM, Erik Weathers 
eweath...@groupon.com wrote:

hey Tyson,

I've also worked a bit on improving & simplifying the mesos-storm framework -- I 
spent the recent Mesosphere hackathon working with tnachen of Mesosphere on 
this. Nothing deliverable quite yet.

We didn't look at dockerization at all, the hacking we did was around these 
goals:
* Avoiding the greedy hoarding of Offers done by the mesos-storm framework 
(ditching RotatingMap, and only hoarding Offers when there are topologies that 
need storm worker slots).
* Allowing the Mesos UI to distinguish the topologies, by having the Mesos 
tasks be dedicated to a topology.
* Adding usable logging in MesosNimbus. (Some of this work should be usable by 
other Mesos frameworks, since I'm pretty-printing the Mesos protobuf objects in 
1-line JSON instead of bazillion line protobuf toString() pseudo-JSON output.  
Would be nice to create a library out of it.)

Would you like to participate in an offline thread about mesos-storm refactoring?

Thanks!

- Erik

On Fri, Apr 17, 2015 at 12:23 PM, Tyson Norris 
tnor...@adobe.com wrote:
Hi -
I am looking at revving the mesos-storm framework to be dockerized (and 
simpler).
I’m using mesos 0.22.0-1.0.ubuntu1404
mesos master + mesos slave are deployed in docker containers, in case it 
matters.

I have the storm (nimbus) framework launching fine as a docker container, but 
launching tasks for a topology is having problems related to using a 
docker-based executor.

For example.

TaskInfo task = TaskInfo.newBuilder()
    .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
    .setTaskId(taskId)
    .setSlaveId(offer.getSlaveId())
    .setExecutor(ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
        .setData(ByteString.copyFromUtf8(executorDataStr))
        .setContainer(ContainerInfo.newBuilder()
            .setType(ContainerInfo.Type.DOCKER)
            .setDocker(ContainerInfo.DockerInfo.newBuilder()
                .setImage("mesos-storm")))
        .setCommand(CommandInfo.newBuilder().setShell(true)
            .setValue("storm supervisor storm.mesos.MesosSupervisor")))
    //rest is unchanged from existing mesos-storm framework code

The executor launches and exits quickly - see the log msg:  Executor for 
container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited

It seems like mesos loses track of the executor? I understand there is a 1 min 
timeout on registering the executor, but the exit happens well before 1 minute.

I tried a few alternate commands to experiment, and I can see in the stdout for 
the task that
"echo testing123 && echo testing456"
prints to stdout correctly, both testing123 and testing456

however:
"echo testing123a && sleep 10 && echo testing456a"
prints only testing123a, presumably because the container is lost and destroyed 
before the sleep time is up.

So it’s like the container for the executor is only allowed to run for .5 
seconds, then it is detected as exited, and the task is lost.

Thanks for any advice.

Tyson



slave logs look like:
mesosslave_1  | I0417 19:07:27.46123011 slave.cpp:1121] Got assigned task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46147911 slave.cpp:1231] Launching task 
mesos-slave1.service.consul-31000 for framework 
20150417-190611-2801799596-5050-1-
mesosslave_1  | I0417 19:07:27.46325011 slave.cpp:4160] Launching executor 
insights-1-1429297638 of framework 20150417-190611