I've been trying out the docker integration with mesos & marathon since
bridged networking was added, and I've run into a couple of issues. The
most disturbing is the allocation of ports that are already in use (I
suspect this may be a marathon issue) and the failure to recover the tasks
once this occurs.

What I am running is a very simple setup, driven locally from vagrant. I
attempt to run the python3 container specified here under Bridged
Networking (https://mesosphere.github.io/marathon/docs/native-docker.html).
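For reference, here is a rough sketch of the app definition involved,
rebuilt from the docker run command quoted in the slave log further down.
The id, cmd, image, cpu/memory limits and container ports come from that
command; the instance count reflects the two running tasks described below;
"hostPort": 0 (let marathon pick the host port) is my assumption, not
something the log shows.

```python
import json

# Sketch of the marathon app definition, reconstructed from the slave log.
app = {
    "id": "bridged-webapp",
    "cmd": "python3 -m http.server 8080",
    "cpus": 0.5,   # log shows "-c 512", i.e. 512 of 1024 CPU shares per CPU
    "mem": 64.0,   # log shows "-m 67108864", i.e. 64 * 1024 * 1024 bytes
    "instances": 2,
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "python:3",
            "network": "BRIDGE",
            "portMappings": [
                # hostPort 0 is an assumption: marathon then assigns a port
                # from the offered range (31000 and 31001 in the log).
                {"containerPort": 8080, "hostPort": 0, "protocol": "tcp"},
                {"containerPort": 161, "hostPort": 0, "protocol": "udp"},
            ],
        },
    },
}

if __name__ == "__main__":
    # Print the JSON body that would be submitted to marathon.
    print(json.dumps(app, indent=2))
```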

What I see is that, while the image is being pulled for the first time,
every task exits as KILLED. Once the image has been pulled, the container
starts but mesos does not realise this, so its attempts to start
additional containers fail with port allocation conflicts. Killing the
unrecognised container in docker unblocks mesos, which can then start the
containers.

Now, once this is started, if I attempt to scale the number of instances up
in marathon, I see in the UI that it attempts to start another container (a
third in my case, with two slaves) using the same port allocations that are
already in use on the slave.

This is the error in the slave logs:

E1005 10:41:01.812988  2883 slave.cpp:2485] Container
'05cf52f1-b915-45e5-9071-6b46fda3b71c' for executor
'bridged-webapp.18747ba3-4c7c-11e4-9567-080027100ea3' of framework
'20141005-083953-159390892-5050-9177-0000' failed to start: Failed to
'docker run -d -c 512 -m 67108864 -e PORT=31000 -e PORT0=31000 -e
PORTS=31000,31001 -e PORT1=31001 -e MESOS_SANDBOX=/mnt/mesos/sandbox -v
/tmp/mesos/slaves/20141005-101854-159390892-5050-1326-0/frameworks/20141005-083953-159390892-5050-9177-0000/executors/bridged-webapp.18747ba3-4c7c-11e4-9567-080027100ea3/runs/05cf52f1-b915-45e5-9071-6b46fda3b71c:/mnt/mesos/sandbox
--net bridge -p 31000:8080/tcp -p 31001:161/udp --entrypoint /bin/sh --name
mesos-05cf52f1-b915-45e5-9071-6b46fda3b71c python:3 -c python3 -m
http.server 8080': exit status = exited with status 1 stderr = WARNING:
Your kernel does not support swap limit capabilities. Limitation discarded.
2014/10/05 10:41:01 Error response from daemon: Cannot start container
b2516e3356ca1cf3163f6926249b4e936ec9afe4549ee37f4a9d5df62dbbaf1b: Bind for
0.0.0.0:31000 failed: port is already allocated
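To confirm on the slave which of the offered ports are really taken, a
small check along these lines can help. This is only a sketch: it pulls the
"-p HOST:CONTAINER/PROTO" mappings out of a docker run command line and
tries to bind each host port. The function names are mine, and it assumes
docker holds the host port with an ordinary listening socket, which may not
be true for every docker configuration.

```python
import re
import socket

# Matches docker's "-p HOST:CONTAINER/PROTO" publish flags.
PUBLISH_RE = re.compile(r"-p\s+(\d+):(\d+)/(tcp|udp)")


def published_ports(docker_cmd):
    """Return (host_port, container_port, protocol) tuples found in a docker run command."""
    return [(int(h), int(c), p) for h, c, p in PUBLISH_RE.findall(docker_cmd)]


def can_bind(port, protocol="tcp", host="0.0.0.0"):
    """Return True if a fresh socket can bind host:port, False if the bind fails."""
    kind = socket.SOCK_STREAM if protocol == "tcp" else socket.SOCK_DGRAM
    with socket.socket(socket.AF_INET, kind) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    # Shortened form of the command from the log above.
    cmd = "docker run -d --net bridge -p 31000:8080/tcp -p 31001:161/udp python:3"
    for host_port, container_port, proto in published_ports(cmd):
        state = "free" if can_bind(host_port, proto) else "in use"
        print(f"{host_port}/{proto} -> {container_port}: {state}")
```

Run on the affected slave, this should report 31000/tcp as in use while the
unrecognised container is still holding it.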

There is nothing in the stderr or stdout of the task.

I have set up the slaves according to the docs (set the containerizers and
the timeout). Any help here would be appreciated.

Cheers,

Ryan
