Check the mesos-slave log on one of the slaves, at /var/log/mesos/mesos-slave.INFO. It probably has some information about the docker pull, or about other things that could have failed before the container was actually launched. Alternatively, try a manual `docker pull` on one of the slaves, then see whether the launch succeeds on that node. If it does, the failure was likely a timeout during the docker pull, and you can either increase the registration timeout further or pre-pull all your images (as a periodic Chronos task?) to work around unpredictable network latencies in AWS.
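
In case it helps with the log check, here is a rough Python sketch that filters the slave log for lines worth reading first. The log path is the one mentioned above; the keyword list and the `matching_lines` helper are my own assumptions for illustration, not known Mesos log strings, so adjust them to whatever your log actually contains.

```python
from pathlib import Path

# Log path from the message above. The keywords are guesses at what a
# failed or slow image pull might mention; they are not taken from Mesos.
LOG_PATH = Path("/var/log/mesos/mesos-slave.INFO")
KEYWORDS = ("docker", "pull", "timeout", "timed out")


def matching_lines(lines, keywords=KEYWORDS):
    """Return the lines containing any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(k in line.lower() for k in lowered)
    ]


if __name__ == "__main__":
    if LOG_PATH.exists():
        # errors="replace" avoids a crash on stray non-UTF-8 bytes.
        with LOG_PATH.open(errors="replace") as log:
            for line in matching_lines(log):
                print(line)
    else:
        print(f"{LOG_PATH} not found; run this on one of the slaves")
```

Run it on a slave right after a failed launch, so the relevant entries are near the end of the output.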
On Mon, Feb 23, 2015 at 4:31 PM, max square <[email protected]> wrote:

> Hi all,
>
> I am using the cloudformation scripts
> <https://github.com/mbabineau/cloudformation-mesos> to create a Mesos
> cluster, with Marathon 0.7.5, and Chronos 2.3.2. The setup is working
> perfectly for regular processes. However, now I am trying to deploy a
> simple docker image, but it is failing without producing any errors in the
> sandbox.
>
> I followed the following tutorial
> <https://mesosphere.com/docs/tutorials/launch-docker-container-on-mesosphere/>
> to set the Mesos Executor Timeout to 5mins and I can see the following
> processes running on all the slave machines, where the containerizers are
> in the correct order:
>
> root 1615 0.0 0.0 168 4 ? Ss Feb20 0:00 runsv mesos-slave
>
> root 1616 0.0 0.0 184 4 ? S Feb20 0:00 svlogd -tt /var/log/mesos-slave
>
> root 1617 3.1 0.2 874688 17376 ? Sl Feb20 133:56 /usr/local/sbin/mesos-slave --log_dir=/var/log/mesos --containerizers=docker,mesos
>
> root 3290 0.0 0.0 4444 652 ? Ss Feb21 0:00 sh -c /usr/local/libexec/mesos/mesos-executor
>
> root 3304 0.0 0.1 720528 10332 ? Sl Feb21 1:01 /usr/local/libexec/mesos mesos-executor
>
> Does anyone have suggestions on how to debug the issue?
>
> Thanks in Advance
>
> Sergio Daniel

