Have you tried a more recent version of Mesos/Marathon? Since Mesos 0.20.0,
Docker support has landed as a first-class containerizer in the Mesos slave,
so there is no need to use Deimos.
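
A minimal sketch of the slave side, assuming Mesos 0.20.0 or later: the
`--containerizers=docker,mesos` flag takes the place of
`--containerizer_path=/usr/local/bin/deimos` and `--isolation=external`, while
the recovery flags from your current setup are kept. `build_slave_args` is a
hypothetical helper that only prints the flags, one per line, so they can be
reviewed before use. `--executor_registration_timeout=5mins` stays because
pulling an image can take longer than the default timeout.

```shell
#!/usr/bin/env bash
# Sketch only: slave flags for the built-in Docker containerizer.
# Assumes Mesos >= 0.20.0; build_slave_args is a hypothetical helper
# that prints one flag per line and does not start anything.
build_slave_args() {
  local master="$1"   # the same zk://.../mesos URL you use today
  printf '%s\n' \
    "--master=${master}" \
    "--containerizers=docker,mesos" \
    "--checkpoint=true" \
    "--recover=reconnect" \
    "--recovery_timeout=120mins" \
    "--executor_registration_timeout=5mins" \
    "--log_dir=/var/log/mesos" \
    "--strict=true"
  # Note: no --containerizer_path / --isolation=external, i.e. no Deimos.
}
```

To launch, something like `mapfile -t args < <(build_slave_args 'zk://...:2181/mesos')`
followed by `/usr/local/sbin/mesos-slave "${args[@]}"` should work, with your own
`--hostname`/`--ip` values appended. Newer Mesos releases may deprecate
`--checkpoint`, so check the flags of the version you install.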

Niklas

On 25 November 2014 at 07:43, Geoffroy Jabouley <[email protected]> wrote:

> Hello
>
> I am currently trying to activate checkpointing for my Mesos cloud.
>
> Starting from an application running in a Docker container on the cluster,
> launched from Marathon, my use cases are the following:
>
> *UC1: kill the Marathon service, then restart it after 2 minutes.*
> *Expected*: the Mesos task is still active and the Docker container is
> running. When the Marathon service restarts, it gets its tasks back.
>
> *Result*: OK
>
>
> *UC2: kill the Mesos slave, then restart it after 2 minutes.*
> *Expected*: the Mesos task remains active and the Docker container is
> running. When the Mesos slave service restarts, it gets its tasks back.
> Marathon does not show any error.
>
> *Results*: the task gets status LOST when the slave is killed, while the
> Docker container keeps running. Marathon detects that the application went
> down and spawns a new one on another available Mesos slave. When the slave
> restarts, it kills the previously running container and starts a new one.
> So I end up with two applications on my cluster: one spawned by Marathon,
> and another orphaned one.
>
>
> Is this behavior normal? Can you please explain what I am doing wrong?
>
>
> -----------------------------------------------------------------------------------------------------------
>
> Here is the configuration I have so far:
> Mesos 0.19.1 (not dockerized)
> Marathon 0.6.1 (not dockerized)
> Docker 1.3 + Deimos 0.4.2
>
> The Mesos master is started with:
> */usr/local/sbin/mesos-master --zk=zk://...:2181/mesos --port=5050
> --log_dir=/var/log/mesos --cluster=CLUSTER_POC --hostname=... --ip=...
> --quorum=1 --work_dir=/var/lib/mesos*
>
> The Mesos slave is started with:
> */usr/local/sbin/mesos-slave --master=zk://...:2181/mesos
> --log_dir=/var/log/mesos --checkpoint=true
> --containerizer_path=/usr/local/bin/deimos
> --executor_registration_timeout=5mins --hostname=... --ip=...
> --isolation=external --recover=reconnect --recovery_timeout=120mins
> --strict=true*
>
> Marathon is started with:
> *java -Xmx512m -Djava.library.path=/usr/local/lib
> '-Djava.util.logging.SimpleFormatter.format=%2$s %5$s%6$s%n' -cp
> /usr/local/bin/marathon mesosphere.marathon.Main --zk
> zk://...:2181/marathon --master zk://...:2181/mesos --local_port_min 30000
> --hostname ... --event_subscriber http_callback --http_port 8080
> --task_launch_timeout 300000 --local_port_max 40000 --ha --checkpoint*
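
On the Marathon side, assuming Marathon 0.7.0 or later (the first release
with native Docker support, as far as I know), the app would declare its image
through a `container` object of type `DOCKER` instead of the older
Deimos-style `docker:///` image URL. `marathon_app_json` is a hypothetical
helper, and the app id, image name and resource values are placeholders; if
the image has no default command, a `cmd` field is probably needed as well.

```shell
#!/usr/bin/env bash
# Sketch only: prints a Marathon app definition that uses the native
# Docker containerizer (container.type = DOCKER). Assumes Marathon >= 0.7.0
# on Mesos >= 0.20.0; marathon_app_json is a hypothetical helper.
marathon_app_json() {
  local app_id="$1"   # placeholder app id, e.g. "my-app"
  local image="$2"    # placeholder image, e.g. "my-registry/my-app:1.0"
  cat <<EOF
{
  "id": "${app_id}",
  "instances": 1,
  "cpus": 0.5,
  "mem": 256,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "${image}" }
  }
}
EOF
}
```

It can then be submitted with
`marathon_app_json my-app my-registry/my-app:1.0 | curl -X POST -H 'Content-Type: application/json' --data-binary @- http://<marathon-host>:8080/v2/apps`,
where `<marathon-host>` stands for your Marathon instance.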
