Marathon constantly unregisters on particular slaves

Mateusz Moneta Wed, 24 Aug 2016 07:05:48 -0700

Hello,

we have a production Mesos cluster composed of 14 slave nodes and 3
masters, running Mesos 1.0.0 and Marathon 1.1.1. All nodes are managed by
Puppet, so they have identical configuration. The OS is Debian Jessie with
the 4.6.0 kernel.


Our problem is that, after a recent restart of the `mesos-slaves` processes
across our cluster, Marathon constantly dies on two of the slaves.

The scenario is:
* Marathon registers on the slave,
* Marathon runs for a couple of minutes, tasks are launched and everything
seems fine,
* Marathon unregisters.

I've checked the Mesos slave, Mesos master and Marathon logs and found
nothing useful. The Mesos master only has entries about REVIVE for the
framework, the slave about 'framework seems to be missing', and Marathon
only about rescheduling tasks from the slave.
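
Since the logs say so little, the register/unregister cycle could also be
watched from outside by polling the master's state endpoint and counting
Marathon's running tasks per slave. Below is a minimal sketch, not a tested
tool. It assumes the master serves its state as JSON at `/state` on port
5050, that the JSON has `slaves` (with `id`, `hostname`) and `frameworks`
(with `name`, `tasks`, each task having `slave_id` and `state`), and that
Marathon is registered under the name `marathon`. The master address is a
placeholder.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Assumption: placeholder address; point this at the leading master.
MASTER_STATE_URL = "http://mesos-master.example:5050/state"


def tasks_per_slave(state, framework_name="marathon"):
    """Count running tasks of one framework on each slave hostname.

    `state` is the parsed JSON document from the master state endpoint.
    Slaves with no running tasks of the framework are reported with 0.
    """
    # Map slave IDs to hostnames so the output is readable.
    hostnames = {s["id"]: s.get("hostname", s["id"])
                 for s in state.get("slaves", [])}
    counts = Counter({name: 0 for name in hostnames.values()})
    for framework in state.get("frameworks", []):
        if framework.get("name") != framework_name:
            continue
        for task in framework.get("tasks", []):
            if task.get("state") == "TASK_RUNNING":
                slave_id = task["slave_id"]
                counts[hostnames.get(slave_id, slave_id)] += 1
    return dict(counts)


def fetch_state(url=MASTER_STATE_URL):
    """Download and parse the master state document."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)
```

Calling `tasks_per_slave(fetch_state())` every few seconds and logging the
result with a timestamp should show exactly when the count on the two
affected slaves drops to zero, which can then be matched against the slave
and master logs.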

I've tried reboots, removing /var/lib/mesos/meta, and restarting the
slaves, masters and Marathon with different configurations. Nothing helps.

Any clues as to what could be wrong, or how to debug this?

-- 
BR,
Mateusz
