Sounds like a Marathon issue. You should ask on the Marathon mailing list. @vinodkone
> On Aug 28, 2015, at 7:05 AM, Rogier Dikkes <[email protected]> wrote:
>
> Hello all,
>
> I am running a test cluster with Mesos and Marathon on 20 compute nodes and
> 2 head nodes running VMs that host all masters, frameworks etc. Until the
> 0.23 update there were not many issues, but today I saw one that I must
> share, and I hope you know more about it.
>
> We run an updated Mesos version 0.23 and Marathon 0.10.0.
>
> I started an HDFS namenode on Docker through Marathon and a couple of
> datanodes on the agents. I'm slowly building this config further with
> secondary namenodes, datanodes and journal nodes, all in containers. For now
> it's a very basic setup to see how stable everything is and what we should
> consider when running in containers.
>
> Today we found out that the Marathon leader was suddenly registered twice as
> a framework, with different IDs, and to make it worse it spawned a task again
> that was already running. Suddenly we had 2 namenodes with the name
> management. Our Consul cluster auto-registered both containers and started to
> forward all traffic to these 2 namenodes.
>
> I always thought that ZooKeeper took care of leader election for Marathon and
> that this should prevent scenarios like this. However, the two frameworks had
> different IDs, which would explain why ZooKeeper didn't handle the election.
>
> The Marathon web interface was no longer responding and everything timed out.
> I found that only a single Marathon process was running. To get HDFS running
> again I killed the containers and killed the Marathon process. From the logs I
> couldn't work out why this happened: in the 10 minutes around the registration
> of the framework there is nothing but offers, HTTP calls and task syncs.
>
> The strange thing I just noticed is that Marathon incidentally re-registers
> itself while its process is not restarted or elected.
>
> Does anyone have an idea where to look?
>
> --
> Rogier Dikkes
> Systeem Programmeur Hadoop & HPC Cloud
> SURFsara | Science Park 140 | 1098 XG Amsterdam
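
One way to confirm the double registration described above is to look at the framework list the Mesos master reports. The following is a minimal sketch, not a definitive tool. It assumes the master serves its state as JSON at `/master/state.json` with a top-level `frameworks` array whose entries carry `id` and `name` fields (this should match Mesos 0.23, but verify against your version). The function names and the example host are hypothetical.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def duplicate_frameworks(state):
    """Return {framework name: [framework IDs]} for names registered more than once.

    `state` is the parsed master state; assumed to contain a "frameworks" list
    of dicts with "id" and "name" keys.
    """
    ids_by_name = defaultdict(list)
    for fw in state.get("frameworks", []):
        ids_by_name[fw.get("name")].append(fw.get("id"))
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


def fetch_state(master_url):
    """Fetch and parse the master state.

    Assumption: `master_url` looks like "http://master.example:5050"
    (hypothetical host) and the state endpoint is /master/state.json.
    """
    with urlopen(master_url + "/master/state.json", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a reachable master, so it is left commented out):
# print(duplicate_frameworks(fetch_state("http://master.example:5050")))
```

If the same framework name shows up with two IDs, that would support the theory that Marathon re-registered under a new framework ID rather than failing over under its existing one.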

