Sounds like a Marathon issue. You should ask on the Marathon mailing list.

@vinodkone

> On Aug 28, 2015, at 7:05 AM, Rogier Dikkes <[email protected]> wrote:
> 
> Hello all,
> 
> I am running a test cluster with Mesos and Marathon: 20 compute nodes and 
> 2 head nodes running VMs that host all masters, frameworks, etc. Until the 
> 0.23 update there were not many issues, but today I saw an issue that I must 
> share, and I hope you know more about it.
> 
> We run an updated Mesos version 0.23 and Marathon 0.10.0.
> 
> I started an HDFS namenode in Docker through Marathon and a couple of 
> datanodes on the agents. I'm slowly building this config further with 
> secondary namenodes, datanodes, and journal nodes, all in containers. For now 
> it's a very basic setup to see how stable everything is and what we should 
> consider when running in containers.
> 
> Today we found out that the Marathon leader was suddenly registered twice 
> as a framework with different IDs and, to make it worse, it spawned a task 
> again that was already running. Suddenly we had 2 namenodes with the name 
> management. Our Consul cluster auto-registered both containers and started to 
> forward all traffic to these 2 namenodes.
> 
> I always thought that ZooKeeper took care of leader election for Marathon 
> and that this should prevent scenarios like this. However, the two frameworks 
> had different IDs, which would explain why ZooKeeper didn't handle the 
> election.
> 
> The Marathon web interface was no longer responding and everything timed 
> out. I found out that only a single Marathon process was running. To get 
> HDFS running again I killed the containers and killed the Marathon process. 
> From the logs I couldn't gather why this happened: in the 10 minutes around 
> the registration of the framework there is nothing but offers, HTTP calls and 
> task syncs.
> 
> The strange thing I just noticed is that Marathon occasionally re-registers 
> itself even though its process has not been restarted or elected.
> 
> Does anyone have an idea where to look?
> 
> -- 
> Rogier Dikkes
> Systeem Programmeur Hadoop & HPC Cloud
> SURFsara | Science Park 140 | 1098 XG Amsterdam
> 
