Is it for all 3 processes: master, slave, and marathon?

Thanks
Nikolay
From: Dario Rexin [mailto:[email protected]]
Sent: Tuesday, April 28, 2015 9:47 AM
To: [email protected]
Subject: Re: Marathon change of leader and stalled deployments

On each host you have to set it to the interface that is connected to the network your cluster is running in.

On 28.04.2015, at 16:41, Nikolay Borodachev <[email protected]> wrote:

Hi Dario,

This could be the reason, but why would it not bind to all network interfaces by default? To test it out, should I set LIBPROCESS_IP to the IP address of the mesos1 server?

Thank you
Nikolay

From: Dario Rexin [mailto:[email protected]]
Sent: Tuesday, April 28, 2015 4:31 AM
To: [email protected]
Subject: Re: Marathon change of leader and stalled deployments

Hi Nikolay,

could this be the problem?

Apr 27 22:36:00 mesos1 marathon[6289]: **************************************************
Apr 27 22:36:00 mesos1 marathon[6289]: Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
Apr 27 22:36:00 mesos1 marathon[6289]: **************************************************

This would explain why only a certain node (most likely the one running on the same machine as the current Mesos leader) can start tasks.

Cheers,
Dario

On 27 Apr 2015, at 23:49, Nikolay Borodachev <[email protected]> wrote:

Dario,

The logs are quite lengthy, so I sent them to you directly. The Marathon version is 0.8.1.

Thank you
Nikolay

From: Dario Rexin [mailto:[email protected]]
Sent: Monday, April 27, 2015 4:01 PM
To: [email protected]
Subject: Re: Marathon change of leader and stalled deployments

Hi Nikolay,

this is unexpected behavior. Could you please post the log output from the leading node around the time you try to scale? Also, what version of Marathon are you running?
Thanks,
Dario

On 27.04.2015, at 20:41, Nikolay Borodachev <[email protected]> wrote:

Hello All,

I noticed strange behavior in a Marathon cluster. The cluster consists of 3 mesos/marathon masters and 3 slaves. Once the cluster is freshly started, I can start a process (e.g. httpd) and scale it up and down without any problems. Everything works as it should. However, if the Marathon leader goes down or gets restarted, the managed processes can no longer be scaled. The scaling request gets queued but is not executed by the new Marathon leader. I found that the request does not move even if I recycle the current leader repeatedly; it is only when the server that was the leader at the time the tasks were created becomes the leader again that these tasks can be scaled.

Is this a known and expected behavior?

Thanks
Nikolay
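For readers hitting the same warning: Dario's advice (set LIBPROCESS_IP on each host to that host's routable cluster address, for all three processes) might look like the sketch below. The address 192.168.1.10 is a made-up placeholder for mesos1, not taken from this thread; substitute the address of the interface on your cluster network.

```shell
# Sketch only: export LIBPROCESS_IP before starting mesos-master, mesos-slave
# and marathon on each host, using that host's routable cluster address.
# 192.168.1.10 is a hypothetical example -- substitute your own address.
CLUSTER_IP="192.168.1.10"
export LIBPROCESS_IP="$CLUSTER_IP"

# Sanity check: the log warning above fires when the scheduler driver binds
# to loopback, so make sure the value is not a 127.x.x.x address.
case "$LIBPROCESS_IP" in
  127.*) echo "LIBPROCESS_IP is loopback -- pick a routable address" ;;
  *)     echo "LIBPROCESS_IP=$LIBPROCESS_IP" ;;
esac
```

On Debian-style packages the export would typically go into /etc/default/mesos-master, /etc/default/mesos-slave, and /etc/default/marathon (an assumption; adjust to however your init system passes environment variables to services).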

