That did the trick! Thank you very much, Dario!

From: Dario Rexin [mailto:[email protected]]
Sent: Tuesday, April 28, 2015 10:00 AM
To: [email protected]
Subject: Re: Marathon change of leader and stalled deployments

Yes. Unfortunately that’s the only way to set the IP in the Mesos Java bindings.
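For example, something along these lines (a sketch; 10.0.0.1 stands in for the
host’s routable IP, and the exact start command depends on how Marathon is
installed):

  # Placeholder address; use this host's routable IP.
  export LIBPROCESS_IP=10.0.0.1
  ./bin/start --master zk://mesos1:2181/mesos --zk zk://mesos1:2181/marathon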

On 28 Apr 2015, at 16:57, Nikolay Borodachev <[email protected]> wrote:

I actually have the ‘--ip’ parameter set for both master and slave. So,
LIBPROCESS_IP should only be set for Marathon?

From: Dario Rexin [mailto:[email protected]]
Sent: Tuesday, April 28, 2015 9:56 AM
To: [email protected]<mailto:[email protected]>
Subject: Re: Marathon change of leader and stalled deployments

On master and slave you should be able to start it with the --ip parameter, 
instead of using the env variable. But you should set the IP to a fixed value 
for all processes.
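For example (a sketch; the addresses are placeholders, and each host passes its
own routable IP):

  # On a master host:
  mesos-master --ip=10.0.0.1 --zk=zk://mesos1:2181/mesos --quorum=2
  # On a slave host:
  mesos-slave --ip=10.0.0.2 --master=zk://mesos1:2181/mesos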

On 28 Apr 2015, at 16:52, Nikolay Borodachev <[email protected]> wrote:

Is it for all 3 processes: master, slave, and Marathon?

Thanks
Nikolay

From: Dario Rexin [mailto:[email protected]]
Sent: Tuesday, April 28, 2015 9:47 AM
To: [email protected]<mailto:[email protected]>
Subject: Re: Marathon change of leader and stalled deployments

On each host you have to set it to the interface that is connected to the 
network your cluster is running in.
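For example, assuming eth0 is the interface on the cluster network (the
interface name and address are placeholders):

  # Look up the IPv4 address assigned to eth0 ...
  ip -4 addr show eth0
  # ... and export it before starting each process on that host.
  export LIBPROCESS_IP=10.0.0.1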


On 28.04.2015, at 16:41, Nikolay Borodachev <[email protected]> wrote:
Hi Dario,

This could be the reason, but why would it not bind to all network interfaces 
by default?
To test it out, should I set LIBPROCESS_IP to the IP address of the mesos1 
server?

Thank you
Nikolay

From: Dario Rexin [mailto:[email protected]]
Sent: Tuesday, April 28, 2015 4:31 AM
To: [email protected]<mailto:[email protected]>
Subject: Re: Marathon change of leader and stalled deployments

Hi Nikolay,

could this be the problem?

Apr 27 22:36:00 mesos1 marathon[6289]: 
**************************************************
Apr 27 22:36:00 mesos1 marathon[6289]: Scheduler driver bound to loopback 
interface! Cannot communicate with remote master(s). You might want to set 
'LIBPROCESS_IP' environment variable to use a routable IP address.
Apr 27 22:36:00 mesos1 marathon[6289]: 
**************************************************

This would explain why only a certain node (most likely the one that’s running 
on the same machine as the current Mesos leader) can start tasks.
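As a quick sanity check after restarting Marathon with LIBPROCESS_IP set
(assuming a Linux host with net-tools installed), the warning should be gone
from the log and the scheduler driver should no longer be listening on
127.0.0.1:

  # Replace <pid> with Marathon's process id; expect a listener on the
  # routable IP rather than 127.0.0.1.
  sudo netstat -tlnp | grep <pid>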

Cheers,
Dario

On 27 Apr 2015, at 23:49, Nikolay Borodachev <[email protected]> wrote:

Dario,

The logs are quite lengthy, so I sent them to you directly. The Marathon 
version is 0.8.1.

Thank you
Nikolay

From: Dario Rexin [mailto:[email protected]]
Sent: Monday, April 27, 2015 4:01 PM
To: [email protected]<mailto:[email protected]>
Subject: Re: Marathon change of leader and stalled deployments

Hi Nikolay,

this is an unexpected behavior. Could you please post the log output from the 
leading node around the time you try to scale? Also, what version of Marathon 
are you running?

Thanks,
Dario


On 27.04.2015, at 20:41, Nikolay Borodachev <[email protected]> wrote:
Hello All,

I noticed some strange behavior in a Marathon cluster. The cluster consists of 
3 Mesos/Marathon masters and 3 slaves.

Once the cluster is freshly started I can start a process (e.g. httpd) and 
scale it up and down without any problems. Everything works as it should.
However, if the Marathon leader goes down or gets restarted, the managed 
processes cannot be scaled anymore. The scaling request gets queued but does 
not get executed by the new Marathon leader.
I found that the scaling request would not move until I recycled the current 
leader enough times for the original server to become the leader again. Only 
when the server that was the leader at the time the tasks were created becomes 
the leader again can these tasks be scaled.

Is this a known and expected behavior?

Thanks
Nikolay
