Is it for all 3 processes: master, slave, and marathon?

Thanks
Nikolay
From: Dario Rexin [mailto:[email protected]]
Sent: Tuesday, April 28, 2015 9:47 AM
To: [email protected]
Subject: Re: Marathon change of leader and stalled deployments

On each host you have to set it to the interface that is connected to the network your cluster is running in.

On 28.04.2015, at 16:41, Nikolay Borodachev <[email protected]> wrote:

Hi Dario,

This could be the reason, but why would it not bind to all network interfaces by default? To test it out, should I set LIBPROCESS_IP to the IP address of the mesos1 server?

Thank you
Nikolay

From: Dario Rexin [mailto:[email protected]]
Sent: Tuesday, April 28, 2015 4:31 AM
To: [email protected]
Subject: Re: Marathon change of leader and stalled deployments

Hi Nikolay,

could this be the problem?

Apr 27 22:36:00 mesos1 marathon[6289]: **************************************************
Apr 27 22:36:00 mesos1 marathon[6289]: Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
Apr 27 22:36:00 mesos1 marathon[6289]: **************************************************

This would explain why only a certain node (most likely the one running on the same machine as the current Mesos leader) can start tasks.

Cheers,
Dario

On 27 Apr 2015, at 23:49, Nikolay Borodachev <[email protected]> wrote:

Dario,

The logs are quite lengthy, so I sent them to you directly. The Marathon version is 0.8.1.

Thank you
Nikolay

From: Dario Rexin [mailto:[email protected]]
Sent: Monday, April 27, 2015 4:01 PM
To: [email protected]
Subject: Re: Marathon change of leader and stalled deployments

Hi Nikolay,

this is unexpected behavior. Could you please post the log output from the leading node around the time you try to scale? Also, what version of Marathon are you running?
Thanks,
Dario

On 27.04.2015, at 20:41, Nikolay Borodachev <[email protected]> wrote:

Hello All,

I noticed strange behavior in a Marathon cluster. The cluster consists of 3 mesos/marathon masters and 3 slaves. Once the cluster is freshly started, I can start a process (e.g. httpd) and scale it up and down without any problems. Everything works as it should. However, if the Marathon leader goes down or gets restarted, the managed processes can no longer be scaled. The scaling request gets queued but is not executed by the new Marathon leader. I found that the request does not move even if I recycle the current leader repeatedly; it is only when the server that was the leader at the time the tasks were created becomes the leader again that these tasks can be scaled.

Is this a known and expected behavior?

Thanks
Nikolay
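For readers hitting the same warning: Dario's advice (set LIBPROCESS_IP on each host to that host's routable cluster address, for all three processes) might look like the sketch below. The address 192.168.1.10 is a made-up placeholder for mesos1, not taken from this thread; substitute the address of the interface on your cluster network.

```shell
# Sketch only: export LIBPROCESS_IP before starting mesos-master, mesos-slave
# and marathon on each host, using that host's routable cluster address.
# 192.168.1.10 is a hypothetical example -- substitute your own address.
CLUSTER_IP="192.168.1.10"
export LIBPROCESS_IP="$CLUSTER_IP"

# Sanity check: the log warning above fires when the scheduler driver binds
# to loopback, so make sure the value is not a 127.x.x.x address.
case "$LIBPROCESS_IP" in
  127.*) echo "LIBPROCESS_IP is loopback -- pick a routable address" ;;
  *)     echo "LIBPROCESS_IP=$LIBPROCESS_IP" ;;
esac
```

On Debian-style packages the export would typically go into /etc/default/mesos-master, /etc/default/mesos-slave, and /etc/default/marathon (an assumption; adjust to however your init system passes environment variables to services).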

