Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-21 Thread Vinod Kone
On Mon, Apr 21, 2014 at 3:10 PM, Sharma Podila spod...@netflix.com wrote:

 On a related note, what if framework scheduler is up while Mesos master
 goes down. Then, if Mesos master restarts after a time interval greater
 than framework failover timeout, what is the expected behavior? Would the
 framework successfully get a re-registered() callback? Or error() callback?
 Other?


If there is only one master (not recommended for HA) and it starts after
the framework failover timeout, then yes, the framework can successfully
re-register. The framework failover timer is operated by the master, so it
does not run while the master is down. This is somewhat similar to how
ZooKeeper handles session expiration timeouts.


Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-17 Thread Adam Bordelon
David, did you see Vinod's response to your (identical) question on dev@?
http://www.mail-archive.com/dev@mesos.apache.org/msg11634.html


On Thu, Apr 17, 2014 at 11:26 AM, David Greenberg dsg123456...@gmail.com wrote:

 I don't recall the exact timeout of framework IDs, but what I'm wondering
 is what happens if a scheduler tries to failover, but the failover grace
 period has elapsed? Does it fail to register, or does it successfully
 register and all the old executors are just gone?




Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-17 Thread David Greenberg
I did not, thank you! I reposted when I didn't see a response and couldn't
find it in the dev archive--I thought maybe it had gotten blackholed
because I don't subscribe to dev.

On Thursday, April 17, 2014, Adam Bordelon a...@mesosphere.io wrote:

 David, did you see Vinod's response to your (identical) question on dev@?
 http://www.mail-archive.com/dev@mesos.apache.org/msg11634.html


 On Thu, Apr 17, 2014 at 11:26 AM, David Greenberg dsg123456...@gmail.com wrote:

 I don't recall the exact timeout of framework IDs, but what I'm wondering
 is what happens if a scheduler tries to failover, but the failover grace
 period has elapsed? Does it fail to register, or does it successfully
 register and all the old executors are just gone?





Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-17 Thread David Greenberg
My follow-up question is this--is there a way to tell whether I'm outside
of the timeout window? I'd like to have my framework check ZK and determine
whether it's w/in the framework timeout or not, so that it can make the
correct call.


On Thu, Apr 17, 2014 at 5:23 PM, Adam Bordelon a...@mesosphere.io wrote:

 David, did you see Vinod's response to your (identical) question on dev@?
 http://www.mail-archive.com/dev@mesos.apache.org/msg11634.html


 On Thu, Apr 17, 2014 at 11:26 AM, David Greenberg dsg123456...@gmail.com wrote:

 I don't recall the exact timeout of framework IDs, but what I'm wondering
 is what happens if a scheduler tries to failover, but the failover grace
 period has elapsed? Does it fail to register, or does it successfully
 register and all the old executors are just gone?





Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-17 Thread Vinod Kone
On Thu, Apr 17, 2014 at 2:56 PM, David Greenberg dsg123456...@gmail.com wrote:

 My follow-up question is this--is there a way to tell whether I'm outside
 of the timeout window? I'd like to have my framework check ZK and determine
 whether it's w/in the framework timeout or not, so that it can make the
 correct call.


Hey David,

Currently, the only signal you can get is by hitting the /state.json
endpoint on the master. The framework should have been moved to
'completed_frameworks' once the failover timeout has elapsed. Of course, if
a master fails over, this information is lost, so you can't reliably depend
on it.
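
As a rough illustration of that check, here is a minimal Python sketch. It
assumes that the master's /state.json document has top-level 'frameworks' and
'completed_frameworks' lists whose entries carry an 'id' field; the function
names and the master URL are hypothetical, not part of any Mesos API.

```python
import json
import urllib.request


def framework_status(state, framework_id):
    """Classify a framework ID against a parsed /state.json document.

    Assumes 'frameworks' and 'completed_frameworks' are lists of objects
    with an 'id' field (an assumption about the endpoint's layout).
    """
    def ids(key):
        # Collect the IDs listed under one top-level key, if present.
        return {fw.get("id") for fw in state.get(key, [])}

    if framework_id in ids("frameworks"):
        return "active"
    if framework_id in ids("completed_frameworks"):
        return "completed"
    # Not listed at all, e.g. because the master failed over and lost it.
    return "unknown"


def fetch_state(master_url):
    """Fetch and parse /state.json from a master.

    master_url is a hypothetical base URL such as
    "http://master.example.com:5050".
    """
    with urllib.request.urlopen(master_url + "/state.json", timeout=10) as resp:
        return json.load(resp)
```

A scheduler could call framework_status(fetch_state(url), my_id) before
deciding whether to reuse its framework ID. An "unknown" result proves
nothing either way, since a master that has failed over no longer has the
information.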

When the master starts storing persistent state about frameworks (likely a
couple of releases away), a re-registration attempt in such a case would be
denied by the master. So that could be your signal. Alternatively, with
persistence, you could also more reliably depend on /state.json to get
this info.

To take a step back, what is the problem you are trying to solve?

Thanks,