Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?
On Mon, Apr 21, 2014 at 3:10 PM, Sharma Podila spod...@netflix.com wrote:

> On a related note, what if the framework scheduler is up while the Mesos master goes down? Then, if the Mesos master restarts after a time interval greater than the framework failover timeout, what is the expected behavior? Would the framework successfully get a re-registered() callback? Or an error() callback? Other?

If there is only one master (not recommended for HA) and it comes back up after the framework failover timeout has elapsed, then yes, the framework can successfully re-register. The framework failover timer is operated by the master, so no timer is running while the master is down, and because the master does not yet persist framework state, the restarted master has no record of the old one. This is somewhat similar to how ZooKeeper handles session expiration timeouts.
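To make that reasoning concrete, here is a toy Python model. It is hypothetical: `ToyMaster`, `scheduler_disconnected`, `advance`, `knows_expired` and `restart` are invented for illustration and are not Mesos APIs. It only assumes what is said above, namely that the failover timer lives in the master's memory and is not persisted, so a restarted master cannot know that a timeout elapsed.

```python
class ToyMaster:
    """Hypothetical in-memory model of a master's failover-timer bookkeeping.

    Not Mesos code: it only illustrates that the timer lives in the master.
    """

    def __init__(self, failover_timeout):
        self.failover_timeout = failover_timeout
        self.disconnected_at = {}  # framework_id -> time the scheduler disconnected
        self.completed = set()     # ids this master has expired (cf. completed_frameworks)

    def scheduler_disconnected(self, framework_id, now):
        # The master, not the scheduler, starts the failover timer.
        self.disconnected_at[framework_id] = now

    def advance(self, now):
        # Timers are only checked while this master process is alive.
        for fid, since in list(self.disconnected_at.items()):
            if now - since >= self.failover_timeout:
                del self.disconnected_at[fid]
                self.completed.add(fid)

    def knows_expired(self, framework_id):
        return framework_id in self.completed


def restart(master):
    # No persisted framework state: the new master starts with empty bookkeeping.
    return ToyMaster(master.failover_timeout)


if __name__ == "__main__":
    m = ToyMaster(failover_timeout=60.0)
    m.scheduler_disconnected("fw-1", now=0.0)
    m.advance(now=61.0)
    print(m.knows_expired("fw-1"))           # True: this master saw the timeout elapse
    print(restart(m).knows_expired("fw-1"))  # False: a restarted master has no record
```

The model deliberately says nothing about which callback a live master would send for an expired ID; it only shows why a restarted, non-persistent master has nothing to reject the re-registration with.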
Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?
David, did you see Vinod's response to your (identical) question on dev@?
http://www.mail-archive.com/dev@mesos.apache.org/msg11634.html

On Thu, Apr 17, 2014 at 11:26 AM, David Greenberg dsg123456...@gmail.com wrote:

> I don't recall the exact timeout of framework IDs, but what I'm wondering is what happens if a scheduler tries to failover, but the failover grace period has elapsed? Does it fail to register, or does it successfully register and all the old executors are just gone?
Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?
I did not, thank you! I reposted when I didn't see a response and couldn't find it in the dev archive; I thought maybe it had been blackholed because I don't subscribe to dev.

On Thursday, April 17, 2014, Adam Bordelon a...@mesosphere.io wrote:

> David, did you see Vinod's response to your (identical) question on dev@? http://www.mail-archive.com/dev@mesos.apache.org/msg11634.html
Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?
My follow-up question is this: is there a way to tell whether I'm outside of the timeout window? I'd like my framework to check ZK and determine whether it's within the framework timeout or not, so that it can make the correct call.

On Thu, Apr 17, 2014 at 5:23 PM, Adam Bordelon a...@mesosphere.io wrote:

> David, did you see Vinod's response to your (identical) question on dev@? http://www.mail-archive.com/dev@mesos.apache.org/msg11634.html
Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?
On Thu, Apr 17, 2014 at 2:56 PM, David Greenberg dsg123456...@gmail.com wrote:

> My follow-up question is this: is there a way to tell whether I'm outside of the timeout window? I'd like my framework to check ZK and determine whether it's within the framework timeout or not, so that it can make the correct call.

Hey David,

Currently, the only signal you can get is by hitting the /state.json endpoint on the master. After the failover timeout, the framework should have been moved to 'completed_frameworks'. Of course, if the master fails over, this information is lost, so you can't reliably depend on it.

Once the master starts storing persistent state about frameworks (likely a couple of releases away), a re-registration attempt after the failover timeout has elapsed would be denied by the master, so that could be your signal. Alternatively, with persistence, you could also depend on /state.json more reliably to get this info.

To take a step back, what is the problem you are trying to solve?

Thanks,
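A minimal Python sketch of the /state.json check described above. It assumes the master serves /state.json under the base URL you pass in and that each entry in the 'frameworks' and 'completed_frameworks' arrays carries an 'id' field; `framework_status` and `fetch_state` are hypothetical helper names, and the example URL in the comment is made up. An "unknown" result is ambiguous for the reason given above: a master that failed over has simply forgotten the framework.

```python
import json
import urllib.request


def framework_status(state, framework_id):
    """Classify framework_id using a parsed master /state.json document.

    Returns "tracked" (listed under "frameworks"), "completed" (listed under
    "completed_frameworks", e.g. after the failover timeout), or "unknown".
    """
    if any(f.get("id") == framework_id for f in state.get("frameworks", [])):
        return "tracked"
    if any(f.get("id") == framework_id for f in state.get("completed_frameworks", [])):
        return "completed"
    # Not listed at all: e.g. the master failed over and lost its in-memory history.
    return "unknown"


def fetch_state(master_url, timeout=10):
    """Fetch and parse /state.json from a master base URL.

    master_url is an assumption, e.g. "http://mesos-master.example.com:5050".
    """
    url = master_url.rstrip("/") + "/state.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

A scheduler might call `framework_status(fetch_state(url), my_framework_id)` before deciding whether to reuse its old framework ID, treating "unknown" as inconclusive rather than as "expired".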