Hey Vinod,
The problem I'm trying to solve is writing a framework that can run on our
HA application cluster, so that whenever the framework's current scheduler
dies, another node is elected and takes over. I'm trying to work through
the failure cases I can think of to understand how to implement this so
that it handles all of them.

It sounds like the solution that'd work best for me would be to try to read
the framework ID from a known location and register with that. If it's not
there, or if registration fails, then the framework should register anew.
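
Roughly, that flow could look like the minimal Python sketch below. Everything
in it is an assumption for illustration: a local file stands in for the known
location (it could just as well be a ZK node), and the `register` callable and
`RegistrationError` are hypothetical placeholders for whatever the scheduler
driver actually exposes, not Mesos API.

```python
import os
import tempfile
from typing import Callable, Optional


class RegistrationError(Exception):
    """Raised by the (hypothetical) register callable when the master rejects us."""


def read_framework_id(path: str) -> Optional[str]:
    """Return the stored framework ID, or None if nothing usable is stored."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            value = f.read().strip()
    except FileNotFoundError:
        return None
    return value or None


def write_framework_id(path: str, framework_id: str) -> None:
    """Write to a temp file, then rename, so readers never see a partial ID."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(framework_id)
    os.replace(tmp, path)


def register_with_failover(
    path: str, register: Callable[[Optional[str]], str]
) -> str:
    """Try the stored ID first; fall back to a fresh registration.

    register(None) means "register anew"; register(some_id) means
    "re-register as some_id". Both return the ID now in use.
    """
    stored = read_framework_id(path)
    if stored is not None:
        try:
            return register(stored)
        except RegistrationError:
            pass  # stored ID was rejected, so fall through and register anew
    new_id = register(None)
    write_framework_id(path, new_id)
    return new_id
```

The new ID is only written after a fresh registration succeeds, so a stale ID
gets overwritten rather than retried forever by the next elected node.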

This framework's state is very large and resides in a couple of databases, so
even if the entire set of candidates for becoming the framework is
down for the whole failover grace period, the framework still wants to
register, since its state never gets invalidated.

Thanks,
David

On Thursday, April 17, 2014, Vinod Kone <[email protected]> wrote:

>
> On Thu, Apr 17, 2014 at 2:56 PM, David Greenberg <[email protected]> wrote:
>
>> My follow-up question is this--is there a way to tell whether I'm outside
>> of the timeout window? I'd like to have my framework check ZK and determine
>> whether it's w/in the framework timeout or not, so that it can make the
>> correct call.
>>
>
> Hey David,
>
> Currently, the only signal you can get is by hitting the "/state.json"
> endpoint on the master. The framework should've been moved to
> 'completed_frameworks' after the failover timeout. Of course, if a master
> fails over, this information is lost, so you can't reliably depend on it.
>
> When the master starts storing persistent state about frameworks (likely a
> couple of releases away), a re-registration attempt in such a case would be
> denied by the master. So that could be your signal. Alternatively, with
> persistence, you could also more reliably depend on "/state.json" to get
> this info.
>
> To take a step back, what is the problem you are trying to solve?
>
> Thanks,
>
