Hmm.. it seems machine62 became the leader but could not "register" as leader, i.e. it never completed framework registration with the Mesos master. Not sure what that means. My naive assumption was that "becoming leader" and "registering as leader" are "atomic".
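For my own understanding, here is a rough sketch of how I picture the two steps (class and method names are invented for illustration, this is not the actual SchedulerLifecycle code): leadership is a callback from the ZooKeeper elector, while registration is a later callback from the Mesos driver, so nothing forces the two to happen together:

// A minimal sketch (invented names, not Aurora's real API) of why "became leader"
// and "registered with Mesos" are two independent callbacks rather than one
// atomic step: one comes from the ZooKeeper elector, the other from the Mesos
// driver, and the second may never arrive.
public class TwoPhaseSchedulerSketch {

    private volatile boolean leader = false;
    private volatile boolean registered = false;

    // Callback from the leader elector (e.g. Curator's LeaderSelector).
    public void onElectedLeader() {
        leader = true;
        System.out.println("Elected as leading scheduler!");
        // Only now do we start the Mesos driver and ask the master to register
        // our framework. That request travels over the network and is acked by a
        // different callback, onRegistered(), some time later, or not at all.
        startMesosDriver();
    }

    // Callback from the Mesos scheduler driver once the master accepts us.
    public void onRegistered(String frameworkId) {
        registered = true;
        System.out.println("Registered with Mesos as framework " + frameworkId
                + " (leader=" + leader + ", registered=" + registered + ")");
    }

    private void startMesosDriver() {
        // Stand-in for driver.start(); in the failure below the scheduler
        // connected to the master but the registration ack never came back.
        new Thread(() -> onRegistered("fake-framework-id")).start();
    }

    public static void main(String[] args) {
        new TwoPhaseSchedulerSketch().onElectedLeader();
    }
}

With that model, the logs below make more sense to me: the election succeeded, but the second callback never fired.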
------- grep on SchedulerLifecycle -----
aurora-scheduler.log:Sep 26 18:11:33 machine62 aurora-scheduler[24743]: I0926 18:11:33.158 [LeaderSelector-0, StateMachine$Builder:389] SchedulerLifecycle state machine transition STORAGE_PREPARED -> LEADER_AWAITING_REGISTRATION
aurora-scheduler.log:Sep 26 18:11:33 machine62 aurora-scheduler[24743]: I0926 18:11:33.159 [LeaderSelector-0, SchedulerLifecycle$4:224] Elected as leading scheduler!
aurora-scheduler.log:Sep 26 18:11:37 machine62 aurora-scheduler[24743]: I0926 18:11:37.204 [LeaderSelector-0, SchedulerLifecycle$DefaultDelayedActions:163] Giving up on registration in (10, mins)
aurora-scheduler.log:Sep 26 18:21:37 machine62 aurora-scheduler[24743]: E0926 18:21:37.205 [Lifecycle-0, SchedulerLifecycle$4:235] Framework has not been registered within the tolerated delay.
aurora-scheduler.log:Sep 26 18:21:37 machine62 aurora-scheduler[24743]: I0926 18:21:37.205 [Lifecycle-0, StateMachine$Builder:389] SchedulerLifecycle state machine transition LEADER_AWAITING_REGISTRATION -> DEAD
aurora-scheduler.log:Sep 26 18:21:37 machine62 aurora-scheduler[24743]: I0926 18:21:37.215 [Lifecycle-0, StateMachine$Builder:389] SchedulerLifecycle state machine transition DEAD -> DEAD
aurora-scheduler.log:Sep 26 18:21:37 machine62 aurora-scheduler[24743]: I0926 18:21:37.215 [Lifecycle-0, SchedulerLifecycle$6:275] Shutdown already invoked, ignoring extra call.
aurora-scheduler.log:Sep 26 18:22:05 machine62 aurora-scheduler[54502]: I0926 18:22:05.681 [main, StateMachine$Builder:389] SchedulerLifecycle state machine transition IDLE -> PREPARING_STORAGE
aurora-scheduler.log:Sep 26 18:22:06 machine62 aurora-scheduler[54502]: I0926 18:22:06.396 [main, StateMachine$Builder:389] SchedulerLifecycle state machine transition PREPARING_STORAGE -> STORAGE_PREPARED

------ connecting to mesos -----
Sep 26 18:11:37 machine62 aurora-scheduler[24743]: I0926 18:11:37.211750 24871 group.cpp:757] Found non-sequence node 'log_replicas' at '/mesos' in ZooKeeper
Sep 26 18:11:37 machine62 aurora-scheduler[24743]: I0926 18:11:37.211817 24871 detector.cpp:152] Detected a new leader: (id='1506')
Sep 26 18:11:37 machine62 aurora-scheduler[24743]: I0926 18:11:37.211917 24871 group.cpp:699] Trying to get '/mesos/json.info_0000001506' in ZooKeeper
Sep 26 18:11:37 machine62 aurora-scheduler[24743]: I0926 18:11:37.216063 24871 zookeeper.cpp:262] A new leading master ([email protected]:5050) is detected
Sep 26 18:11:37 machine62 aurora-scheduler[24743]: I0926 18:11:37.216162 24871 scheduler.cpp:470] New master detected at [email protected]:5050
Sep 26 18:11:37 machine62 aurora-scheduler[24743]: I0926 18:11:37.217772 24871 scheduler.cpp:479] Waiting for 12.81503ms before initiating a re-(connection) attempt with the master
Sep 26 18:11:37 machine62 aurora-scheduler[24743]: I0926 18:11:37.231549 24868 scheduler.cpp:361] Connected with the master at http://10.163.25.45:5050/master/api/v1/scheduler

On Tue, Sep 26, 2017 at 1:24 PM, Bill Farner <[email protected]> wrote:

> Is there a reason a non-leading scheduler will talk to Mesos
>
> No, there is not a legitimate reason. Did this occur for an extended
> period of time? Do you have logs from the scheduler indicating that it
> lost ZK leadership and subsequently interacted with mesos?
>
> On Tue, Sep 26, 2017 at 1:02 PM, Mohit Jaggi <[email protected]> wrote:
>
>> Fellows,
>> While examining Aurora log files, I noticed a condition where a scheduler
>> was talking to Mesos but it was not showing up as a leader in Zookeeper. It
>> ultimately restarted itself and another scheduler became the leader.
>> Is there a reason a non-leading scheduler will talk to Mesos?
>>
>> Mohit.
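One more sketch, this time of the delayed "Giving up on registration in (10, mins)" action visible in the log (again, invented names, not the real SchedulerLifecycle API): a timer is armed when the scheduler enters LEADER_AWAITING_REGISTRATION, and if the registration callback never arrives it forces the transition to DEAD and exits, which would also explain the fresh process (PID 24743 -> 54502) re-entering the election right afterwards:

// Rough illustration of a registration watchdog; class, state, and method names
// here are assumptions for the sketch, not Aurora's actual implementation.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class RegistrationWatchdogSketch {

    enum State { STORAGE_PREPARED, LEADER_AWAITING_REGISTRATION, ACTIVE, DEAD }

    private final AtomicReference<State> state =
            new AtomicReference<>(State.STORAGE_PREPARED);
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    void onElectedLeader(long toleratedDelay, TimeUnit unit) {
        state.set(State.LEADER_AWAITING_REGISTRATION);
        System.out.println("Giving up on registration in (" + toleratedDelay + ", " + unit + ")");
        // If onRegistered() has not fired by then, tear the scheduler down so a
        // fresh process can re-enter the leader election.
        timer.schedule(() -> {
            if (state.compareAndSet(State.LEADER_AWAITING_REGISTRATION, State.DEAD)) {
                System.err.println("Framework has not been registered within the tolerated delay.");
                System.exit(1);
            }
        }, toleratedDelay, unit);
    }

    // Invoked by the Mesos driver once the master accepts the framework.
    void onRegistered() {
        if (state.compareAndSet(State.LEADER_AWAITING_REGISTRATION, State.ACTIVE)) {
            System.out.println("Registered; scheduler is now ACTIVE.");
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RegistrationWatchdogSketch scheduler = new RegistrationWatchdogSketch();
        scheduler.onElectedLeader(2, TimeUnit.SECONDS);
        Thread.sleep(500);        // pretend the master answered quickly...
        scheduler.onRegistered(); // ...so the watchdog never fires.
        System.exit(0);
    }
}

In the failed run above the happy path never happened, so after the 10-minute delay the watchdog path ran and the scheduler died.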
