Hi Gordon,

Yes, there is a TestRaftServerNoLeaderTimeout for
testing notifyExtendedNoLeader(..).

The bold line calls *notifyLeaderChanged(..)*, which is a different
method.  The notifyExtendedNoLeader(..) method should work (i.e. it will be
called once the server has had no leader for longer than the
NO_LEADER_TIMEOUT_KEY duration).  Have you overridden it in your
StateMachine?  If not, the default implementation is a no-op.
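
To illustrate why the callback appears to do nothing unless it is
overridden, here is a minimal, library-free sketch.  All of the names in it
(FollowerEvents, BaseMachine, PrimarySelectingMachine,
checkExtendedNoLeader) are hypothetical stand-ins and are not the Ratis
classes; it only models the pattern of a default no-op callback that a
subclass overrides.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins, NOT the Ratis classes. They only model the
// "default no-op callback, overridden in a subclass" pattern.
final class NoLeaderCallbackSketch {

  /** Models an event API whose default implementation does nothing. */
  interface FollowerEvents {
    default void notifyExtendedNoLeader(String roleInfo) {
      // no-op unless a subclass overrides it
    }
  }

  /** Models a base state machine that inherits the no-op default. */
  static class BaseMachine implements FollowerEvents { }

  /** Models an application state machine that reacts to the callback. */
  static class PrimarySelectingMachine extends BaseMachine {
    final AtomicInteger resignCount = new AtomicInteger();

    @Override
    public void notifyExtendedNoLeader(String roleInfo) {
      // Application-specific reaction, e.g. resign primary status.
      resignCount.incrementAndGet();
    }
  }

  /** Models the server side: invoke the callback once the timeout is exceeded. */
  static boolean checkExtendedNoLeader(FollowerEvents machine, long elapsedMs, long timeoutMs) {
    if (elapsedMs > timeoutMs) {
      machine.notifyExtendedNoLeader("FOLLOWER");
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    PrimarySelectingMachine machine = new PrimarySelectingMachine();
    checkExtendedNoLeader(new BaseMachine(), 6000, 5000); // invoked, but nothing happens
    checkExtendedNoLeader(machine, 6000, 5000);           // invoked, the override runs
    System.out.println("resignCount = " + machine.resignCount.get());
  }
}
```

In the sketch the callback is invoked on both machines, but only the one
that overrides it observes anything, which is the situation to rule out
first.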

Hope it helps.
Tsz-Wo



On Fri, Jun 11, 2021 at 11:39 PM Gordon Jahn <[email protected]> wrote:

> Hi folks,
>
> Further to this, I'm wondering whether I never see notifyExtendedNoLeader
> called because of these exceptions:
>
> [grpc-default-executor-5] WARN
> org.apache.ratis.grpc.server.GrpcServerProtocolService - n0: Failed
> requestVote n2->n1#0:
> org.apache.ratis.protocol.exceptions.ServerNotReadyException:
> n0@group-726F75706964 is not in [RUNNING]: current state is STARTING
> [grpc-default-executor-5] INFO org.apache.ratis.server.RaftServer$Division
> - n0@group-726F75706964: receive requestVote(PRE_VOTE, n2,
> group-726F75706964, 23, (t:23, i:566))
> [grpc-default-executor-5] WARN
> org.apache.ratis.grpc.server.GrpcServerProtocolService - n0: Failed
> requestVote n2->n1#0:
> org.apache.ratis.protocol.exceptions.ServerNotReadyException:
> n0@group-726F75706964 is not in [RUNNING]: current state is STARTING
> [grpc-default-executor-5] INFO org.apache.ratis.server.RaftServer$Division
> - n0@group-726F75706964: receive requestVote(PRE_VOTE, n2,
> group-726F75706964, 23, (t:23, i:566))
> [grpc-default-executor-5] WARN
> org.apache.ratis.grpc.server.GrpcServerProtocolService - n0: Failed
> requestVote n2->n1#0:
> org.apache.ratis.protocol.exceptions.ServerNotReadyException:
> n0@group-726F75706964 is not in [RUNNING]: current state is STARTING
> [grpc-default-executor-5] INFO org.apache.ratis.server.RaftServer$Division
> - n0@group-726F75706964: receive requestVote(PRE_VOTE, n2,
> group-726F75706964, 23, (t:23, i:566))
> [grpc-default-executor-5] WARN
> org.apache.ratis.grpc.server.GrpcServerProtocolService - n0: Failed
> requestVote n2->n1#0:
> org.apache.ratis.protocol.exceptions.ServerNotReadyException:
> n0@group-726F75706964 is not in [RUNNING]: current state is STARTING
> [grpc-default-executor-5] INFO org.apache.ratis.server.RaftServer$Division
> - n0@group-726F75706964: receive requestVote(PRE_VOTE, n2,
> group-726F75706964, 23, (t:23, i:566))
>
> Is this a known issue?  Is there a basic test case somewhere showing
> notifyExtendedNoLeader working with gRPC, and what needs to be configured?
> Is it more than
>
>   properties.setTimeDuration(
>       RaftServerConfigKeys.Notification.NO_LEADER_TIMEOUT_KEY,
>       TimeDuration.valueOf(5000, TimeUnit.MILLISECONDS));
>
> Equally, would a different transport work out of the box?
>
> Alternatively, is there a reason that ServerState.java looks like this:
>
>   void setLeader(RaftPeerId newLeaderId, Object op) {
>     if (!Objects.equals(leaderId, newLeaderId)) {
>       String suffix;
>       if (newLeaderId == null) {
>         // reset the time stamp when a null leader is assigned
>         lastNoLeaderTime = Timestamp.currentTime();
>         suffix = "";
>       } else {
>         Timestamp previous = lastNoLeaderTime;
>         lastNoLeaderTime = null;
>         suffix = ", leader elected after " + previous.elapsedTimeMs() + "ms";
>         *server.getStateMachine().event().notifyLeaderChanged(getMemberId(), newLeaderId);*
>       }
>       LOG.info("{}: change Leader from {} to {} at term {} for {}{}",
>           getMemberId(), leaderId, newLeaderId, getCurrentTerm(), op, suffix);
>       leaderId = newLeaderId;
>       if (leaderId != null) {
>         server.finishTransferLeadership();
>       }
>     }
>   }
>
> It seems like moving the bold line down, outside the else block, would
> mean the state machine is notified when the leader moves to *null*, which
> would give the notification that no leader is present and that data in the
> store should not be trusted (in this case, my application would resign
> primary status).
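
The difference between the two callbacks can be sketched without Ratis.
The class below is a simplified model of the setLeader(..) logic quoted
above plus a timeout check; LeaderTrackerSketch, Listener,
RecordingListener and the explicit nowMs parameters are all hypothetical
and are not Ratis APIs.  It only shows that, as the code is written, a
change to a null leader starts a timer instead of firing the
leader-changed callback, and that the no-leader callback fires once that
timer exceeds the timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical model of the quoted setLeader(..) logic; NOT Ratis code.
final class LeaderTrackerSketch {

  /** Models the two state machine callbacks discussed in this thread. */
  interface Listener {
    void leaderChanged(String newLeaderId);  // models notifyLeaderChanged
    void extendedNoLeader(long elapsedMs);   // models notifyExtendedNoLeader
  }

  /** Records callbacks as strings so that they can be inspected. */
  static final class RecordingListener implements Listener {
    final List<String> events = new ArrayList<>();

    @Override
    public void leaderChanged(String newLeaderId) {
      events.add("leaderChanged:" + newLeaderId);
    }

    @Override
    public void extendedNoLeader(long elapsedMs) {
      events.add("extendedNoLeader:" + elapsedMs);
    }
  }

  private final Listener listener;
  private final long noLeaderTimeoutMs;
  private String leaderId;          // null means no known leader
  private Long lastNoLeaderTimeMs;  // null while a leader is known

  LeaderTrackerSketch(Listener listener, long noLeaderTimeoutMs, long nowMs) {
    this.listener = listener;
    this.noLeaderTimeoutMs = noLeaderTimeoutMs;
    this.lastNoLeaderTimeMs = nowMs;  // start without a leader
  }

  /** Mirrors the quoted code: only a non-null leader triggers leaderChanged. */
  void setLeader(String newLeaderId, long nowMs) {
    if (Objects.equals(leaderId, newLeaderId)) {
      return;
    }
    if (newLeaderId == null) {
      lastNoLeaderTimeMs = nowMs;  // reset the time stamp, no callback here
    } else {
      lastNoLeaderTimeMs = null;
      listener.leaderChanged(newLeaderId);
    }
    leaderId = newLeaderId;
  }

  /** Fires extendedNoLeader once the leaderless period exceeds the timeout. */
  void checkExtendedNoLeader(long nowMs) {
    if (lastNoLeaderTimeMs != null && nowMs - lastNoLeaderTimeMs > noLeaderTimeoutMs) {
      listener.extendedNoLeader(nowMs - lastNoLeaderTimeMs);
    }
  }
}
```

With a 5000 ms timeout, calling setLeader("n1", 100), setLeader(null, 200)
and then checkExtendedNoLeader(6000) records one leaderChanged event and
one extendedNoLeader event in this model; an application reacting to the
second could resign primary status there.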
>
> Any input on this approach would be useful...  it's painful to be so close
> to a working solution but just unable to actually be told the server state
> has changed (or even access it as all access to the server state seems to
> sit behind non-public methods).
>
> Regards,
> Gordon
>
> On Fri, Jun 11, 2021 at 1:00 PM Gordon Jahn <[email protected]> wrote:
>
>> Hi folks,
>>
>> I'm not sure if I'm missing something incredibly simple or just going
>> about this the wrong way...
>>
>> I've implemented a simple Ratis State Machine (extending BaseStateMachine
>> as per the examples) in order to perform primary-server selection.
>>
>> My design is:
>>
>> * Use Ratis as a high availability data store; I don't care which node is
>> actually the leader of the Ratis group - they can fight it out, and all my
>> state machines get messages - that's great
>> * For my application servers, maybe only 2 of 3 can ever be the primary
>> server and they share the availability and priority via Ratis messages
>> * The Ratis state machine picks algorithmically the machine to be my
>> application primary (based on the enabled, priority and ID fields shared)
>> * The state machine notifies the rest of my application when it is / is
>> not the primary server
>> * The machine should not be able to be primary server if it's not
>> connected to the Ratis group
>>
>> I have most of this working, but cannot figure out where to get a
>> notification / event from Ratis that the peer is not part of the majority
>> group.
>>
>> Implementing notifyFollowerSlowness(RoleInfoProto roleInfoProto) and
>> notifyExtendedNoLeader(RoleInfoProto roleInfoProto), and setting timeouts
>> in the config, doesn't seem to result in these being called if I start 3
>> Ratis nodes then shut 2 down. The state machine's reset method also isn't
>> called (I thought it might if it was no longer part of a quorum, and then
>> reinitialise might be called when it rejoined).
>>
>> Should I be able to see the disconnection of my state machine somewhere
>> so I can trigger an event or is this just the wrong approach to take?  Is
>> there a leader election example anywhere to demonstrate this?
>>
>> Thanks in advance,
>> Gordon
>>
>
