Hi Gordon,

Yes, there is a TestRaftServerNoLeaderTimeout for testing notifyExtendedNoLeader(..).

The bold line is for *notifyLeaderChanged(..)*, which is a different method. The notifyExtendedNoLeader(..) method should work (i.e. it will be called after NO_LEADER_TIMEOUT_KEY). Have you overridden it in your StateMachine? Otherwise, it is a no-op.

Hope it helps.
Tsz-Wo

On Fri, Jun 11, 2021 at 11:39 PM Gordon Jahn <[email protected]> wrote:

> Hi folks,
>
> Further to this, I'm wondering if I never see the notifyExtendedNoLeader
> called as I see these exceptions:
>
> [grpc-default-executor-5] WARN
> org.apache.ratis.grpc.server.GrpcServerProtocolService - n0: Failed
> requestVote n2->n1#0:
> org.apache.ratis.protocol.exceptions.ServerNotReadyException:
> n0@group-726F75706964 is not in [RUNNING]: current state is STARTING
> [grpc-default-executor-5] INFO org.apache.ratis.server.RaftServer$Division
> - n0@group-726F75706964: receive requestVote(PRE_VOTE, n2,
> group-726F75706964, 23, (t:23, i:566))
> [grpc-default-executor-5] WARN
> org.apache.ratis.grpc.server.GrpcServerProtocolService - n0: Failed
> requestVote n2->n1#0:
> org.apache.ratis.protocol.exceptions.ServerNotReadyException:
> n0@group-726F75706964 is not in [RUNNING]: current state is STARTING
> [grpc-default-executor-5] INFO org.apache.ratis.server.RaftServer$Division
> - n0@group-726F75706964: receive requestVote(PRE_VOTE, n2,
> group-726F75706964, 23, (t:23, i:566))
> [grpc-default-executor-5] WARN
> org.apache.ratis.grpc.server.GrpcServerProtocolService - n0: Failed
> requestVote n2->n1#0:
> org.apache.ratis.protocol.exceptions.ServerNotReadyException:
> n0@group-726F75706964 is not in [RUNNING]: current state is STARTING
> [grpc-default-executor-5] INFO org.apache.ratis.server.RaftServer$Division
> - n0@group-726F75706964: receive requestVote(PRE_VOTE, n2,
> group-726F75706964, 23, (t:23, i:566))
> [grpc-default-executor-5] WARN
> org.apache.ratis.grpc.server.GrpcServerProtocolService - n0: Failed
> requestVote n2->n1#0:
> org.apache.ratis.protocol.exceptions.ServerNotReadyException:
> n0@group-726F75706964 is not in [RUNNING]: current state is STARTING
> [grpc-default-executor-5] INFO org.apache.ratis.server.RaftServer$Division
> - n0@group-726F75706964: receive requestVote(PRE_VOTE, n2,
> group-726F75706964, 23, (t:23, i:566))
>
> Is this a known issue? Is there a basic test case somewhere showing
> notifyExtendedNoLeader working with gRPC and what needs to be configured
> (is it more than
> properties.setTimeDuration(RaftServerConfigKeys.Notification.NO_LEADER_TIMEOUT_KEY,
> TimeDuration.valueOf(5000, TimeUnit.MILLISECONDS)); ?). Equally, would a
> different transport work out of the box?
>
> Alternatively, is there a reason that ServerState.java looks like this:
>
> void setLeader(RaftPeerId newLeaderId, Object op) {
>   if (!Objects.equals(leaderId, newLeaderId)) {
>     String suffix;
>     if (newLeaderId == null) {
>       // reset the time stamp when a null leader is assigned
>       lastNoLeaderTime = Timestamp.currentTime();
>       suffix = "";
>     } else {
>       Timestamp previous = lastNoLeaderTime;
>       lastNoLeaderTime = null;
>       suffix = ", leader elected after " + previous.elapsedTimeMs() + "ms";
>       *server.getStateMachine().event().notifyLeaderChanged(getMemberId(), newLeaderId);*
>     }
>     LOG.info("{}: change Leader from {} to {} at term {} for {}{}",
>         getMemberId(), leaderId, newLeaderId, getCurrentTerm(), op, suffix);
>     leaderId = newLeaderId;
>     if (leaderId != null) {
>       server.finishTransferLeadership();
>     }
>   }
> }
>
> It seems like moving bold line down, outside the else block would mean the
> state machine is notified when the leader moves to *null* which would give
> the notification that no leader was present and that data in the store
> should not be trusted (in this case, my application would resign primary
> status).
>
> Any input on this approach would be useful...
> it's painful to be so close
> to a working solution but just unable to actually be told the server state
> has changed (or even access it as all access to the server state seems to
> sit behind non-public methods).
>
> Regards,
> Gordon
>
> On Fri, Jun 11, 2021 at 1:00 PM Gordon Jahn <[email protected]> wrote:
>
>> Hi folks,
>>
>> I'm not sure if I'm missing something incredibly simple or just going
>> about this the wrong way...
>>
>> I've implemented a simple Ratis State Machine (extending BaseStateMachine
>> as per the examples) in order to write a primary-server selection.
>>
>> My design is:
>>
>> * Use Ratis as a high availability data store; I don't care which node is
>> actually the leader of the Ratis group - they can fight it out, and all my
>> state machines get messages - that's great
>> * For my application servers, maybe only 2 of 3 can ever be the primary
>> server and they share the availability and priority via Ratis messages
>> * The Ratis state machine picks algorithmically the machine to be my
>> application primary (based on the enabled, priority and ID fields shared)
>> * The state machine notifies the rest of my application when it is / is
>> not the primary server
>> * The machine should not be able to be primary server if it's not
>> connected to the Ratis group
>>
>> I have most of this working, but cannot figure out where to get a
>> notification / event from Ratis that the peer is not part of the majority
>> group.
>>
>> Implementing notifyFollowerSlowness(RoleInfoProto roleInfoProto) and
>> notifyExtendedNoLeader(RoleInfoProto roleInfoProto), and setting timeouts
>> in the config, doesn't seem to result in these being called if I start 3
>> Ratis nodes then shut 2 down. The state machine's reset method also isn't
>> called (I thought it might if it was no longer part of a quorum, and then
>> reinitialise might be called when it rejoined).
>>
>> Should I be able to see the disconnection of my state machine somewhere
>> so I can trigger an event or is this just the wrong approach to take? Is
>> there a leader election example anywhere to demonstrate this?
>>
>> Thanks in advance,
>> Gordon
>>
>
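
P.S. A rough sketch of how the two callbacks named in this thread, notifyLeaderChanged(..) and notifyExtendedNoLeader(..), could feed the "resign primary status" decision described in the quoted message. Everything in the block is hypothetical and is not part of Ratis: the PrimaryGate class, its method names and the listener are made up for illustration. It deliberately has no Ratis imports so that it compiles on its own; the assumption is that the overrides in a BaseStateMachine subclass would just call into it.

```java
import java.util.function.Consumer;

/**
 * Hypothetical helper (not a Ratis class): tracks whether this node may act
 * as application primary and tells a listener only when that answer changes.
 */
final class PrimaryGate {
    private final Consumer<Boolean> listener; // receives true = eligible, false = must resign
    private boolean eligible = false;         // start pessimistic until a leader is known

    PrimaryGate(Consumer<Boolean> listener) {
        this.listener = listener;
    }

    /** Assumed to be called from the state machine's notifyLeaderChanged(..) override. */
    synchronized void onLeaderChanged(String newLeaderId) {
        update(newLeaderId != null);
    }

    /** Assumed to be called from the state machine's notifyExtendedNoLeader(..) override. */
    synchronized void onExtendedNoLeader() {
        update(false);
    }

    synchronized boolean isEligible() {
        return eligible;
    }

    // Fire the listener only on a real transition, so repeated callbacks are harmless.
    private void update(boolean nowEligible) {
        if (nowEligible != eligible) {
            eligible = nowEligible;
            listener.accept(nowEligible);
        }
    }
}
```

With this shape, a leader change followed by repeated no-leader notifications produces exactly one "eligible" event and one "resign" event, so the application side does not need to de-duplicate. Whether the extended no-leader callback fires in the 3-node, 2-down scenario still depends on the override and the NO_LEADER_TIMEOUT_KEY setting discussed above.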
