> Is the thought "it worked on the leader so applying will work on the secondary as well"?
Yes, the state machine implementation should be deterministic so that the log can be replayed on any machine. In case of hardware errors or problems like OutOfMemory, Ratis will terminate the server so that it no longer takes part in the group. The admin should manually fix the underlying problem and then restart the machine. The machine will then rejoin the group and catch up on the log transactions it missed.

> I am worried about data inconsistency when the raft log applying fails on a secondary.

Note that, if there are three or more servers, the Raft algorithm makes sure the log remains consistent even if there are hardware errors in one machine (or in fewer than a majority of the machines). In general, if there are n servers, n being odd, Ratis/Raft can tolerate (n-1)/2 machine failures.

Tsz-Wo

On Thu, Mar 10, 2022 at 11:13 AM Asad Awadia <[email protected]> wrote:

> I am worried about data inconsistency when the raft log applying fails on
> a secondary.
>
> Not sure if the responses answer my concern.
>
> Error handling is important specifically when trying to maintain
> determinism.
>
> Is the thought "it worked on the leader so applying will work on the
> secondary as well"?
> ------------------------------
> *From:* Tsz Wo Sze <[email protected]>
> *Sent:* Wednesday, March 9, 2022 7:39:58 AM
> *To:* [email protected] <[email protected]>
> *Cc:* Lokesh Jain <[email protected]>
> *Subject:* Re: Re: ApplyTransaction failure
>
> Hi Asad,
>
> > What is the behaviour when the applyTransaction fails on the followers?
> >
> > I am assuming applyTransaction is called on the leader when a write
> > request comes in and on the followers as they replicate the log.
>
> First of all, raft/ratis assumes that transactions can be replayed on any
> machine and give the same results. If a transaction fails, say with
> permission denied, at a follower, it must also fail on the leader.
> For the state machine implementation, it should not care about its current
> role when processing transactions. It should return the same exception in
> this case. If it is the leader, Ratis will return the exception back to the
> client. If it is a follower, Ratis will just update the reply cache (which
> is useful when becoming the leader).
>
> > ... If there is an error does the entire replication for that instance
> > block/gets stuck?
>
> As mentioned by Riguz, log replication and transaction processing can be
> separated. In Ratis, we use separate threads to implement them.
> Replication is never blocked. There is a separate StateMachineUpdater
> thread for applying transactions. Suppose the state machine gets stuck
> when processing a transaction. Then the StateMachineUpdater thread will
> be blocked by the state machine. That request and the following requests
> will stay in the request queue at the leader. They will be replicated as
> usual. When the queue is full, new client requests will be rejected.
>
> Tsz-Wo
>
> On Wed, Mar 9, 2022 at 4:31 PM Riguz Lee <[email protected]> wrote:
>
> Hi Asad,
>
> In addition to Lokesh's reply, regarding the transaction and log
> replication, my understanding is that Raft only guarantees that all logs
> will be replicated consistently; the log execution, however, might be
> deferred depending on the implementation. That is to say, whether or not a
> log entry has been executed, or its execution failed, does not affect log
> replication itself.
>
> I did not look into the Raft paper to confirm this, but this should make
> sense. With the same log and the same transaction-applying logic, all
> replicas eventually reach the same result.
>
> Thanks
>
> Riguz
>
> Original Email
>
> Sender:"Lokesh Jain"< [email protected] >;
> Sent Time:2022/3/9 14:50
> To:"user"< [email protected] >;
> Subject:Re: ApplyTransaction failure
>
> Hey Asad
> You can control the behaviour of applyTransaction in the StateMachine
> implementation. In case of an error, the server can fail any further
> transactions. The application can choose whether to skip or fail.
> Regards
> Lokesh
>
> On 09-Mar-2022, at 2:14 AM, Asad Awadia <[email protected]> wrote:
>
> Hello,
> What is the behaviour when the applyTransaction fails on the followers?
> I am assuming applyTransaction is called on the leader when a write
> request comes in and on the followers as they replicate the log.
> For the leader we can signal back to the client, but what about the
> followers? If there is an error, does the entire replication for that
> instance block or get stuck?
> Since it is almost like a stream of logs, we probably don't want it to
> skip over the failed write and continue on, as that can lead to an
> inconsistent end state for the data.
> Regards,
> Asad
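
To make the determinism argument in this thread concrete, here is a minimal sketch in plain Java. It deliberately does not use the Ratis API: the class KvStateMachine, the "PUT key value" / "DEL key" command format, and the helpers replay and tolerableFailures are all hypothetical names made up for illustration. The point it tries to show is that when apply() depends only on the current state and the log entry, a transaction that fails on one replica fails the same way on every replica, so recording the failure as a reply and moving on leaves all replicas in the same state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeterministicReplay {

    /** Hypothetical toy state machine (not a Ratis StateMachine). */
    static final class KvStateMachine {
        final Map<String, String> data = new TreeMap<>();
        final List<String> replies = new ArrayList<>();

        /**
         * Applies one log entry. The outcome depends only on the entry and
         * the current state, never on the role (leader or follower), the
         * clock, or randomness. A failure is recorded as a reply instead of
         * stopping the replay.
         */
        void apply(String entry) {
            String[] parts = entry.split(" ");
            String reply;
            if (parts[0].equals("PUT") && parts.length == 3) {
                data.put(parts[1], parts[2]);
                reply = "OK";
            } else if (parts[0].equals("DEL") && parts.length == 2) {
                reply = data.remove(parts[1]) != null
                        ? "OK"
                        : "ERROR: no such key " + parts[1];
            } else {
                reply = "ERROR: bad command";
            }
            replies.add(reply);
        }
    }

    /** Replays the whole log, in order, on a fresh state machine. */
    static KvStateMachine replay(List<String> log) {
        KvStateMachine sm = new KvStateMachine();
        for (String entry : log) {
            sm.apply(entry);
        }
        return sm;
    }

    /** Failures tolerated by a group of n servers: (n-1)/2, rounded down. */
    static int tolerableFailures(int n) {
        return (n - 1) / 2;
    }

    public static void main(String[] args) {
        List<String> log = List.of("PUT a 1", "DEL b", "PUT b 2", "DEL a");

        // Two independent replicas replay the same replicated log.
        KvStateMachine leader = replay(log);
        KvStateMachine follower = replay(log);

        // "DEL b" fails on both replicas in exactly the same way.
        System.out.println(leader.replies);   // [OK, ERROR: no such key b, OK, OK]
        System.out.println(leader.data.equals(follower.data)
                && leader.replies.equals(follower.replies)); // true

        System.out.println(tolerableFailures(3) + " " + tolerableFailures(5)); // 1 2
    }
}
```

Non-deterministic failures such as OutOfMemory are a different case: as described above, they are not turned into a reply; Ratis terminates the server, and it catches up from the log after a restart.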
