I am worried about data inconsistency when applying a Raft log entry fails on a secondary.

I am not sure the responses answer my concern. Error handling is important, specifically when trying to maintain determinism. Is the thought "it worked on the leader, so applying will work on the secondary as well"?

________________________________
From: Tsz Wo Sze <[email protected]>
Sent: Wednesday, March 9, 2022 7:39:58 AM
To: [email protected] <[email protected]>
Cc: Lokesh Jain <[email protected]>
Subject: Re: Re: ApplyTransaction failure

Hi Asad,

> What is the behaviour when the applyTransaction fails on the followers?
>
> I am assuming applyTransaction is called on the leader when a write request
> comes in and on the followers as they replicate the log.

First of all, Raft/Ratis assumes that transactions can be replayed on any machine and give the same result. If a transaction fails, say with permission denied, on a follower, it must also fail on the leader. The state machine implementation should not care about its current role when processing transactions. It should return the same exception in this case. If it is the leader, Ratis will return the exception to the client. If it is a follower, Ratis will just update the reply cache (which is useful when it later becomes the leader).

> ... If there is an error does the entire replication for that instance
> block/gets stuck?

As mentioned by Riguz, log replication and transaction processing can be separated. In Ratis, we use separate threads to implement them. Replication is never blocked. There is a separate StateMachineUpdater thread for applying transactions. Suppose the state machine gets stuck when processing a transaction. Then the StateMachineUpdater thread will be blocked by the state machine. That request and the following requests will stay in the request queue at the leader. They will be replicated as usual. When the queue is full, new client requests will be rejected.
Tsz-Wo

On Wed, Mar 9, 2022 at 4:31 PM Riguz Lee <[email protected]> wrote:

Hi Asad,

In addition to Lokesh's reply, regarding the transaction and log replication, my understanding is that Raft only guarantees that all logs will be replicated consistently. The log execution, however, might be deferred depending on the implementation. That is to say, whether or not a log entry has been executed, or whether its execution failed, will not impact the log replication itself. I did not look into the Raft paper to confirm this, but this should make sense. With the same log and the same transaction-applying logic, all replicas can get the same result eventually.

Thanks
Riguz

Original Email
Sender: "Lokesh Jain" <[email protected]>
Sent Time: 2022/3/9 14:50
To: "user" <[email protected]>
Subject: Re: ApplyTransaction failure

Hey Asad

You can control the behaviour of applyTransaction in the StateMachine implementation. In case of an error, any further transaction can be failed by the server. The application can choose whether to skip or fail.

Regards
Lokesh

On 09-Mar-2022, at 2:14 AM, Asad Awadia <[email protected]> wrote:

Hello,

What is the behaviour when the applyTransaction fails on the followers?

I am assuming applyTransaction is called on the leader when a write request comes in and on the followers as they replicate the log.

For the leader we can signal back to the client, but what about the followers? If there is an error, does the entire replication for that instance block or get stuck?

Since it is almost like a stream of logs, we probably don't want it to skip over the failed write and continue on, as that can lead to an inconsistent end state.

Regards,
Asad
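
To make the determinism argument in the replies above concrete, here is a minimal sketch in Python. It is not the Ratis API: `KvStateMachine`, `PermissionDenied`, `replay`, and the read-only `config` key are all hypothetical names chosen for illustration. The assumption is that an application-level failure is a deterministic function of the log entry and the current state, so it is recorded as that entry's result rather than skipped or treated as a reason to halt the replica.

```python
from dataclasses import dataclass, field


class PermissionDenied(Exception):
    """Deterministic, application-level failure (hypothetical)."""


@dataclass
class KvStateMachine:
    """Toy key-value state machine; not the Ratis StateMachine API."""
    data: dict = field(default_factory=dict)
    read_only_keys: frozenset = frozenset({"config"})

    def apply(self, entry):
        # The outcome depends only on the entry and the current state,
        # never on whether this replica is the leader or a follower.
        op, key, value = entry
        if key in self.read_only_keys:
            raise PermissionDenied(key)
        if op == "put":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")
        return "ok"


def replay(log):
    """Apply every entry in order, recording a failure as that entry's
    result instead of skipping the entry or halting the replica."""
    sm = KvStateMachine()
    results = []
    for entry in log:
        try:
            results.append(sm.apply(entry))
        except (PermissionDenied, ValueError) as e:
            results.append(f"error: {type(e).__name__}")
    return sm.data, results


# The same log replayed on two replicas (roles are irrelevant to apply).
log = [("put", "a", 1), ("put", "config", 2), ("delete", "a", None), ("put", "b", 3)]
leader_state, leader_results = replay(log)
follower_state, follower_results = replay(log)
```

In this sketch the rejected write to `config` fails identically on both replicas, so their end states still match. Non-deterministic failures (a full disk on one machine, for example) are a different matter and are not covered here.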

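
The queueing behaviour Tsz-Wo describes (replication continues, a stuck state machine stalls only the apply step, and new requests are rejected once the queue is full) can be modelled roughly as follows. This is a single-threaded toy model, not how Ratis is implemented; `LeaderModel`, `queue_limit`, and the `stuck` flag are assumptions made for illustration.

```python
class LeaderModel:
    """Single-threaded model of the behaviour described in the thread;
    all names are hypothetical and this is not Ratis code."""

    def __init__(self, queue_limit):
        self.queue_limit = queue_limit
        self.log = []            # entries accepted and replicated
        self.applied_index = 0   # number of entries applied so far
        self.stuck = False       # True while the state machine hangs

    def pending(self):
        # Requests that are replicated but not yet applied.
        return len(self.log) - self.applied_index

    def submit(self, request):
        # Reject new client requests once the pending queue is full.
        if self.pending() >= self.queue_limit:
            return False
        self.log.append(request)  # replication does not wait for apply
        return True

    def apply_next(self):
        # Stand-in for the updater thread: no progress while stuck.
        if self.stuck or self.applied_index == len(self.log):
            return False
        self.applied_index += 1
        return True


leader = LeaderModel(queue_limit=3)
leader.stuck = True
accepted = [leader.submit(f"req-{i}") for i in range(5)]
progress_while_stuck = leader.apply_next()

leader.stuck = False
progress_after_recovery = leader.apply_next()
accepted_after_recovery = leader.submit("req-5")
```

With a limit of three, the first three requests are accepted and logged while the state machine is stuck, and the next two are rejected. Once applying resumes, the queue drains and new requests are accepted again.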