[GitHub] [iotdb] LebronAl edited a comment on issue #3954: Integrate Apache Ratis to help manage Raft status

GitBox Mon, 13 Sep 2021 06:04:43 -0700


LebronAl edited a comment on issue #3954:
URL: https://github.com/apache/iotdb/issues/3954#issuecomment-918163937



   > What do you mean by "strictly speaking it does not guarantee 
linearizability"? In which case, IoTDB doesn't guarantee the linearizability? 
In my opinion, it doesn't matter IoTDB or other products, if you are using 
Raft, that means you need the linearizability which is what raft provided. If 
IoTDB doesn't guarantee the linearizability, I think it's a bug. Right now, I 
didn't find any of the case.
   
   The devil in consensus lies in the corner cases. So let me give you three 
examples I know of so far that may violate linearizability.
   1. Uncommitted Raft logs are not persisted. Consider a scenario where node 
A is the leader and nodes B and C are followers. Node A replicates a log entry 
to B and C and receives all acks, commits and applies the entry, and returns 
success to the client before the next heartbeat to the followers. Then a 
momentary power outage hits: nodes B and C restart immediately, but node A 
restarts slowly. The logs of B and C are empty after restart and recovery, yet 
the two nodes already form a majority, so they can elect a leader and serve the 
client's read request. The acknowledged write is missing from that read, which 
may violate linearizability. Of course, this is strictly an implementation bug 
and can be fixed. But even once it is fixed, our Raft log's serialization 
buffer follows the implementation of the stand-alone WAL: it does not write to 
disk and call fsync every time a log entry is written, which keeps performance 
from being limited by the disk's IOPS. That can still cause the corner case 
described above. This is the trade-off between performance and safety.
   2. To ensure linearizable reads, Raft uses read-index or lease-read. In our 
current implementation, the leader uses direct-read and followers use 
read-index. This can violate linearizability when a node outage causes a 
replacement node to execute the same read request. For more specific examples, 
you can refer to my 
[blog](https://tanxinyu.work/consistency-and-consensus/). Of course, we could 
also use read-index on the leader, but this would undoubtedly degrade 
performance.
   3. A naive Raft implementation guarantees only at-least-once semantics. To 
get linearizable semantics, a uuid is generated on the client side and a map is 
maintained on the server side so that each command is executed only once. You 
can refer to section 6.3 of [Raft's PhD 
thesis](https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pdf) and 
dragonboat's discussion on 
[Zhihu](https://www.zhihu.com/question/278551592). There is no doubt that such 
an implementation will also affect performance.
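   
   Point 3 above can be sketched as follows. This is a minimal Python 
illustration, not IoTDB or Ratis code: `CounterStateMachine`, `applied`, and 
`apply` are hypothetical names, and the counter stands in for whatever command 
the log carries. The client tags each command with a uuid, and the state 
machine remembers the result per id so that a retried command is not executed 
twice.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CounterStateMachine:
    """Toy replicated state machine: a counter plus a table of applied ids."""

    value: int = 0
    # request id -> cached result (hypothetical structure, for illustration)
    applied: dict = field(default_factory=dict)

    def apply(self, request_id: str, delta: int) -> int:
        # A retried command carries the same id, so return the cached result
        # instead of executing the command a second time.
        if request_id in self.applied:
            return self.applied[request_id]
        self.value += delta
        self.applied[request_id] = self.value
        return self.value


sm = CounterStateMachine()
request_id = str(uuid.uuid4())   # generated once, on the client side
first = sm.apply(request_id, 5)  # original attempt
retry = sm.apply(request_id, 5)  # client timed out and resent the same command
```

   Because the table is updated while applying the log entry, every replica 
reaches the same decision about duplicates. The extra lookup and the memory 
held by the table are where the performance cost comes from; this sketch never 
evicts old ids.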
   
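   For point 2, here is a toy model of why a leader-side direct-read can return 
stale data while read-index does not. It is an assumption-laden sketch in 
Python, not our implementation: `Node`, `direct_read`, `read_index_read`, and 
`NotLeaderError` are made-up names, heartbeats are plain function calls, and 
the scenario (an old leader that has not yet noticed a newer term) is just one 
illustrative case.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    term: int
    commit_index: int = 0
    applied_index: int = 0
    kv: dict = field(default_factory=dict)

    def ack_heartbeat(self, leader_term: int) -> bool:
        # A peer only acknowledges a leader whose term is not older than its own.
        return leader_term >= self.term


class NotLeaderError(Exception):
    pass


def direct_read(leader, key):
    # No leadership check: a deposed leader will still answer from local state.
    return leader.kv.get(key)


def read_index_read(leader, followers, key):
    # 1. Remember the current commit index as the read index.
    read_index = leader.commit_index
    # 2. Confirm leadership with a heartbeat round (the leader counts itself).
    acks = 1 + sum(f.ack_heartbeat(leader.term) for f in followers)
    if acks * 2 <= 1 + len(followers):
        raise NotLeaderError("lost leadership; the client should retry elsewhere")
    # 3. Serve the read only once the state machine has applied up to read_index.
    if leader.applied_index < read_index:
        raise RuntimeError("a real implementation would wait for apply to catch up")
    return leader.kv.get(key)


# Old leader A is still at term 1; B and C moved to term 2 and committed x=2.
a = Node(term=1, commit_index=1, applied_index=1, kv={"x": 1})
b = Node(term=2, commit_index=2, applied_index=2, kv={"x": 2})
c = Node(term=2, commit_index=2, applied_index=2, kv={"x": 2})

stale = direct_read(a, "x")                # A answers with the old value
try:
    read_index_read(a, [b, c], "x")
    rejected = False
except NotLeaderError:
    rejected = True                        # A cannot gather a majority
fresh = read_index_read(b, [a, c], "x")    # the new leader serves the read
```

   The heartbeat round in step 2 is the performance cost mentioned above: every 
read on the leader pays for a majority confirmation unless a lease is used.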
   >Application(Raft user) takes care of log apply, so it doesn't matter how 
Raft implemented.
   
   I hope so~ Maybe we need to investigate `Ratis` in detail.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [iotdb] LebronAl edited a comment on issue #3954: Integrate Apache Ratis to help manage Raft status

Reply via email to