chengjianyun commented on issue #3954: URL: https://github.com/apache/iotdb/issues/3954#issuecomment-918736384
> The uncommitted Raft logs are not persisted. Consider a scenario where node A is the leader, nodes B and C are followers, node A synchronizes the log to B and C and gets all acks, then submits and applies the log, and then returns success to the client before the next heartbeat to followers. Then there was a momentary power outage, nodes B and C restarted immediately, but node A restarted slowly. Node B and C's log are empty after restarting and recovering, but they are already a majority, so they can serve the client's read request, then this may violate linearizability. Agree that the case is a bug for IoTDB. > In fact, to ensure linearizability Raft uses read-index or lease-read. In our current implementation, direct-read is used for the leader and read-index is used for the follower. This can violate linearizability when a node outage causes a replacement node to execute the same read request. For more specific examples, you can refer to my blog. Of course, we could also use read-index on the leader, but this would undoubtedly degrade performance. Sorry that I didn't make the assumption clear. I think we are talking things under IoTDB environment. So I didn't take Read operation into consideration. As you have said, linearization the read is not a regular operation in OLAP system. And let's go back to Raft library topic. Import the Raft library doesn't mean application needs to sync all operation via the library, it depends on requirement. I don't think IoTDB needs to change current read action process flow after import the library. And learned a lot from your blog and references, thanks. > In fact, Raft's naive implementation guarantees at-least-once semantics, and to ensure linearizability semantics, uuid is generated on the client side and a map is logged on the server side to ensure that each command is executed only once. Agree the concept but which is unnecessary under IoTDB or time-order data environment. I think the `exactly-only-once` requirement here as the actions are not idempotent for all system. But operations in time-order-data or ``most of actions in IoTDB (in my understanding all of actions right now) are idempotent naturally. And again, all of these are in scope of application instead of Raft. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
