Added VOTING replica catch-up section to replicated log docs. Review: https://reviews.apache.org/r/64922/
Project: http://git-wip-us.apache.org/repos/asf/mesos/repo Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/925afdca Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/925afdca Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/925afdca Branch: refs/heads/master Commit: 925afdca4bd0bf45a316403e1b3e89cca16c57ed Parents: c716f70 Author: Ilya Pronin <ipro...@twopensource.com> Authored: Fri Apr 13 15:03:15 2018 -0700 Committer: Jie Yu <yujie....@gmail.com> Committed: Fri Apr 13 15:03:15 2018 -0700 ---------------------------------------------------------------------- docs/replicated-log-internals.md | 8 ++++++++ 1 file changed, 8 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/mesos/blob/925afdca/docs/replicated-log-internals.md ---------------------------------------------------------------------- diff --git a/docs/replicated-log-internals.md b/docs/replicated-log-internals.md index 346f016..8f36a27 100644 --- a/docs/replicated-log-internals.md +++ b/docs/replicated-log-internals.md @@ -113,6 +113,14 @@ The other choice is to do automatic initialization. Our idea is: we allow a repl To do auto-initialization, if we use a single-phase protocol and allow a replica to directly transit from EMPTY status to VOTING status, we may run into a state where we cannot make progress even if all replicas are in EMPTY status initially. For example, say the quorum size is 2. All replicas are in EMPTY status initially. One replica will first set its status to VOTING because if finds all replicas are in EMPTY status. After that, neither the VOTING replica nor the EMPTY replicas can make progress. To solve this problem, we use a two-phase protocol and introduce an intermediate transient status (STARTING) between EMPTY and VOTING status. A replica in EMPTY status can transit to STARTING status if it finds all replicas are in either EMPTY or STARTING status. A replica in STARTING status can transit to VOTING status if it finds all replicas are in either STARTING or VOTING status. In that way, in our previous example, all replicas will be in STARTING status before any of them can tr ansit to VOTING status. +## Non-leading VOTING replica catch-up + +Starting with Mesos 1.5.0 it is possible to perform eventually consistent reads from a non-leading VOTING log replica. This makes possible to do additional work on non-leading framework replicas, e.g. offload some reading from a leader to standbys reduce failover time by keeping in-memory storage represented by the replicated log "hot". + +To serve eventually consistent reads a replica needs to perform _catch-up_ to recover the latest log state in a manner similar to how it is done during [EMPTY replica recovery](#catch-up). After that the recovered positions can be replayed without fear of seeing "holes". + +A truncation can take place during the non-leading replica catch-up. The replica may try to fill the truncated position if truncation happens after the replica has recovered _begin_ and _end_ positions, which may lead to producing inconsistent data during log replay. In order to protect against it we use a special tombstone flag that signals to the replica that the position was truncated and _begin_ needs to be adjusted. The replica is not blocked from truncations during or after catching-up, which means that the user may need to retry the catch-up procedure if positions that were recovered became truncated during log replay. + ## Future work Currently, replicated log does not support dynamic quorum size change, also known as _reconfiguration_. Supporting reconfiguration would allow us more easily to add, move or swap hosts for replicas. We plan to support reconfiguration in the future.