Fill in Replication, Tuneable Consistency sections
Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b1edbd12
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b1edbd12
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b1edbd12

Branch: refs/heads/trunk
Commit: b1edbd12146b483516eaf3a90745ac664f46d609
Parents: 62e3d7d
Author: Tyler Hobbs <tylerlho...@gmail.com>
Authored: Wed Jun 15 17:23:11 2016 -0500
Committer: Sylvain Lebresne <sylv...@datastax.com>
Committed: Thu Jun 16 12:23:52 2016 +0200

----------------------------------------------------------------------
 doc/source/architecture.rst | 96 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 94 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b1edbd12/doc/source/architecture.rst
----------------------------------------------------------------------
diff --git a/doc/source/architecture.rst b/doc/source/architecture.rst
index 37a0027..3f8a8ca 100644
--- a/doc/source/architecture.rst
+++ b/doc/source/architecture.rst
@@ -43,12 +43,104 @@ Token Ring/Ranges
 
 Replication
 ^^^^^^^^^^^
-.. todo:: todo
+The replication strategy of a keyspace determines which nodes are replicas for a given token range. The two main
+replication strategies are :ref:`simple-strategy` and :ref:`network-topology-strategy`.
+
+.. _simple-strategy:
+
+SimpleStrategy
+~~~~~~~~~~~~~~
+
+SimpleStrategy allows a single integer ``replication_factor`` to be defined. This determines the number of nodes that
+should contain a copy of each row. For example, if ``replication_factor`` is 3, then three different nodes should store
+a copy of each row.
+
+SimpleStrategy treats all nodes identically, ignoring any configured datacenters or racks. To determine the replicas
+for a token range, Cassandra iterates through the tokens in the ring, starting with the token range of interest. For
+each token, it checks whether the owning node has been added to the set of replicas, and if it has not, it is added to
+the set. This process continues until ``replication_factor`` distinct nodes have been added to the set of replicas.
+
+.. _network-topology-strategy:
+
+NetworkTopologyStrategy
+~~~~~~~~~~~~~~~~~~~~~~~
+
+NetworkTopologyStrategy allows a replication factor to be specified for each datacenter in the cluster. Even if your
+cluster only uses a single datacenter, NetworkTopologyStrategy should be preferred over SimpleStrategy to make it
+easier to add new physical or virtual datacenters to the cluster later.
+
+In addition to allowing the replication factor to be specified per-DC, NetworkTopologyStrategy also attempts to choose
+replicas within a datacenter from different racks. If the number of racks is greater than or equal to the replication
+factor for the DC, each replica will be chosen from a different rack. Otherwise, each rack will hold at least one
+replica, but some racks may hold more than one. Note that this rack-aware behavior has some potentially `surprising
+implications <https://issues.apache.org/jira/browse/CASSANDRA-3810>`_. For example, if the racks contain unequal
+numbers of nodes, the data load on the nodes in the smallest rack may be much higher. Similarly, if a single node is
+bootstrapped into a new rack, it will be considered a replica for the entire ring. For this reason, many operators
+choose to configure all nodes on a single "rack".
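+
+The replication strategy is specified when a keyspace is created. For example (a brief illustrative sketch; the
+keyspace names and the datacenter names ``DC1`` and ``DC2`` below are hypothetical, and datacenter names must match
+those reported by the cluster's snitch), a keyspace using each strategy might be defined as follows::
+
+    -- Three copies of each row, with no awareness of racks or datacenters
+    CREATE KEYSPACE simple_example
+        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
+
+    -- Three replicas in each of the datacenters named "DC1" and "DC2"
+    CREATE KEYSPACE nts_example
+        WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};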
 
 Tunable Consistency
 ^^^^^^^^^^^^^^^^^^^
-.. todo:: todo
+Cassandra supports a per-operation tradeoff between consistency and availability through *Consistency Levels*.
+Essentially, an operation's consistency level specifies how many of the replicas need to respond to the coordinator in
+order to consider the operation a success.
+
+The following consistency levels are available:
+
+``ONE``
+  Only a single replica must respond.
+
+``TWO``
+  Two replicas must respond.
+
+``THREE``
+  Three replicas must respond.
+
+``QUORUM``
+  A majority (n/2 + 1) of the replicas must respond.
+
+``ALL``
+  All of the replicas must respond.
+
+``LOCAL_QUORUM``
+  A majority of the replicas in the local datacenter (whichever datacenter the coordinator is in) must respond.
+
+``EACH_QUORUM``
+  A majority of the replicas in each datacenter must respond.
+
+``LOCAL_ONE``
+  Only a single replica must respond. In a multi-datacenter cluster, this also guarantees that read requests are not
+  sent to replicas in a remote datacenter.
+
+``ANY``
+  A single replica may respond, or the coordinator may store a hint. If a hint is stored, the coordinator will later
+  attempt to replay the hint and deliver the mutation to the replicas. This consistency level is only accepted for
+  write operations.
+
+Write operations are always sent to all replicas, regardless of consistency level. The consistency level simply
+controls how many responses the coordinator waits for before responding to the client.
+
+For read operations, the coordinator generally only issues read commands to enough replicas to satisfy the consistency
+level. There are a couple of exceptions to this:
+
+- Speculative retry may issue a redundant read request to an extra replica if the other replicas have not responded
+  within a specified time window.
+- Based on ``read_repair_chance`` and ``dclocal_read_repair_chance`` (part of a table's schema), read requests may be
+  randomly sent to all replicas in order to repair potentially inconsistent data.
+
+Picking Consistency Levels
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is common to pick read and write consistency levels that are high enough to overlap, resulting in "strong"
+consistency. This is typically expressed as ``W + R > RF``, where ``W`` is the write consistency level, ``R`` is the
+read consistency level, and ``RF`` is the replication factor. For example, if ``RF = 3``, a ``QUORUM`` request will
+require responses from at least two of the three replicas. If ``QUORUM`` is used for both writes and reads, at least
+one of the replicas is guaranteed to participate in *both* the write and the read request, which in turn guarantees
+that the latest write will be read. In a multi-datacenter environment, ``LOCAL_QUORUM`` can be used to provide a
+weaker but still useful guarantee: reads are guaranteed to see the latest write from within the same datacenter.
+
+If this type of strong consistency isn't required, lower consistency levels like ``ONE`` may be used to improve
+throughput, latency, and availability.
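+
+For example, assuming a hypothetical ``users`` table in a keyspace with a ``replication_factor`` of 3, the consistency
+level is chosen by the client for each operation: ``cqlsh`` sets it with the ``CONSISTENCY`` command, while drivers
+typically expose a per-statement or per-session setting::
+
+    -- With RF = 3, QUORUM needs 2 replicas, so QUORUM writes and reads overlap on at least one replica (W + R > RF)
+    CONSISTENCY QUORUM;
+
+    INSERT INTO users (user_id, name) VALUES (42, 'alice');
+    SELECT name FROM users WHERE user_id = 42;
 
 Storage Engine
 --------------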