Re: Quorum and Datacenter loss

Dave Viner Sun, 12 Dec 2010 15:18:28 -0800

I think there's a flaw in your logic.  Take the following scenario:
- you use QUORUM for reads and QUORUM for writes
- you have 2 datacenters (DC1, DC2), with 3 servers in each (so 6 nodes
total).
- you set replication factor to 3
- you use RackAwareStrategy

So, you have DC1-S1, DC1-S2, DC1-S3, DC2-S4, DC2-S5, DC2-S6 as nodes. The
quorum count is N/2 + 1 (rounded down), where N is the replication factor,
not the total number of nodes.  Since RF=3, the quorum is 2.
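
To make the arithmetic concrete, here is a quick sketch in Python.  The
function name is mine, purely for illustration; it is not a Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write.

    Illustrative helper only (hypothetical name): integer division,
    so the result is floor(RF / 2) + 1.
    """
    return replication_factor // 2 + 1


# The quorum depends on the replication factor, not the cluster size.
for rf in (3, 4):
    print(f"RF={rf} -> quorum={quorum(rf)}")
```

With RF=3 this gives 2, as above.  With RF=4 it gives 3, so two copies in
each datacenter would not be enough to keep QUORUM alive if one datacenter
is lost.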

When set up correctly, a write to any node for key K causes the data to be
replicated to 3 nodes.  Say it's DC1-S1, DC1-S2, DC2-S4.

When you read key K from any node, the coordinator node (the one you ask)
will attempt to contact DC1-S1, DC1-S2, DC2-S4, since those are the nodes
known to hold the value for key K.  The quorum required is 2 nodes, not all
3.  So, as long as the coordinator hears back from 2 of the nodes, it will
return.

In your example, imagine DC2 goes dark.  As long as DC1-S1 and DC1-S2 are
still responding, your read query will succeed.

Of course, you are on the knife-edge.  If you lose any machine in DC1 while
DC2 is out, then you will not be able to satisfy QUORUM reads or writes.
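
The scenario for key K can be checked with a small sketch like the one
below.  It assumes the replica placement from my example (DC1-S1, DC1-S2,
DC2-S4); the helper names are made up for illustration and only count live
replicas, they do not model how Cassandra actually routes requests.

```python
from typing import Iterable, Set


def quorum(replication_factor: int) -> int:
    # floor(RF / 2) + 1
    return replication_factor // 2 + 1


def can_satisfy_quorum(replicas: Iterable[str], down: Set[str]) -> bool:
    """True if enough replicas of a key are still up to reach QUORUM.

    Hypothetical helper: the replication factor is taken to be the
    number of replicas listed for the key.
    """
    replicas = list(replicas)
    alive = [r for r in replicas if r not in down]
    return len(alive) >= quorum(len(replicas))


# Assumed placement for key K, taken from the example in this message.
REPLICAS_FOR_K = ["DC1-S1", "DC1-S2", "DC2-S4"]
DC2 = {"DC2-S4", "DC2-S5", "DC2-S6"}

print(can_satisfy_quorum(REPLICAS_FOR_K, set()))             # everything up
print(can_satisfy_quorum(REPLICAS_FOR_K, DC2))               # DC2 dark
print(can_satisfy_quorum(REPLICAS_FOR_K, DC2 | {"DC1-S1"}))  # DC2 dark, S1 lost
```

The first two checks succeed because two of the three replicas for K are
still reachable; the last one fails because only DC1-S2 is left.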

Note that in 0.7, the new LOCAL_QUORUM on reads would solve the issue, I
believe.

Dave Viner

On Sun, Dec 12, 2010 at 9:49 AM, Peter Schuller
<peter.schul...@infidyne.com> wrote:

> > Thanks a lot Peter.   So basically we would need to choose a
> > consistency other than QUORUM.    I think in our case consistency is
> > not necessarily an issue since our data is write-once, read-many
> > (immutable data).   I suppose having a replication factor of 4 would
> > result in two nodes in each datacenter having a copy of the data.   If
> > there's a flaw in my logic, please let me know : ]
>
> It would, but note that if you're writing at consistency level ONE
> only a single copy of the data is required to exist before your write
> is ACK:ed back to the client (but it will still be replicated).
>
> --
> / Peter Schuller
>
