UPDATE to my suggestion is below.


On 6/16/2011 5:50 PM, Ryan King wrote:
On Thu, Jun 16, 2011 at 2:12 PM, AJ <a...@dude.podzone.net> wrote:
On 6/16/2011 2:37 PM, Ryan King wrote:
On Thu, Jun 16, 2011 at 1:05 PM, AJ <a...@dude.podzone.net> wrote:
<snip>
The Cassandra consistency model is pretty elegant and this type of approach breaks that elegance in many ways. It would also only really be useful when the value has a high probability of being updated between a node going down and the value being read.
I'm not sure what you mean. A node can be down for days, during which time the value can be updated. The intention is to use the nodes available even if they fall below the RF. If there is only one node available for accepting a replica, that should be enough given the conditions I stated and updated below.
If this is your constraint, then you should just use CL.ONE.

My constraint is a CL = "All Available".  So, CL.ONE will not work.
That's a solution, not a requirement. What's your requirement?

OK. This updates my suggestion, removing the need for ALL_AVAIL. It adds logic to cope with unavailable nodes while still achieving consistency in a specific situation.

The general requirement is to eliminate read failures for reads specifying CL = ALL on values that have followed a specific update pattern. The pattern is this: the value was updated (or added) while one or more, but fewer than R, replica nodes were unavailable (at least one replica node was available). If a particular column value is updated *after* the most recent node went down, the new value cannot depend on any replica that is currently unavailable. In that situation, the number of available replicas is irrelevant: after querying all *available* replica nodes, the value with the latest timestamp is consistent if that timestamp is greater than the timestamp at which the last replica node became unavailable.
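To make the rule concrete, here is a minimal Python sketch of the proposed read path (names and types are invented for illustration; this is not Cassandra code). It queries only the reachable replicas, takes the newest value, and accepts it only when its write timestamp post-dates the last outage:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Replica:
    value: str
    timestamp: int  # write timestamp, e.g. microseconds since epoch

def read_all_available(available: list[Replica],
                       last_node_down_ts: int) -> Optional[str]:
    """Sketch of the proposed rule: query every *available* replica and
    return the newest value, but only if its timestamp proves it was
    written after the most recent replica became unavailable."""
    if not available:
        return None  # no replicas reachable; the read cannot be served
    newest = max(available, key=lambda r: r.timestamp)
    if newest.timestamp > last_node_down_ts:
        # The value post-dates every outage, so the down replicas
        # cannot hold anything newer: the read is consistent.
        return newest.value
    # A down replica might hold a newer version; fail as CL.ALL would.
    raise RuntimeError("cannot guarantee consistency: value predates outage")
```

For example, with the last node going down at timestamp 100, a value written at timestamp 150 is accepted, while one written at 90 is rejected.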


<snip>
Well, theoretically, of course; that's the nature of distributed systems. But Cassandra does indeed make that determination when it counts the number of available replica nodes before deciding whether enough replicas are available. This is obvious to you, I'm sure, so maybe I don't understand your statement.
Consider this scenario: given nodes A, B, and C, A thinks C is down but B thinks C is up. What do you do? Remember, A doesn't know that B thinks C is up; it only knows its own state.


What kind of network configuration would produce this scenario? This method only applies within a data center, which should be OK since replication across data centers seems to be mostly for fault tolerance... but I will have to think about this.
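The divergent-view scenario above can be sketched in a few lines (a hypothetical toy model, not Cassandra's failure detector). Each node keeps only its own opinion of when peers went down, so "the timestamp of the last node that became unavailable" is not a single global fact:

```python
class NodeView:
    """One node's local opinion about which peers are down."""
    def __init__(self):
        self.down_since: dict[str, int] = {}  # peer -> local ts it was marked down

    def mark_down(self, peer: str, ts: int):
        self.down_since[peer] = ts

    def last_down_ts(self) -> int:
        # 0 means "every peer looks up from this node's perspective"
        return max(self.down_since.values(), default=0)

a, b = NodeView(), NodeView()
a.mark_down("C", 120)   # A's detector declared C dead at ts 120
# B never marked C down, so B's view of the cluster differs.

write_ts = 110  # a value written before A saw C fail

# Coordinating through A, the read is rejected (write predates A's view
# of the outage); through B, it is accepted -- the two nodes disagree.
assert (write_ts > a.last_down_ts()) is False
assert (write_ts > b.last_down_ts()) is True
```

The point of the sketch is that the proposed "latest timestamp vs. last-down timestamp" test gives different answers depending on which coordinator handles the read.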

-ryan

