Good morning all.

Hypothetical Setup:
1 data center
RF = 3
Total nodes > 3

Problem:
Suppose I need maximum consistency for one critical operation, so I specify CL = ALL for reads. However, the read will fail if even 1 replica endpoint is down. I don't see why this failure is always necessary: the data could have been updated since the node became unavailable, in which case its copy is stale anyway. If only one node goes down and it holds the key I need, then the app is not 100% available, and it could take some time to bring the node back.
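To make the availability cost concrete, here is a minimal sketch of how many replicas must answer at each level with RF = 3. The function names are my own for illustration and are not Cassandra APIs; the QUORUM formula (RF / 2, rounded down, plus 1) is the standard one.

```python
def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Number of replica responses the coordinator must collect."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def can_serve_read(consistency_level: str, replication_factor: int,
                   live_replicas: int) -> bool:
    """True if enough replicas are alive to satisfy the consistency level."""
    return live_replicas >= required_replicas(consistency_level, replication_factor)


# RF = 3 with one replica down leaves 2 live replicas.
print(can_serve_read("QUORUM", 3, 2))
print(can_serve_read("ALL", 3, 2))
```

With one of three replicas down, QUORUM can still be served but ALL cannot, which is the case I am asking about.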

Proposal:
If all of the *available* replica nodes answer the read operation and the latest value timestamp is clearly AFTER the time the down node became unavailable, then this situation can meet the requirements for *near* 100% consistency, since the value on the down node would be outdated anyway: the value was clearly updated some time *after* the node went down or became unavailable. This way, you can have max availability for reads at CL.ALL, or at some CL close in meaning to ALL.
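The rule above could be sketched roughly as follows. This is only an illustration of the proposal, not existing Cassandra behavior: `read_all_avail`, `ReplicaResponse`, `UnavailableError`, and the `down_since` map are hypothetical names, and I am assuming the coordinator records the time at which it marked each replica down and that clocks are comparable with write timestamps.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReplicaResponse:
    value: str
    write_timestamp: int  # assumed comparable with the down_since times


class UnavailableError(Exception):
    """Raised when the hypothetical ALL_AVAIL rule cannot be satisfied."""


def read_all_avail(replicas: List[str],
                   responses: Dict[str, ReplicaResponse],
                   down_since: Dict[str, int]) -> ReplicaResponse:
    # Every replica the coordinator believes is alive must have answered.
    live = [r for r in replicas if r not in down_since]
    if not live or any(r not in responses for r in live):
        raise UnavailableError("not all available replicas answered")

    newest = max((responses[r] for r in live), key=lambda resp: resp.write_timestamp)

    # The newest value must be strictly newer than every outage start,
    # otherwise a down replica might hold a value we have not seen.
    if down_since and newest.write_timestamp <= max(down_since.values()):
        raise UnavailableError("newest value predates an outage")
    return newest


replicas = ["A", "B", "C"]
responses = {"A": ReplicaResponse("v2", 200), "B": ReplicaResponse("v1", 150)}
print(read_all_avail(replicas, responses, {"C": 100}).value)
```

If C had instead gone down at time 300, the newest visible write (200) would predate the outage and the read would fail, just as CL.ALL does today.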

I say "near" 100% consistency to leave room for the situation where the node was unavailable only to the coordinating node, for example because of a network issue, and so still received an update by some other route after it "appeared" unavailable to the current coordinator. In a situation like this, there is a chance the read will still not return the latest value. So this would not be truly 100% consistent, which is what CL.ALL guarantees. However, I think this logic could justify a new consistency level slightly lower than ALL, such as ALL_AVAIL.
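Here is a small made-up scenario for that caveat. All node names, values, and timestamps are invented for illustration: replica C looks down to my coordinator since time 100, but it is still reachable by another route and accepted a later write there.

```python
# (value, write_timestamp) actually stored on each replica.
stored = {
    "A": ("v1", 150),
    "B": ("v1", 150),
    "C": ("v2", 200),  # written via another coordinator that could reach C
}

# What my coordinator believes: C has been down since time 100.
down_since = {"C": 100}

# The coordinator only sees the replicas it considers alive.
visible = {node: cell for node, cell in stored.items() if node not in down_since}
newest_visible = max(visible.values(), key=lambda cell: cell[1])

# The proposed check passes because 150 is after the outage start of 100.
passes_check = newest_visible[1] > max(down_since.values())

# The value that is really the latest in the cluster.
truly_newest = max(stored.values(), key=lambda cell: cell[1])

print(passes_check, newest_visible[0], truly_newest[0])
```

The check passes and the read returns v1 even though v2 exists on C, which is why I would call the level "near" ALL rather than ALL.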

What do you think? Is my logic correct? Is there a conflict with the architecture or base principles? This fits with the tunable consistency principle for sure.

Thanks for listening

