Typically, when a read is submitted to C*, it may complete in one of three ways:
1. No errors, and returns the expected data
2. Errors out with UnavailableException
3. No error, but returns zero rows on the first attempt; the rows are returned 
on subsequent runs.

The third scenario happens as a result of cluster entropy, especially during 
unexpected outages affecting on-premises or cloud infrastructure.

Typical scenario:
a) Multiple nodes fail in the cluster
b) A node is replaced via bootstrapping
c) The row is in Cassandra, but the client hits nodes that do not have the data 
yet and gets zero rows. The row is retrieved on the third or fourth attempt, and 
read repair takes care of it.
d) Eventually, repair is run and the issue is fixed.
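
To make scenario 3 concrete, here is a minimal client-side sketch of how it could be counted if no server-side metric exists. It is only an illustration: `read_with_empty_retry` and the `run_query` callable are hypothetical names, not part of Cassandra or any driver, and the sketch assumes the row is known to exist, so an empty first result followed by a non-empty one is treated as an entropy-related miss.

```python
from typing import Any, Callable, Iterable, List, Tuple


def read_with_empty_retry(
    run_query: Callable[[], Iterable[Any]],
    max_attempts: int = 4,
) -> Tuple[List[Any], int, bool]:
    """Re-run a read until it returns rows or attempts run out.

    run_query is a hypothetical zero-argument callable that executes the
    read and returns an iterable of rows (for example, a wrapper around a
    driver call). Returns (rows, attempts_used, empty_then_found), where
    empty_then_found is True only when an earlier attempt returned zero
    rows and a later attempt returned data (scenario 3).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    saw_empty = False
    for attempt in range(1, max_attempts + 1):
        rows = list(run_query())
        if rows:
            return rows, attempt, saw_empty
        saw_empty = True

    # Every attempt was empty: the row may simply not exist, so this is
    # not counted as scenario 3.
    return [], max_attempts, False
```

The `empty_then_found` flag could be fed into whatever metrics system the application already uses; a rising count after an outage or node replacement would point at scenario 3.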

Digging in Cassandra metrics, I’ve found ‘cassandra.unavailables.count’. It 
looks like this metric captures scenario 2 (UnavailableException), however.

I have also read the Yelp article describing a metric they called 
‘underreplicated keyspaces’. These are keyspace ranges that will fail to 
satisfy reads/writes at a certain CL due to insufficient endpoints. If my 
understanding is correct, this is also measuring scenario 2.
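
As I understand it, the check behind such a metric boils down to comparing the live replicas for a range against what the CL requires. The following is a rough sketch of that arithmetic only, not Yelp's implementation; `required_replicas` and `is_underreplicated` are hypothetical helper names, and it assumes a single datacenter (so LOCAL_QUORUM, EACH_QUORUM and similar levels are left out).

```python
def required_replicas(cl: str, rf: int) -> int:
    """Number of replicas that must respond for a consistency level.

    Covers only simple, single-datacenter levels (an assumption of this
    sketch). QUORUM is floor(RF / 2) + 1.
    """
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }
    try:
        return levels[cl.upper()]
    except KeyError:
        raise ValueError(f"unsupported consistency level: {cl}")


def is_underreplicated(live_replicas: int, rf: int, cl: str) -> bool:
    """True if a range has too few live replicas to satisfy the CL."""
    return live_replicas < required_replicas(cl, rf)
```

With RF=3 and QUORUM, a range with only one live replica would be flagged, and requests against it would fail with UnavailableException, which is why I read this as scenario 2 rather than scenario 3.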

I am trying to find a metric that captures scenario 3 above. Is this possible at all?



----------------
Thank you
