Hi Folks, I'm trying to wrap my brain around how one would configure a Cassandra cluster to support consistent reads in various cluster failure states (i.e., reads must see the most recent write).
Gracefully handling a single node failure seems easy enough: use QUORUM (RF/2 + 1) for reads and writes. This is all well and good as long as you lose fewer than half of the replicas for any given key. If your Cassandra nodes span two switches, or two PDUs, or two VM host servers, it seems we can't make this guarantee.

I'm curious whether it's possible to configure things so that a coordinator node selects a replica node based on a hash of the key. This sounds like a function of the snitch, but the snitch configurations seem to revolve around physical/geo parameters. If replica selection for a given key were consistent and "sticky", it seems we could achieve /mostly/ consistent reads at lower consistency levels during a 50% failure without failing writes (save for cases of flapping).

Is this in the stars? If not, are there any client libs that implement this by (hashed) selection of a coordinator node that is a replica for the requested key? (It seems pycassa does not.)

Timmy
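P.S. For concreteness, here's a rough sketch of the kind of client-side "sticky" selection I have in mind. The function name and the way the replica list is obtained are made up for illustration; this is not an existing pycassa API:

```python
import hashlib

def pick_replica(key, live_replicas):
    """Deterministically map a row key to one of its live replicas.

    `live_replicas` is a hypothetical list of replica addresses for the
    key's token range; a real client would get it from the ring/token map.
    Sorting first means every client agrees on the same choice no matter
    what order the list arrived in.
    """
    ordered = sorted(live_replicas)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return ordered[int(digest, 16) % len(ordered)]

# All clients hashing the same key pick the same replica, so reads
# land on the node the write went to (until membership changes).
replicas = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
choice = pick_replica("user:42", replicas)
```

The determinism is the whole point: as long as the set of live replicas is stable, every coordinator would route a given key to the same node, which is what would give the mostly-consistent behavior at CL.ONE.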