Hi,

We're currently in the planning stage of a new project that needs a 
low-latency, persistent key/value store with a roughly 60:40 read/write split. 
We're trying to establish whether Cassandra is a good fit for this and, in 
particular, what the hardware requirements would be to have the majority of rows 
cached in memory (other NoSQL platforms like Couchbase/Membase seem like a more 
natural fit, but we're already reasonably familiar with Cassandra and would 
rather stick with what we know if it can work). 

If anyone could help answer/clarify the following questions it would be a great 
help (all assume that row-caching is enabled for the column family).

Q. If we're using read consistency ONE, does the read request get sent to all 
nodes in the replica set, with the first reply returned (i.e. all replica 
nodes will then have that row in their cache), OR does the request only get 
sent to a single node in the replica set? If it's the latter, would the same 
node generally be used for all requests to the same key, or would it always be a 
random node in the replica set (i.e. if we have multiple reads for one key in 
quick succession, could this entail multiple disk lookups until all nodes in 
the set have been hit)? 

Q. Related to the above, if only one node receives the request, would the client 
(Hector in this case) know which node to send the request to directly, or would 
there potentially be one extra network hop involved (client -> random node -> 
node with key)?

Q. Is it possible to do a warm cache load of the most recently accessed keys on 
node startup or would we have to do this with a client app?

Q. With write consistency ANY, is it correct that following a write request all 
nodes in the replica set will end up with that row in their cache, as well as 
on disk, once they receive the write? In other words, is the total cache size 
(cache_memory_per_node * num_nodes) / num_replicas?
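
To make the arithmetic in that last question concrete, here is a rough sketch 
of the estimate we have in mind. It assumes every replica ends up caching every 
row it owns (which is exactly what we're asking about), and the node count, 
cache size and replication factor are made-up numbers for illustration, not 
our real sizing.

```python
GIB = 1024 ** 3


def effective_cache_bytes(cache_bytes_per_node, num_nodes, num_replicas):
    """Estimate how much unique data fits in the cluster-wide row cache.

    Assumption: each row is cached on all of its replicas, so every cached
    row occupies num_replicas copies across the cluster.
    """
    if num_replicas < 1 or num_nodes < num_replicas:
        raise ValueError("need 1 <= num_replicas <= num_nodes")
    return cache_bytes_per_node * num_nodes / num_replicas


# Hypothetical cluster: 6 nodes, 8 GiB of row cache each, 3 replicas per row.
total = effective_cache_bytes(8 * GIB, num_nodes=6, num_replicas=3)
print(f"{total / GIB:.1f} GiB of unique rows")  # 16.0 GiB of unique rows
```

If instead only one replica caches each row on reads, the divisor would drop 
out and the same cluster could hold closer to 48 GiB of unique rows, which is 
why the answer matters for our hardware estimate.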

Q. If the cluster only has a single column family, random partitioning and no 
secondary indexes, is there a good metric for estimating how much heap space we 
would need to set aside for everything that isn't the row cache? Would it be 
proportional to the row-cache size or fairly constant?


Thanks,
Stephen


Stephen Henderson - Lead Developer (Onsite), Cognitive Match
stephen.hender...@cognitivematch.com | http://www.cognitivematch.com
