Assuming data is evenly distributed across your cluster, and your readers access those keys evenly, you do not need to increase RF as the cluster grows in order to increase read performance. If you have 3 nodes with RF=3 and do 3 million reads, then with good distribution (and each read served by a single replica) each node serves 1 million read requests. If you grow to 6 nodes and keep RF=3, each node owns half as much data and serves only 500,000 of those reads. Or, more meaningfully: in the time it takes to do 3 million reads on the 3-node cluster, you ought to be able to do 6 million reads on the 6-node cluster, since each node is still only responsible for 1 million reads.
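To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The function name `reads_per_node` and the `replicas_per_read` parameter are made up for illustration; the model assumes perfectly even data and request distribution and ignores coordinator overhead, so treat it as a sanity check rather than a capacity plan.

```python
def reads_per_node(total_reads, nodes, replicas_per_read=1):
    """Average number of replica reads each node serves.

    Hypothetical model: load is spread perfectly evenly, and every
    client read touches `replicas_per_read` replicas (1 here, i.e. a
    single replica answers each read). Note that RF does not appear:
    it decides which nodes may serve a key, not how many reads there are.
    """
    return total_reads * replicas_per_read / nodes


# 3 nodes, 3 million reads: 1 million reads per node
three_node_load = reads_per_node(3_000_000, 3)

# 6 nodes, same 3 million reads: 500,000 reads per node
six_node_load = reads_per_node(3_000_000, 6)

# 6 nodes, 6 million reads: back to 1 million reads per node
six_node_double = reads_per_node(6_000_000, 6)

print(three_node_load, six_node_load, six_node_double)
```

Under these assumptions the per-node load depends only on the total read volume and the node count, which is why doubling the nodes at a fixed RF should roughly double the read throughput.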

On Mon, Mar 20, 2017 at 11:24 PM Alain Rastoul <alf.mmm....@gmail.com> wrote:

> On 20/03/2017 22:05, Michael Wojcikiewicz wrote:
> > Not sure if someone has suggested this, but I believe it's not
> > sufficient to simply add nodes to a cluster to increase read
> > performance: you also need to alter the ReplicationFactor of the
> > keyspace to a larger value as you increase your cluster gets larger.
> >
> > ie. data is available from more nodes in the cluster for each query.
> >
> Yes, good point in case of cluster growth, there would be more replica
> to handle same key ranges.
> And also readjust token ranges :
> https://cassandra.apache.org/doc/latest/operating/topo_changes.html
>
> SG, can you give some information (or share your code) about how you
> generate your data and how you read it ?
>
> --
> best,
> Alain