By default the TokenAwarePolicy does shuffle replicas, and shuffling can be disabled if you want to hit only the primary replica for the token range you're querying: http://docs.datastax.com/en/drivers/java/3.0/com/datastax/driver/core/policies/TokenAwarePolicy.html
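To make the behavior concrete, here is a minimal sketch of what a token-aware policy does when ordering replicas for a query plan. This is illustrative pseudologic in Python, not the DataStax driver's actual implementation; the function and variable names are hypothetical.

```python
import random

def query_plan(replicas, shuffle_replicas, rng=random):
    """Return the order in which replicas are tried for a token.

    `replicas` is assumed to be ordered with the primary replica first,
    as a token-aware policy would compute it for the queried token range.
    """
    plan = list(replicas)
    if shuffle_replicas:
        # Shuffling spreads reads (and cache warmth) across all replicas.
        rng.shuffle(plan)
    # Without shuffling, the primary replica is always tried first.
    return plan

replicas = ["node1", "node2", "node3"]  # node1 owns the token range
print(query_plan(replicas, shuffle_replicas=False)[0])  # node1
```

With shuffling disabled every read for a token lands on its primary replica first, which is exactly the cache-cold failover concern raised below.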
On Mon, Mar 27, 2017 at 9:41 AM Avi Kivity <a...@scylladb.com> wrote:

> Is the driver doing the right thing by directing all reads for a given
> token to the same node? If that node fails, then all of those reads will
> be directed at other nodes, all of whom will be cache-cold for the failed
> node's primary token range. Seems like the driver should distribute reads
> among all the replicas for a token, at least as an option, to keep the
> caches warm for latency-sensitive loads.
>
> On 03/26/2017 07:46 PM, Eric Stevens wrote:
> > Yes, throughput for a given partition key cannot be improved with
> > horizontal scaling. You can increase RF to theoretically improve
> > throughput on that key, but actually in this case smart clients might
> > hold you back, because they're probably token aware, and will try to
> > serve that read off the key's primary replica, so all reads would be
> > directed at a single node for that key.
> >
> > If you're reading at CL=QUORUM, there's a chance that increasing RF will
> > actually reduce performance rather than improve it, because you've
> > increased the total amount of work to serve the read (as well as the
> > write). If you're reading at CL=ONE, increasing RF will increase the
> > chances of falling afoul of eventual consistency.
> >
> > However, that's not really a real-world scenario. Or if it is, Cassandra
> > is probably the wrong tool to satisfy that kind of workload.
> >
> > On Thu, Mar 23, 2017 at 11:43 PM Alain Rastoul <alf.mmm....@gmail.com>
> > wrote:
> > > On 24/03/2017 01:00, Eric Stevens wrote:
> > > > Assuming an even distribution of data in your cluster, and an even
> > > > distribution across those keys by your readers, you would not need
> > > > to increase RF with cluster size to increase read performance. If
> > > > you have 3 nodes with RF=3, and do 3 million reads, with good
> > > > distribution, each node has served 1 million read requests.
> > > > If you increase to 6 nodes and keep RF=3, then each node now owns
> > > > half as much data and serves only 500,000 reads. Or, more
> > > > meaningfully, in the same time it takes to do 3 million reads under
> > > > the 3-node cluster you ought to be able to do 6 million reads under
> > > > the 6-node cluster, since each node is responsible for just 1
> > > > million total reads.
> > >
> > > Hi Eric,
> > >
> > > I think I got your point.
> > > In the case of really evenly distributed reads it may (or should?)
> > > not make any difference.
> > >
> > > But when the reads are not well distributed (and in that case only),
> > > my understanding was that RF could help spread the load: with RF=4
> > > instead of 3, and several clients accessing the same key ranges, a
> > > coordinator could pick one node out of 4 replicas to handle the
> > > request instead of one node out of 3, thus having more "workers" to
> > > handle a request?
> > >
> > > Am I wrong here?
> > >
> > > Thank you for the clarification.
> > >
> > > --
> > > best,
> > > Alain

-- 
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
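For reference, the arithmetic behind the two scenarios discussed in the quoted thread (evenly distributed reads vs. a single hot key) can be sketched as follows. The function names are illustrative only; the numbers come from Eric's and Alain's examples above.

```python
def reads_per_node(total_reads, nodes):
    """Evenly distributed reads: RF cancels out.

    Each node owns RF/nodes of the data, but each read is served by
    one replica, so the per-node share is simply total/nodes.
    Adding nodes (not replicas) is what spreads this load.
    """
    return total_reads // nodes

def reads_per_replica(total_reads, rf):
    """Single hot key: only that key's replicas can serve the reads.

    Here RF is the number of available "workers", which is Alain's
    point about RF=4 giving four candidate coordinatable replicas
    instead of three.
    """
    return total_reads // rf

# Eric's example: 3 million reads, RF=3.
print(reads_per_node(3_000_000, 3))      # 1000000 per node on 3 nodes
print(reads_per_node(3_000_000, 6))      # 500000 per node on 6 nodes

# Alain's scenario: all reads target the same key ranges.
print(reads_per_replica(3_000_000, 3))   # 1000000 per replica at RF=3
print(reads_per_replica(3_000_000, 4))   # 750000 per replica at RF=4
```

The hot-key math only helps in practice if reads are actually spread across the replicas, which circles back to the shuffle-replicas behavior of the token-aware policy discussed at the top of the thread.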