By default the TokenAwarePolicy does shuffle replicas, and shuffling can be
disabled if you want to hit only the primary replica for the token range
you're querying:
http://docs.datastax.com/en/drivers/java/3.0/com/datastax/driver/core/policies/TokenAwarePolicy.html
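
Something along these lines should do it (an untested sketch against the
3.0 Java driver API; the contact point is just a placeholder):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    // shuffleReplicas defaults to true; passing false makes the policy
    // always try the primary replica for the token first.
    Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")  // placeholder
        .withLoadBalancingPolicy(
            new TokenAwarePolicy(
                DCAwareRoundRobinPolicy.builder().build(),
                false /* shuffleReplicas */))
        .build();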

On Mon, Mar 27, 2017 at 9:41 AM Avi Kivity <a...@scylladb.com> wrote:

> Is the driver doing the right thing by directing all reads for a given
> token to the same node?  If that node fails, then all of those reads will
> be directed at other nodes, all of which will be cache-cold for the
> failed node's primary token range.  Seems like the driver should distribute
> reads among all the replicas for a token, at least as an option, to
> keep the caches warm for latency-sensitive loads.
>
> On 03/26/2017 07:46 PM, Eric Stevens wrote:
>
> Yes, throughput for a given partition key cannot be improved with
> horizontal scaling.  You can increase RF to theoretically improve
> throughput on that key, but in practice smart clients might hold you
> back: they're probably token aware, and will try to serve that read off
> the key's primary replica, so all reads for that key would be directed
> at a single node.
>
> If you're reading at CL=QUORUM, there's a chance that increasing RF will
> actually reduce performance rather than improve it, because you've
> increased the total amount of work to serve the read (as well as the
> write).  If you're reading at CL=ONE, increasing RF will increase the
> chances of falling afoul of eventual consistency.
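>
> To put rough numbers on the QUORUM case (plain Java, just the usual
> quorum formula, nothing driver-specific):
>
>     int quorumRf3 = 3 / 2 + 1;  // = 2 replicas touched per QUORUM read
>     int quorumRf4 = 4 / 2 + 1;  // = 3 replicas touched per QUORUM read
>
> So going from RF=3 to RF=4 makes every QUORUM read (and write) touch
> 50% more replicas.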
>
> However, that's not really a real-world scenario.  And if it is, Cassandra
> is probably the wrong tool for that kind of workload.
>
> On Thu, Mar 23, 2017 at 11:43 PM Alain Rastoul <alf.mmm....@gmail.com>
> wrote:
>
> On 24/03/2017 01:00, Eric Stevens wrote:
> > Assuming an even distribution of data in your cluster, and an even
> > distribution across those keys by your readers, you would not need to
> > increase RF with cluster size to increase read performance.  If you have
> > 3 nodes with RF=3, and do 3 million reads, with good distribution, each
> > node has served 1 million read requests.  If you increase to 6 nodes and
> > keep RF=3, then each node now owns half as much data and serves only
> > 500,000 reads.  Or more meaningfully in the same time it takes to do 3
> > million reads under the 3 node cluster you ought to be able to do 6
> > million reads under the 6 node cluster since each node is just
> > responsible for 1 million total reads.
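> >
> > Back-of-the-envelope, assuming CL=ONE and even distribution:
> >
> >     reads per node = total reads / node count
> >     3,000,000 / 3 nodes = 1,000,000 reads per node
> >     3,000,000 / 6 nodes =   500,000 reads per node
> >
> > Note that RF doesn't appear in that formula at all; only the node
> > count does.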
> >
> Hi Eric,
>
> I think I got your point.
> In the case of truly evenly distributed reads it may (or should?) not make
> any difference.
>
> But when the reads are not well distributed (and only in that case), my
> understanding of RF was that it could help spread the load: with RF=4
> instead of 3, and several clients accessing the same key ranges, a
> coordinator could pick one node out of 4 replicas to handle the request
> instead of one out of 3, thus having more "workers" to handle a request?
>
> Am I wrong here ?
>
> Thank you for the clarification
>
>
> --
> best,
> Alain
>
>
> --
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
