Yes, I think for my case, at least two nodes need to be contacted to get the
full set of data.
But this raises another question, about the dynamic snitch. It wraps the
configured snitch, is enabled by default, and chooses the fastest/closest node
to read data from. Another post covers this.
What I don't understand is why it still emphasizes reading data from only one
replica. Below is an excerpt from the post:
To begin, let’s first answer the most obvious question: what is dynamic
snitching? To understand this, we’ll first recall what a snitch does. A
snitch’s function is to determine which datacenters and racks are both written
to and read from. So, why would that be ‘dynamic?’ This comes into play on the
read side only (there’s nothing to be done for writes since we send them all
and then block until the consistency level is achieved.) When doing reads
however, Cassandra only asks one node for the actual data, and, depending on
consistency level and read repair chance, it asks the remaining replicas for
checksums only. This means that it has a choice of however many replicas exist
to ask for the actual data, and this is where the dynamic snitch goes to work.
Since only one replica is sending the full data we need, we need to choose the
best possible replica to ask, since if all we get back is checksums we have
nothing useful to return to the user. The dynamic snitch handles this task by
monitoring the performance of reads from the various replicas and choosing the
best one based on this history.
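The selection described in the excerpt can be sketched roughly as below. This is only a toy illustration, not Cassandra's actual implementation: the `LatencyTracker` class, its `window` parameter, the node names, and the plain-average scoring are all assumptions made for the example.

```python
from collections import defaultdict, deque


class LatencyTracker:
    """Toy stand-in for a dynamic snitch: keeps recent read latencies per replica."""

    def __init__(self, window=100):
        # Hypothetical sliding-window size; not a real Cassandra setting.
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, replica, latency_ms):
        """Remember one observed read latency for a replica."""
        self.samples[replica].append(latency_ms)

    def score(self, replica):
        """Average recent latency; replicas with no history score 0.0."""
        history = self.samples[replica]
        return sum(history) / len(history) if history else 0.0

    def pick_data_replica(self, replicas):
        """Return (replica asked for full data, replicas asked for digests)."""
        ranked = sorted(replicas, key=self.score)
        return ranked[0], ranked[1:]


tracker = LatencyTracker()
for _ in range(50):
    tracker.record("node1", 12.0)
    tracker.record("node2", 3.0)
    tracker.record("node3", 7.0)

data_replica, digest_replicas = tracker.pick_data_replica(["node1", "node2", "node3"])
print(data_replica, digest_replicas)  # node2 ['node3', 'node1']
```

Note that this only decides which replica of one partition returns the full data. It says nothing about how many nodes a query spanning many partitions touches.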
> On Sep 20, 2016, at 00:03, Ben Slater <ben.sla...@instaclustr.com> wrote:
> If your read operation requires data from multiple partitions and the
> partitions are spread across multiple nodes then the coordinator has the job
> of contacting those nodes to get the data and return it to the client.
> So, in your scenario, if you did a select * from table (with no where clause)
> the coordinator would need to contact and execute a read on at least one
> other node to satisfy the query.
>> On Tue, 20 Sep 2016 at 14:50 Jun Wu <wuxiaomi...@hotmail.com> wrote:
>> Hi Ben,
>> Thanks for the quick response.
>> The example for a single row/partition is clear. However, data normally
>> spans more than a single row, and for that case I'm still confused.
>> The link above gives an example of a 10-node cluster with RF = 3. But the
>> figure and the text in the post show that the coordinator only
>> contacts/reads data from one replica, and performs read repair on the rest.
>> Also, how could reads end up spread across all nodes in the cluster?
>> From: ben.sla...@instaclustr.com
>> Date: Tue, 20 Sep 2016 04:18:59 +0000
>> Subject: Re: Question about replica and replication factor
>> To: email@example.com
>> Each individual read (where a read is a single row or single partition) will
>> read from one node (ignoring read repairs) as each partition will be
>> contained entirely on a single node. To read the full set of data, reads
>> would hit at least two nodes (in practice, reads would likely end up being
>> distributed across all the nodes in your cluster).
>> On Tue, 20 Sep 2016 at 14:09 Jun Wu <wuxiaomi...@hotmail.com> wrote:
>> Hi there,
>> I have a question about the replica and replication factor.
>> For example, I have a cluster of 6 nodes in the same data center.
>> The replication factor (RF) is set to 3 and the consistency level is the default of 1.
>> According to this calculator http://www.ecyrd.com/cassandracalculator/,
>> every node will store 50% of the data.
>> When I want to read all the data from the cluster, how many nodes should I
>> read from, 2 or 1? Is it 2, because each node has half of the data? But the
>> calculator shows 1: "You are really reading from 1 node every time."
>> Any suggestions? Thanks!
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798
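As a rough check of the numbers discussed in this thread (6 nodes, RF = 3, 50% of the data per node, at least two nodes for a full read), here is a small sketch. It assumes a simplified ring in which partitions are spread evenly by a modulo instead of real token hashing, and each partition is replicated to its primary node plus the next RF - 1 nodes on the ring. `PARTITIONS`, `replicas_for`, and `min_nodes_for_full_read` are names made up for the illustration.

```python
from itertools import combinations

NODES = 6
RF = 3
PARTITIONS = 6000  # hypothetical partition count, evenly spread over the ring


def replicas_for(partition):
    """Simplified ring placement: primary node plus the next RF - 1 nodes."""
    primary = partition % NODES  # stand-in for token hashing (assumption)
    return {(primary + i) % NODES for i in range(RF)}


placement = {p: replicas_for(p) for p in range(PARTITIONS)}

# Fraction of all partitions that each node stores a replica of.
share = [
    sum(1 for replicas in placement.values() if node in replicas) / PARTITIONS
    for node in range(NODES)
]


def min_nodes_for_full_read():
    """Smallest group of nodes that holds at least one replica of every partition."""
    for size in range(1, NODES + 1):
        for group in combinations(range(NODES), size):
            members = set(group)
            if all(replicas & members for replicas in placement.values()):
                return size, group
    return NODES, tuple(range(NODES))


print(share)                      # [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(min_nodes_for_full_read())  # (2, (0, 3))
```

In this model a single partition read at consistency level 1 is served by one replica, which matches the calculator's "1 node" figure, while reading every partition needs at least two nodes because no single node holds more than half of the data.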