On 20/03/2017 02:35, S G wrote:
> The problem is also related to select *, which is considered bad
> practice with most databases... That tells me to avoid preparing select
> queries if I expect the columns in my table to change down the road.
> I did some more testing to see if my client machines were the bottleneck.
> For a 6-node Cassandra cluster (each VM having 8 cores), I got 26,000
> reads/sec for all of the following:
> 1) Client nodes: 1, Threads: 60
> 2) Client nodes: 3, Threads: 180
> 3) Client nodes: 5, Threads: 300
> 4) Client nodes: 10, Threads: 600
> 5) Client nodes: 20, Threads: 1200
> So adding more client nodes, or more threads on those client nodes, has
> no effect: per-thread throughput drops (roughly 433 reads/sec/thread at
> 60 threads vs. 22 at 1200) while the aggregate stays flat at 26,000.
> I suspect Cassandra is simply not allowing me to go any further.
>> Primary keys for my schema are:
>> PRIMARY KEY((name, phone), age)
>> name: text
>> phone: int
>> age: int
Yes, with such a PK the data must be spread over the whole cluster (also
taking the partitioner into account), so it is strange that the
throughput doesn't scale.
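For reference, the quoted key corresponds to a table along these lines (a sketch; the keyspace and table names are assumptions). Naming the columns explicitly also keeps prepared selects valid if columns are added to the table later:

```cql
CREATE TABLE my_ks.users (
    name  text,
    phone int,
    age   int,
    PRIMARY KEY ((name, phone), age)
);

-- Explicit column list instead of select *:
SELECT name, phone, age FROM my_ks.users WHERE name = ? AND phone = ?;
```

With (name, phone) as the partition key, distinct (name, phone) pairs should hash to partitions all over the ring, which is why the flat throughput is surprising.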
I guess you have also verified that you are selecting data randomly?
Maybe you could have a look at the system traces to see the query plan
for some requests:
If you are on a test cluster you can truncate the trace tables first
(truncate system_traces.sessions; and truncate system_traces.events;),
run a test, then select * from system_traces.events
where session_id = xxx,
xxx being one of the session ids you pick from system_traces.sessions.
Try to see if you are not always hitting the same nodes.
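The steps above as a CQL session (xxx stands for a session id you pick; note that traces only show up if tracing is enabled, e.g. with nodetool settraceprobability or TRACING ON in cqlsh):

```cql
-- On a test cluster, start from empty trace tables:
TRUNCATE system_traces.sessions;
TRUNCATE system_traces.events;

-- Run the test, then list the traced sessions and their coordinators:
SELECT session_id, coordinator, duration FROM system_traces.sessions;

-- Inspect one session in detail (xxx = a session_id from above):
SELECT * FROM system_traces.events WHERE session_id = xxx;
```

The coordinator column in the sessions, and the source column in the events, show which nodes actually handled the reads; if the same few addresses keep coming back, the load is not being spread across the cluster.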