First of all, thanks for your help! :)
Here are some details:
> With RF=N=2 you're essentially testing a single machine locally, which isn't the
> best indicator long term
I will test with more nodes (4 with RF = 2), but for now I'm limited to 2
nodes for non-technical reasons.
> Well, first off, you shouldn't run the stress tool on the node you're testing. Give
> it its own box.
I ran the test in a new keyspace in order to have a clean dataset.
> the 2nd query since it's returning 10x the data and there will be more to go
> through within the partition
I configured cassandra-stress so that each user has only one bucket, so the
amount of data is the same in both cases ("select * from buckets where name
= ? and tenantid = ? limit 1" and "select * from owner_to_buckets where owner
= ? and tenantid = ? limit 10").
Does Cassandra perform extra reads when the limit is larger than the available
data (even if the partition contains only a single value in the
clustering column)?
If the amount of data is the same, how can we explain the difference in CPU
consumption?
Regards,
Eric
________________________________________
From: Chris Lohfink [[email protected]]
Sent: Tuesday, 23 September 2014 19:23
To: [email protected]
Subject: Re: CPU consumption of Cassandra
Well, first off, you shouldn't run the stress tool on the node you're testing. Give
it its own box.
With RF=N=2 you're essentially testing a single machine locally, which isn't the
best indicator long term (optimizations are available when reading data that's local
to the node). 80k/sec on a system is pretty good though; you're probably seeing
lower throughput on the 2nd query since it's returning 10x the data and there will be more
to go through within the partition. 42k/sec is still acceptable imho since
these are smaller boxes. You are probably seeing high CPU because the system
is doing a lot :)
If you want to get more out of these systems, you can probably do some tuning;
enable tracing to see what's actually the bottleneck.
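As a minimal sketch of the tracing suggestion, assuming the query is run by hand from cqlsh in the keyspace used for the test: the cqlsh `TRACING ON` command prints a step-by-step trace for each statement that follows. The values 'some_owner' and 'some_tenant' are placeholders, not values from this thread.

```cql
-- Run inside cqlsh; 'some_owner' and 'some_tenant' are placeholder values.
TRACING ON;
SELECT * FROM owner_to_buckets WHERE owner = 'some_owner' AND tenantid = 'some_tenant' LIMIT 10;
TRACING OFF;
```

Running the single-row query on buckets the same way and comparing the two traces may show which step accounts for the extra time.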
Collections will very likely hurt more than help.
---
Chris Lohfink
On Sep 23, 2014, at 9:39 AM, Leleu Eric
<[email protected]> wrote:
I tried to run “cassandra-stress” on some of my tables, as proposed by Jake
Luciani.
For a simple table, this tool is able to perform 80000 read op/s with little CPU
consumption if I query the table by the PK (name, tenantid).
Ex:
TABLE:
CREATE TABLE IF NOT EXISTS buckets (
    tenantid varchar,
    name varchar,
    owner varchar,
    location varchar,
    description varchar,
    codeQuota varchar,
    creationDate timestamp,
    updateDate timestamp,
    PRIMARY KEY (name, tenantid)
);
QUERY: select * from buckets where name = ? and tenantid = ? limit 1;
TOP output for 900 threads on cassandra-stress :
top - 13:17:09 up 173 days, 21:54, 4 users, load average: 11.88, 4.30, 2.76
Tasks: 272 total, 1 running, 270 sleeping, 0 stopped, 1 zombie
Cpu(s): 71.4%us, 14.0%sy, 0.0%ni, 13.1%id, 0.0%wa, 0.0%hi, 1.5%si, 0.0%st
Mem: 98894704k total, 96367436k used, 2527268k free, 15440k buffers
Swap: 0k total, 0k used, 0k free, 88194556k cached
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
25857 root      20   0 29.7g 1.5g  12m S 693.0  1.6 38:45.58  java <== Cassandra-stress
29160 cassandr  20   0 16.3g 4.8g  10m S   1.3  5.0 44:46.89  java <== Cassandra
Now, if I run another query on a table that provides the list of buckets
for a given owner, the number of op/s is halved (42000 op/s) and
CPU consumption goes up.
Ex:
TABLE:
CREATE TABLE IF NOT EXISTS owner_to_buckets (
    tenantid varchar,
    name varchar,
    owner varchar,
    location varchar,
    description varchar,
    codeQuota varchar,
    creationDate timestamp,
    updateDate timestamp,
    PRIMARY KEY ((owner, tenantid), name)
);
QUERY: select * from owner_to_buckets where owner = ? and tenantid = ? limit 10;
TOP output for 4 threads on cassandra-stress:
top - 13:49:16 up 173 days, 22:26, 4 users, load average: 1.76, 1.48, 1.17
Tasks: 273 total, 1 running, 271 sleeping, 0 stopped, 1 zombie
Cpu(s): 26.3%us, 8.0%sy, 0.0%ni, 64.7%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
Mem: 98894704k total, 97512156k used, 1382548k free, 14580k buffers
Swap: 0k total, 0k used, 0k free, 90413772k cached
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
29160 cassandr  20   0 13.6g 4.8g  37m S 186.7  5.1 62:26.77  java <== Cassandra
50622 root      20   0 28.8g 469m  12m S 102.5  0.5  0:45.84  java <== Cassandra-stress
TOP output for 271 threads on cassandra-stress:
top - 13:57:03 up 173 days, 22:34, 4 users, load average: 4.67, 1.76, 1.25
Tasks: 272 total, 1 running, 270 sleeping, 0 stopped, 1 zombie
Cpu(s): 81.5%us, 14.0%sy, 0.0%ni, 3.1%id, 0.0%wa, 0.0%hi, 1.3%si, 0.0%st
Mem: 98894704k total, 94955936k used, 3938768k free, 15892k buffers
Swap: 0k total, 0k used, 0k free, 85993676k cached
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
29160 cassandr  20   0 13.6g 4.8g  38m S 430.0  5.1 82:31.80  java <== Cassandra
50622 root      20   0 29.1g 2.3g  12m S 343.4  2.4 17:51.22  java <== Cassandra-stress
I have 4 tables with a composite PRIMARY KEY (two of them have 4 components: 2 for
the partition key, one for the clustering column and one for the sort column).
Two of these tables are frequently read by partition key because we want
to list the data of a given user; this would explain my CPU load, given the
simple test done with cassandra-stress…
How can I avoid this?
Collections could be an option, but the number of entries per user is not limited
and can easily exceed 200. According to the Cassandra documentation,
collections have a size limited to 64KB. So it is probably not a solution in my
case. ☹
Regards,
Eric
From: Chris Lohfink [mailto:[email protected]]
Sent: Monday, 22 September 2014 22:03
To: [email protected]
Subject: Re: CPU consumption of Cassandra
It's going to depend a lot on your data model, but 5-6k is on the low end of what
I would expect. N=RF=2 is not really something I would recommend. That said,
93GB is not much data, so the bottleneck may lie more in your data model,
queries, or client.
What profiler are you using? The CPU on the select/read is marked as RUNNABLE,
but it's really more of a wait state that may throw some profilers off; it may
be a red herring.
---
Chris Lohfink
On Sep 22, 2014, at 11:39 AM, Leleu Eric
<[email protected]> wrote:
Hi,
I’m currently testing Cassandra 2.0.9 (and, since last week, 2.1) under a
read-heavy load…
I have 2 Cassandra nodes (RF: 2) running CentOS 6 with 16GB of RAM and 8
cores.
I have around 93GB of data per node (one 300GB SAS disk with a
rotational speed of 10500).
I have 300 active client threads, and they query the C* nodes with the
consistency level set to ONE (I’m using the DataStax CQL driver).
During my tests I saw a lot of CPU consumption (70% user / 6% sys / 4% iowait /
20% idle).
The C* nodes handle around 5000 op/s (sometimes up to 6000 op/s).
I tried to profile a node and, at first glance, 60% of the CPU time is spent in the
“sun.nio.ch” package (SelectorImpl.select or Channel.read).
I know that benchmark results are highly dependent on the dataset and use
cases, but in my view this CPU consumption is normal
for this load.
Can someone confirm that?
Given my hardware configuration, can I expect more than 6000
read op/s?
Regards,
Eric
________________________________
Ce message et les pièces jointes sont confidentiels et réservés à l'usage
exclusif de ses destinataires. Il peut également être protégé par le secret
professionnel. Si vous recevez ce message par erreur, merci d'en avertir
immédiatement l'expéditeur et de le détruire. L'intégrité du message ne pouvant
être assurée sur Internet, la responsabilité de Worldline ne pourra être
recherchée quant au contenu de ce message. Bien que les meilleurs efforts
soient faits pour maintenir cette transmission exempte de tout virus,
l'expéditeur ne donne aucune garantie à cet égard et sa responsabilité ne
saurait être recherchée pour tout dommage résultant d'un virus transmis.
This e-mail and the documents attached are confidential and intended solely for
the addressee; it may also be privileged. If you receive this e-mail in error,
please notify the sender immediately and destroy it. As its integrity cannot be
secured on the Internet, the Worldline liability cannot be triggered for the
message content. Although the sender endeavours to maintain a computer
virus-free network, the sender does not warrant that this transmission is
virus-free and will not be liable for any damages resulting from any virus
transmitted.