RE: ConsistencyLevel and Mutations: Behaviour if the update of the commitlog fails

2017-09-19 Thread Leleu Eric
OK, thank you.

From: kurt greaves [mailto:k...@instaclustr.com]
Sent: Tuesday, September 19, 2017 06:35
To: User
Subject: Re: ConsistencyLevel and Mutations: Behaviour if the update of the
commitlog fails


> Does the coordinator "cancel" the mutation on the "committed" nodes (and how)?
No. Those mutations are applied on those nodes.
> Is it a heuristic case where two nodes have the data whereas they shouldn't
> and we hope that HintedHandoff will replay the mutation?
Yes. But really you should make sure you recover from this error in your
client. Hinted handoff might work, but you have no way of knowing whether it
has taken place, so if ALL is important you should retry/resolve the failed
query accordingly.
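
For illustration, a minimal client-side sketch of that retry advice, using the
DataStax Java driver 3.x (the keyspace, table and retry budget are illustrative,
not from this thread; it assumes the write is idempotent so it is safe to replay):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class AllWriteRetry {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {
            // An idempotent mutation: replaying it converges to the same state.
            Statement write = new SimpleStatement(
                    "UPDATE users SET email = ? WHERE id = ?", "a@example.com", "user-1")
                    .setConsistencyLevel(ConsistencyLevel.ALL);
            int attempts = 0;
            while (true) {
                try {
                    session.execute(write);
                    break; // every replica acknowledged the mutation
                } catch (WriteTimeoutException e) {
                    // Some replicas may already have applied the mutation, so the
                    // client must resolve the outcome itself rather than rely on
                    // hinted handoff; here we simply retry a bounded number of times.
                    if (++attempts >= 3) {
                        throw e; // surface the failure for an application-level repair
                    }
                }
            }
        }
    }
}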



ConsistencyLevel and Mutations: Behaviour if the update of the commitlog fails

2017-09-18 Thread Leleu Eric
Hi Cassandra users,


I have a question about the ConsistencyLevel and the MUTATION operation.
According to the write-path documentation, the first action executed by a
replica node is to write the mutation into the commitlog; the mutation is
ACKed only if this action succeeds.

I suppose that this commitlog write may fail on one node (even if this
node is seen as Up and Nominal by the coordinator).

So my question is: what happens if, with an RF of 3 and CL=ALL, one commitlog
write fails and the two others succeed? Does the coordinator "cancel" the
mutation on the "committed" nodes (and how)? Is it a heuristic case where
two nodes have the data whereas they shouldn't, and we hope that
HintedHandoff will replay the mutation?



Thank you in advance for your answers; they will help improve my Cassandra
understanding :)

Regards,
Eric


RE: Max Capacity per Node

2016-11-24 Thread Leleu Eric
Hi,

I’m not a Cassandra expert but according this reference : 
http://docs.datastax.com/en/landing_page/doc/landing_page/planning/planningHardware.html#planningHardware__capacity-per-node
You already reached (even exceeded) the recommended limit for HDD.

As usual, I guess the maximum limits depend of your use case, data model and 
workload…


Regards,
Eric

From: Shalom Sagges [mailto:shal...@liveperson.com]
Sent: Thursday, November 24, 2016 14:48
To: user@cassandra.apache.org
Subject: Max Capacity per Node

Hi Everyone,

I have a 24 node cluster (12 in each DC) with a capacity of 3.3 TB per node for 
the data directory.
I'd like to increase the capacity per node.
Can anyone tell me what the maximum recommended capacity per node is?
The disks we use are HDD, not SSD.

Thanks!

Shalom Sagges
DBA



RE: CsvReporter not spitting out metrics in cassandra

2016-02-25 Thread Leleu Eric
Hi,

I configured this reporter recently with Apache Cassandra v2.1.x and I had no
trouble.
Here are some points to check:

-  The directory /etc/dse/cassandra has to be in the classpath (I'm
not a DSE user, so I don't know if that is already the case).

-  If the CsvReporter fails to start (a permissions issue on the output
directory?), you should have some ERROR-level logs in your Cassandra log
files.
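
To rule out a permissions problem independently of Cassandra, here is a small
standalone sanity check you could try with the Dropwizard metrics 3.x
CsvReporter (an assumption on my side: Cassandra 2.1 itself bundles the older
yammer metrics line, so this only validates the output directory, not
Cassandra's exact code path):

import com.codahale.metrics.Counter;
import com.codahale.metrics.CsvReporter;
import com.codahale.metrics.MetricRegistry;

import java.io.File;
import java.util.Locale;
import java.util.concurrent.TimeUnit;

public class CsvReporterCheck {
    public static void main(String[] args) throws InterruptedException {
        MetricRegistry registry = new MetricRegistry();
        Counter counter = registry.counter("sanity.check");

        // Point at the same directory as the reporter config; if no CSV files
        // appear here either, suspect directory permissions rather than the YAML.
        CsvReporter reporter = CsvReporter.forRegistry(registry)
                .formatFor(Locale.US)
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build(new File("/mnt/cassandra/metrics"));
        reporter.start(1, TimeUnit.SECONDS);

        for (int i = 0; i < 10; i++) {
            counter.inc();        // generate some activity to report
            Thread.sleep(500);
        }
        reporter.stop();
    }
}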

Eric

From: Vikram Kone [mailto:vikramk...@gmail.com]
Sent: Thursday, February 25, 2016 21:41
To: user@cassandra.apache.org
Subject: CsvReporter not spitting out metrics in cassandra

Hi,
I have added the following file on my cassandra node

/etc/dse/cassandra/metrics-reporter-config.yaml

csv:
  - outdir: '/mnt/cassandra/metrics'
    period: 10
    timeunit: 'SECONDS'
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - "^org.apache.cassandra.metrics.Cache.+"
        - "^org.apache.cassandra.metrics.ClientRequest.+"
        - "^org.apache.cassandra.metrics.CommitLog.+"
        - "^org.apache.cassandra.metrics.Compaction.+"
        - "^org.apache.cassandra.metrics.DroppedMetrics.+"
        - "^org.apache.cassandra.metrics.ReadRepair.+"
        - "^org.apache.cassandra.metrics.Storage.+"
        - "^org.apache.cassandra.metrics.ThreadPools.+"
        - "^org.apache.cassandra.metrics.ColumnFamily.+"
        - "^org.apache.cassandra.metrics.Streaming.+"

And then added this line to /etc/dse/cassandra/cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -Dcassandra.metricsReporterConfigFile=metrics-reporter-config.yaml"

And then finally restarted DSE: /etc/init.d/dse restart

I don't see any CSV metrics files being spit out by the CsvReporter in the
/mnt/cassandra/metrics folder.

Any ideas why?



RE: High read latency

2015-09-23 Thread Leleu Eric
For a read-heavy workload, JVM GC can cause latency issues (see
http://tech.shift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-heavy-workloads).
If you have frequent minor GCs taking 400ms, they may increase your read latency.
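
One way to quantify how much of the 43ms comes from GC is to sample the
collector MBeans over JMX while the load runs. A minimal sketch (it assumes
Cassandra's default JMX endpoint on port 7199 with no authentication or SSL;
host and sampling interval are illustrative):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class GcPauseSampler {
    public static void main(String[] args) throws Exception {
        // Cassandra exposes JMX on port 7199 by default.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            List<GarbageCollectorMXBean> gcs =
                    ManagementFactory.getPlatformMXBeans(mbsc, GarbageCollectorMXBean.class);
            long[] time = new long[gcs.size()];
            long[] count = new long[gcs.size()];
            while (true) {
                for (int i = 0; i < gcs.size(); i++) {
                    long dt = gcs.get(i).getCollectionTime() - time[i];
                    long dc = gcs.get(i).getCollectionCount() - count[i];
                    if (dc > 0) {
                        // Average pause attributed to this collector since the last sample.
                        System.out.printf("%s: %d collections, avg %d ms%n",
                                gcs.get(i).getName(), dc, dt / dc);
                    }
                    time[i] += dt;
                    count[i] += dc;
                }
                Thread.sleep(10_000);
            }
        }
    }
}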

Eric

From: Jaydeep Chovatia [mailto:chovatia.jayd...@gmail.com]
Sent: Tuesday, September 22, 2015 19:50
To: user@cassandra.apache.org
Subject: Re: High read latency

select * from test where a = ? and b = ?

On Tue, Sep 22, 2015 at 10:27 AM, sai krishnam raju potturi 
<pskraj...@gmail.com> wrote:
thanks for the information. Posting the query too would be of help.

On Tue, Sep 22, 2015 at 11:56 AM, Jaydeep Chovatia 
<chovatia.jayd...@gmail.com> wrote:

Please find required details here:

-  Number of req/s

2k reads/s

-  Schema details

create table test (
a timeuuid,
b bigint,
c int,
d int static,
e int static,
f int static,
g int static,
h int,
i text,
j text,
k text,
l text,
m set,
n bigint,
o bigint,
p bigint,
q bigint,
r int,
s text,
t bigint,
u text,
v text,
w text,
x bigint,
y bigint,
z bigint,
primary key ((a, b), c)
);

-  JVM settings about the heap

Default settings

-  Execution time of the GC

Avg. 400ms. I do not see long GC pauses anywhere in the log file.

On Tue, Sep 22, 2015 at 5:34 AM, Leleu Eric 
<eric.le...@worldline.com> wrote:
Hi,


Before speaking about tuning, can you provide some additional information?


-  Number of req/s

-  Schema details

-  JVM settings about the heap

-  Execution time of the GC

43ms for a read latency may be acceptable, depending on the number of requests
per second.


Eric

From: Jaydeep Chovatia [mailto:chovatia.jayd...@gmail.com]
Sent: Tuesday, September 22, 2015 00:07
To: user@cassandra.apache.org
Subject: High read latency

Hi,

My application issues more read requests than writes. I see that under load,
the cfstats read latency for one of the tables is quite high, around 43ms:

Local read count: 114479357
Local read latency: 43.442 ms
Local write count: 22288868
Local write latency: 0.609 ms


Here is my node configuration:
RF=3, Read/Write with QUORUM, 64GB RAM, 48 CPU cores. I have only 5 GB of data
on each node (and for experimental purposes I stored the data in tmpfs).

I've tried increasing concurrent_reads up to 512, but it did not help read
latency. CPU/Memory/IO look fine on the system.

Any idea what should I tune?

Jaydeep










RE: High read latency

2015-09-22 Thread Leleu Eric
Hi,


Before speaking about tuning, can you provide some additional information?


-  Number of req/s

-  Schema details

-  JVM settings about the heap

-  Execution time of the GC

43ms for a read latency may be acceptable, depending on the number of requests
per second.


Eric

From: Jaydeep Chovatia [mailto:chovatia.jayd...@gmail.com]
Sent: Tuesday, September 22, 2015 00:07
To: user@cassandra.apache.org
Subject: High read latency

Hi,

My application issues more read requests than writes. I see that under load,
the cfstats read latency for one of the tables is quite high, around 43ms:

Local read count: 114479357
Local read latency: 43.442 ms
Local write count: 22288868
Local write latency: 0.609 ms


Here is my node configuration:
RF=3, Read/Write with QUORUM, 64GB RAM, 48 CPU cores. I have only 5 GB of data
on each node (and for experimental purposes I stored the data in tmpfs).

I've tried increasing concurrent_reads up to 512, but it did not help read
latency. CPU/Memory/IO look fine on the system.

Any idea what should I tune?

Jaydeep





RE: Cassandra Query using UDF

2015-09-16 Thread Leleu Eric
Hi,

I'm not a Mongo user and I have never used the Cassandra UDF feature, but I
found this (you may have already found it):
https://issues.apache.org/jira/browse/CASSANDRA-8488


Eric


-----Original Message-----
From: Michael Scriney [mailto:mscri...@computing.dcu.ie]
Sent: Wednesday, September 16, 2015 12:35
To: user@cassandra.apache.org
Subject: Cassandra Query using UDF

Hello

I am wondering whether it is possible to execute a search using a Cassandra
UDF, similarly to the way I can execute find queries in Mongo using custom
JavaScript.

Thanks


Michael.





RE: CPU consumption of Cassandra

2014-09-24 Thread Leleu Eric
Thanks for all your comments; you have helped me a lot.

My composite partition key is clearly the bottleneck (PRIMARY KEY ((owner,
tenantid), name)).
When I ran cassandra-stress (on a dedicated server), the number of reads could
not go above 10K/s (with idle at 20% / user at 72% / sys at 8%).

If I create another table with a non-composite partition key, the number of
reads reaches 77K/s (with idle at 5% / user at 88% / sys at 6%).

I guess I don't have enough data to justify a composite partition key.
I will concatenate the two field values in my application, which will probably
increase my throughput (see the sketch below).
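
As a sketch of what that concatenation could look like on the client side (the
v2 table name, the separator character and the sample values are hypothetical;
DataStax Java driver 3.x API):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class SingleKeyLookup {
    // Hypothetical separator; pick one that can never appear in either field.
    private static final char SEP = '\u0001';

    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {
            // Assumed table variant with a single text partition key:
            //   CREATE TABLE owner_to_buckets_v2 (ownertenant text, name text, ...,
            //                                     PRIMARY KEY (ownertenant, name));
            PreparedStatement ps = session.prepare(
                    "SELECT name FROM owner_to_buckets_v2 WHERE ownertenant = ? LIMIT 10");
            // The application builds the single key from the two former key parts.
            String key = "owner42" + SEP + "tenant7";
            for (Row row : session.execute(ps.bind(key))) {
                System.out.println(row.getString("name"));
            }
        }
    }
}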

Regards,
Eric

From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: Wednesday, September 24, 2014 00:10
To: user@cassandra.apache.org
Subject: Re: CPU consumption of Cassandra

Nice catch Daniel. The comment from Sylvain explains a lot !

On Tue, Sep 23, 2014 at 11:33 PM, Daniel Chia 
<danc...@coursera.org> wrote:
If I had to guess, it might in part be due to inefficiencies in 2.0 with
regards to CompositeType (which is used in CQL3 tables):
https://issues.apache.org/jira/browse/CASSANDRA-5417?focusedCommentId=13821243&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13821243

The ticket reports a 45% performance increase in reading slices compared to
trunk, in 2.1.

Thanks,
Daniel

On Tue, Sep 23, 2014 at 5:08 PM, DuyHai Doan 
<doanduy...@gmail.com> wrote:
I did some benchmarking in the past when we faced high CPU usage even though
the data set was very small, sitting entirely in memory. Read the report here:
https://github.com/doanduyhai/Cassandra_Data_Model_Bench

Our partial conclusions were:

1) a slice query fetching a 64kb page of data and decoding columns is more
CPU-expensive than a single read by column

2) decoding CompositeType costs more CPU for the CQL3 data model than for the
old Thrift column families

3) since the cell type for all CQL3 tables is forced to BytesType to support
any type of data, serialization/deserialization may have a CPU cost.

The issue Eric Leleu is facing reminds me of point 1). When he sets the limit
to 1, it's a single read by column. The other query with limit 10 is translated
internally to a slice query, which may explain the CPU difference.

Now, do not take my word as granted. Those points are just assumptions and
partial conclusions. I would need extensive in-depth debugging to confirm them
and did not have time lately.

On Tue, Sep 23, 2014 at 10:46 PM, Chris Lohfink
<clohf...@blackbirdit.com> wrote:
CPU consumption may be affected by the cassandra-stress tool in the 2nd example
as well. Running it on a separate system eliminates it as a possible cause.
There is a little extra work, but not anything that I think would be that
obvious. Tracing (can be enabled with nodetool) or profiling (i.e. with
YourKit) can give more exposure to the bottleneck. I'd run the test from a
separate system first.
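
Tracing can also be turned on per statement from the client. A minimal sketch
with the DataStax Java driver (3.x API shown; the keyspace and bind values are
illustrative, and fetching the trace costs extra round trips, so only trace
sampled queries):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryTrace;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class TraceOneQuery {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // enableTracing() asks the coordinator to record server-side events
            // for this one statement.
            Statement q = new SimpleStatement(
                    "SELECT * FROM my_ks.owner_to_buckets"
                            + " WHERE owner = 'owner42' AND tenantid = 'tenant7' LIMIT 10")
                    .enableTracing();
            ResultSet rs = session.execute(q);
            QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
            System.out.printf("coordinator total: %d us%n", trace.getDurationMicros());
            for (QueryTrace.Event e : trace.getEvents()) {
                // Each event shows where the time went (reading the memtable,
                // merging sstables, building the slice, ...).
                System.out.printf("%8d us  %s%n",
                        e.getSourceElapsedMicros(), e.getDescription());
            }
        }
    }
}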

---
Chris Lohfink


On Sep 23, 2014, at 12:48 PM, Leleu Eric 
<eric.le...@worldline.com> wrote:


First of all, thanks for your help! :)

Here are some details:

> With RF=N=2 you're essentially testing a single machine locally, which isn't
> the best indicator long term
I will test with more nodes (4 with RF = 2), but for now I'm limited to 2
nodes for non-technical reasons...

> Well, first off you shouldn't run the stress tool on the node you're testing.
> Give it its own box.
I performed the test in a new keyspace in order to have a clean dataset.

> the 2nd query since it's returning 10x the data and there will be more to go
> through within the partition
I configured cassandra-stress in such a way that each user has only one bucket,
so the amount of data is the same in both cases ("select * from buckets where
name = ? and tenantid = ? limit 1" and "select * from owner_to_buckets where
owner = ? and tenantid = ? limit 10").
Does Cassandra perform extra reads when the limit is bigger than the available
data (even if the partition key contains only one single value in the
clustering column)?
If the amount of data is the same, how can we explain the difference in CPU
consumption?


Regards,
Eric


From: Chris Lohfink [clohf...@blackbirdit.com]
Sent: Tuesday, September 23, 2014 19:23
To: user@cassandra.apache.org
Subject: Re: CPU consumption of Cassandra
Objet : Re: CPU consumption of Cassandra

Well, first off you shouldn't run the stress tool on the node you're testing.
Give it its own box.

With RF=N=2 you're essentially testing a single machine locally, which isn't
the best indicator long term (optimizations are available when reading data
that's local to the node). 80k/sec on a system is pretty good though; you're
probably seeing slower results on the 2nd query since it's returning 10x the
data and there will be more to go through within the partition. 42k/sec is
still acceptable imho since these are smaller boxes.

RE: CPU consumption of Cassandra

2014-09-23 Thread Leleu Eric
I tried to run cassandra-stress on some of my tables as proposed by Jake
Luciani.

For a simple table, this tool is able to perform 80k read op/s with low CPU
consumption if I request the table by the PK (name, tenantid).

Ex :
TABLE :

CREATE TABLE IF NOT EXISTS buckets (tenantid varchar,
name varchar,
owner varchar,
location varchar,
description varchar,
codeQuota varchar,
creationDate timestamp,
updateDate timestamp,
PRIMARY KEY (name, tenantid));

QUERY : select * from buckets where name = ? and tenantid = ? limit 1;

TOP output for 900 threads on cassandra-stress:
top - 13:17:09 up 173 days, 21:54,  4 users,  load average: 11.88, 4.30, 2.76
Tasks: 272 total,   1 running, 270 sleeping,   0 stopped,   1 zombie
Cpu(s): 71.4%us, 14.0%sy,  0.0%ni, 13.1%id,  0.0%wa,  0.0%hi,  1.5%si,  0.0%st
Mem:  98894704k total, 96367436k used,  2527268k free,    15440k buffers
Swap:        0k total,        0k used,        0k free, 88194556k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
25857 root      20   0 29.7g 1.5g  12m S 693.0  1.6  38:45.58 java  == Cassandra-stress
29160 cassandr  20   0 16.3g 4.8g  10m S   1.3  5.0  44:46.89 java  == Cassandra



Now, if I run another query, on a table that provides the list of buckets for
a given owner, the number of op/s is divided by 2 (42000 op/s) and CPU
consumption grows.

Ex :
TABLE :

CREATE TABLE IF NOT EXISTS owner_to_buckets (tenantid varchar,
name varchar,
owner varchar,
location varchar,
description varchar,
codeQuota varchar,
creationDate timestamp,
updateDate timestamp,
PRIMARY KEY ((owner, tenantid), name));

QUERY : select * from owner_to_buckets  where owner = ? and tenantid = ? limit 
10;

TOP output for 4 threads on cassandra-stress:

top - 13:49:16 up 173 days, 22:26,  4 users,  load average: 1.76, 1.48, 1.17
Tasks: 273 total,   1 running, 271 sleeping,   0 stopped,   1 zombie
Cpu(s): 26.3%us,  8.0%sy,  0.0%ni, 64.7%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Mem:  98894704k total, 97512156k used,  1382548k free,    14580k buffers
Swap:        0k total,        0k used,        0k free, 90413772k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
29160 cassandr  20   0 13.6g 4.8g  37m S 186.7  5.1  62:26.77 java  == Cassandra
50622 root      20   0 28.8g 469m  12m S 102.5  0.5   0:45.84 java  == Cassandra-stress

TOP output for 271 threads on cassandra-stress:

top - 13:57:03 up 173 days, 22:34,  4 users,  load average: 4.67, 1.76, 1.25
Tasks: 272 total,   1 running, 270 sleeping,   0 stopped,   1 zombie
Cpu(s): 81.5%us, 14.0%sy,  0.0%ni,  3.1%id,  0.0%wa,  0.0%hi,  1.3%si,  0.0%st
Mem:  98894704k total, 94955936k used,  3938768k free,    15892k buffers
Swap:        0k total,        0k used,        0k free, 85993676k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
29160 cassandr  20   0 13.6g 4.8g  38m S 430.0  5.1  82:31.80 java  == Cassandra
50622 root      20   0 29.1g 2.3g  12m S 343.4  2.4  17:51.22 java  == Cassandra-stress


I have 4 tables with a composite PRIMARY KEY (two of them have 4 entries: 2 for
the partition key, one for the clustering column and one for the sort column).
Two of these tables are frequently read by partition key because we want to
list the data of a given user; this should explain my CPU load, according to
the simple test done with cassandra-stress...

How can I avoid this?
Collections could be an option, but the number of data items per user is not
limited and can easily exceed 200 entries. According to the Cassandra
documentation, collections have a size limited to 64KB, so it is probably not
a solution in my case. :(


Regards,
Eric

From: Chris Lohfink [mailto:clohf...@blackbirdit.com]
Sent: Monday, September 22, 2014 22:03
To: user@cassandra.apache.org
Subject: Re: CPU consumption of Cassandra

It's going to depend a lot on your data model, but 5-6k is on the low end of
what I would expect. N=RF=2 is not really something I would recommend. That
said, 93GB is not much data, so the bottleneck may exist more in your data
model, queries, or client.

What profiler are you using? The CPU on the select/read is marked as RUNNABLE,
but it's really more of a wait state that may throw some profilers off; it may
be a red herring.

---
Chris Lohfink

On Sep 22, 2014, at 11:39 AM, Leleu Eric 
<eric.le...@worldline.com> wrote:


Hi,


I'm currently testing Cassandra 2.0.9 (and, since last week, 2.1) under some
read-heavy load...

I have 2 Cassandra nodes (RF: 2) running under CentOS 6 with 16GB of RAM and 8
cores.
I have around 93GB of data per node (one disk of 300GB with a SAS interface
and a rotational speed of 10500 RPM).

I have 300 active client threads and they request the C* nodes with a
consistency level set to ONE (I'm using the CQL DataStax driver).

During my tests I saw a lot of CPU consumption (70% user / 6% sys / 4% iowait /
20% idle).
C* nodes respond at around 5000 op/s (sometimes up to 6000 op/s).

I tried to profile a node and, at first look, 60% of the CPU is spent in the
sun.nio.ch package (SelectorImpl.select or Channel.read).

RE: CPU consumption of Cassandra

2014-09-23 Thread Leleu Eric
First of all, thanks for your help! :)

Here are some details:

> With RF=N=2 you're essentially testing a single machine locally, which isn't
> the best indicator long term
I will test with more nodes (4 with RF = 2), but for now I'm limited to 2
nodes for non-technical reasons...

> Well, first off you shouldn't run the stress tool on the node you're testing.
> Give it its own box.
I performed the test in a new keyspace in order to have a clean dataset.

> the 2nd query since it's returning 10x the data and there will be more to go
> through within the partition
I configured cassandra-stress in such a way that each user has only one bucket,
so the amount of data is the same in both cases ("select * from buckets where
name = ? and tenantid = ? limit 1" and "select * from owner_to_buckets where
owner = ? and tenantid = ? limit 10").
Does Cassandra perform extra reads when the limit is bigger than the available
data (even if the partition key contains only one single value in the
clustering column)?
If the amount of data is the same, how can we explain the difference in CPU
consumption?


Regards,
Eric


From: Chris Lohfink [clohf...@blackbirdit.com]
Sent: Tuesday, September 23, 2014 19:23
To: user@cassandra.apache.org
Subject: Re: CPU consumption of Cassandra

Well, first off you shouldn't run the stress tool on the node you're testing.
Give it its own box.

With RF=N=2 you're essentially testing a single machine locally, which isn't
the best indicator long term (optimizations are available when reading data
that's local to the node). 80k/sec on a system is pretty good though; you're
probably seeing slower results on the 2nd query since it's returning 10x the
data and there will be more to go through within the partition. 42k/sec is
still acceptable imho since these are smaller boxes. You are probably seeing
high CPU because the system is doing a lot :)

If you want to get more out of these systems you can probably do some tuning;
enable tracing to see what's actually the bottleneck.

Collections will very likely hurt more than help.

---
Chris Lohfink

On Sep 23, 2014, at 9:39 AM, Leleu Eric 
<eric.le...@worldline.com> wrote:

I tried to run "cassandra-stress" on some of my tables as proposed by Jake
Luciani.

For a simple table, this tool is able to perform 80k read op/s with low CPU
consumption if I request the table by the PK (name, tenantid).

Ex :
TABLE :

CREATE TABLE IF NOT EXISTS buckets (tenantid varchar,
name varchar,
owner varchar,
location varchar,
description varchar,
codeQuota varchar,
creationDate timestamp,
updateDate timestamp,
PRIMARY KEY (name, tenantid));

QUERY : select * from buckets where name = ? and tenantid = ? limit 1;

TOP output for 900 threads on cassandra-stress:
top - 13:17:09 up 173 days, 21:54,  4 users,  load average: 11.88, 4.30, 2.76
Tasks: 272 total,   1 running, 270 sleeping,   0 stopped,   1 zombie
Cpu(s): 71.4%us, 14.0%sy,  0.0%ni, 13.1%id,  0.0%wa,  0.0%hi,  1.5%si,  0.0%st
Mem:  98894704k total, 96367436k used,  2527268k free,    15440k buffers
Swap:        0k total,        0k used,        0k free, 88194556k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
25857 root      20   0 29.7g 1.5g  12m S 693.0  1.6  38:45.58 java  == Cassandra-stress
29160 cassandr  20   0 16.3g 4.8g  10m S   1.3  5.0  44:46.89 java  == Cassandra



Now, if I run another query, on a table that provides the list of buckets for
a given owner, the number of op/s is divided by 2 (42000 op/s) and CPU
consumption grows.

Ex :
TABLE :

CREATE TABLE IF NOT EXISTS owner_to_buckets (tenantid varchar,
name varchar,
owner varchar,
location varchar,
description varchar,
codeQuota varchar,
creationDate timestamp,
updateDate timestamp,
PRIMARY KEY ((owner, tenantid), name));

QUERY : select * from owner_to_buckets  where owner = ? and tenantid = ? limit 
10;

TOP output for 4 threads on cassandra-stress:

top - 13:49:16 up 173 days, 22:26,  4 users,  load average: 1.76, 1.48, 1.17
Tasks: 273 total,   1 running, 271 sleeping,   0 stopped,   1 zombie
Cpu(s): 26.3%us,  8.0%sy,  0.0%ni, 64.7%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Mem:  98894704k total, 97512156k used,  1382548k free,    14580k buffers
Swap:        0k total,        0k used,        0k free, 90413772k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
29160 cassandr  20   0 13.6g 4.8g  37m S 186.7  5.1  62:26.77 java  == Cassandra
50622 root      20   0 28.8g 469m  12m S 102.5  0.5   0:45.84 java  == Cassandra-stress

TOP output for 271 threads on cassandra-stress:

top - 13:57:03 up 173 days, 22:34,  4 users,  load average: 4.67, 1.76, 1.25
Tasks: 272 total,   1 running, 270 sleeping,   0 stopped,   1 zombie
Cpu(s): 81.5%us, 14.0%sy,  0.0%ni,  3.1%id,  0.0%wa,  0.0%hi,  1.3%si,  0.0%st
Mem:  98894704k total, 94955936k used,  3938768k free,    15892k buffers
Swap:        0k total,        0k used,        0k free, 85993676k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
29160 cassandr  20   0 13.6g 4.8g  38m S 430.0  5.1  82:31.80 java  == Cassandra
50622 root      20   0 29.1g 2.3g  12m S 343.4  2.4  17:51.22 java  == Cassandra-stress

CPU consumption of Cassandra

2014-09-22 Thread Leleu Eric
Hi,


I'm currently testing Cassandra 2.0.9 (and, since last week, 2.1) under some
read-heavy load...

I have 2 Cassandra nodes (RF: 2) running under CentOS 6 with 16GB of RAM and 8
cores.
I have around 93GB of data per node (one disk of 300GB with a SAS interface
and a rotational speed of 10500 RPM).

I have 300 active client threads and they request the C* nodes with a
consistency level set to ONE (I'm using the CQL DataStax driver).

During my tests I saw a lot of CPU consumption (70% user / 6% sys / 4% iowait /
20% idle).
C* nodes respond at around 5000 op/s (sometimes up to 6000 op/s).

I tried to profile a node and, at first look, 60% of the CPU is spent in the
sun.nio.ch package (SelectorImpl.select or Channel.read).

I know that benchmark results are highly dependent on the dataset and use
cases, but from my point of view this CPU consumption is normal for this load.
Can someone confirm that point?
Given my hardware configuration, can I expect to get more than 6000 read op/s?


Regards,
Eric









Question about MemoryMeter liveRatio

2014-08-26 Thread Leleu Eric
Hi,


I'm trying to understand what the liveRatio is and whether I have to care
about it. I found some references on the web and, if I understand them, the
liveRatio represents the Memtable size divided by the amount of data
serialized on disk. Is that right?


When I see the following log lines, what can I deduce from them?

INFO [MemoryMeter:1] 2014-08-26 19:02:41,047 Memtable.java (line 481)
CFS(Keyspace='ufapi', ColumnFamily='users') liveRatio is 8.52308554793235
(just-counted was 8.514143642185562).  calculation took 3613ms for 272646 cells

INFO [MemoryMeter:1] 2014-08-26 18:36:09,965 Memtable.java (line 481)
CFS(Keyspace='system', ColumnFamily='compactions_in_progress') liveRatio is
40.18934911242604 (just-counted was 16.37869822485207).  calculation took 0ms
for 7 cells


From what I have read, the liveRatio is capped between 1 and 64. If my
liveRatio is around 64, should I be concerned about anything?
Does Cassandra use the liveRatio for some internal task, or is it just a metric?


Regards,
Eric





Secondary index or dedicated CF?

2014-08-22 Thread Leleu Eric
Hi,


I'm new to Cassandra and I am wondering what the best design is for my case.

I have a set of buckets that each contain one to thousands of contents.

Here is my Content CF :

CREATE TABLE IF NOT EXISTS contents (tenantID varchar,
    key varchar,
    type varchar,
    bucket varchar,
    owner varchar,
    workspace varchar,
    public_read boolean,
    PRIMARY KEY ((key, tenantID), type, workspace));


To retrieve all contents that belong to a bucket, I have created an index on 
the bucket column.

CREATE INDEX IF NOT EXISTS bucket_to_contents ON contents (bucket);

The bucket column value is concatenated with the tenantId (bucket =
bucketname+tenantID) in order to avoid filtering on the tenantID in my
application.

Is this the right way to do it, or should I create another column family to
link each content to its bucket?

CREATE TABLE IF NOT EXISTS bucket_to_contents (tenantID varchar,
    key varchar,
    type varchar,
    bucket varchar,
    owner varchar,
    workspace varchar,
    public_read boolean,
    PRIMARY KEY ((bucket, tenantID), key));

Under the hood, what is the difference between the two solutions?

According to my understanding, the result will be the same: both will have the
row key equal to the bucketname and the tenantID, except that the secondary
index can have a replication delay...

Can you help me on this point?

Regards,
Eric






RE: Secondary index or dedicated CF?

2014-08-22 Thread Leleu Eric
Thank you for your feedback.

From: Mark Reddy [mailto:mark.l.re...@gmail.com]
Sent: Friday, August 22, 2014 17:08
To: user@cassandra.apache.org
Subject: Re: Secondary index or dedicated CF?

Hi,

As a general rule of thumb, I would steer clear of secondary indexes; this is
also the official stance that DataStax takes (see p5 of their best practices
doc:
http://www.datastax.com/wp-content/uploads/2014/04/WP-DataStax-Enterprise-Best-Practices.pdf):

It is best to avoid using Cassandra's built-in secondary indexes where 
possible. Instead, it is recommended to denormalize data and manually maintain 
a dynamic table as a form of an index instead of using a secondary index. If 
and when secondary indexes are to be used, they should be created only on 
columns containing low-cardinality data (for example: fields with less than 
1000 states).

Mark

On 22 Aug 2014, at 15:58, DuyHai Doan 
<doanduy...@gmail.com> wrote:


Hello Eric

> Under the hood what is the difference of the both solutions?

1. Cassandra secondary index: a distributed index, which supports a high
volume of data better; the index itself is distributed, so there is no
bottleneck. The tradeoff is that, depending on the cardinality of data having
the same bucketname+tenantID, performance may drop sharply. Please read this:
http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_when_use_index_c.html?scroll=concept_ds_sgh_yzz_zj__when-no-index.
There are several restrictions on secondary indexes.

2. Manual index: easy to design, but potentially a wide row, and not well
balanced if the data having the same bucketname+tenantID is very large.
Furthermore, you need to manage index consistency manually so that it is
synced with source data updates (see the sketch below).

The best thing to do is to benchmark both solutions and take the approach
giving you the best results. Be careful with benchmarks: they should be
representative of the data pattern you will likely have in production.
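
One common way to keep such a manual index in step with the source table is a
logged batch, so both rows are guaranteed to be applied eventually. A minimal
sketch with the DataStax Java driver 3.x (column lists are abbreviated and the
bind values are made up; a logged batch buys atomicity, not isolation, and
costs an extra batch-log round trip):

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class DualWrite {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {
            PreparedStatement insContent = session.prepare(
                    "INSERT INTO contents (key, tenantid, type, workspace, bucket, owner)"
                            + " VALUES (?, ?, ?, ?, ?, ?)");
            PreparedStatement insIndex = session.prepare(
                    "INSERT INTO bucket_to_contents (bucket, tenantid, key, type, workspace, owner)"
                            + " VALUES (?, ?, ?, ?, ?, ?)");
            // Both inserts go through the batch log, so either both rows are
            // eventually written or neither is.
            BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
            batch.add(insContent.bind("doc-1", "t1", "pdf", "ws1", "bucket-a", "alice"));
            batch.add(insIndex.bind("bucket-a", "t1", "doc-1", "pdf", "ws1", "alice"));
            session.execute(batch);
        }
    }
}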

On Fri, Aug 22, 2014 at 7:47 AM, Leleu Eric 
<eric.le...@worldline.com> wrote:
Hi,


I'm new to Cassandra and I am wondering what the best design is for my case.

I have a set of buckets that each contain one to thousands of contents.

Here is my Content CF :

CREATE TABLE IF NOT EXISTS contents (tenantID varchar,
    key varchar,
    type varchar,
    bucket varchar,
    owner varchar,
    workspace varchar,
    public_read boolean,
    PRIMARY KEY ((key, tenantID), type, workspace));


To retrieve all contents that belong to a bucket, I have created an index on 
the bucket column.

CREATE INDEX IF NOT EXISTS bucket_to_contents ON contents (bucket);

The bucket column value is concatenated with the tenantId (bucket =
bucketname+tenantID) in order to avoid filtering on the tenantID in my
application.

Is this the right way to do it, or should I create another column family to
link each content to its bucket?

CREATE TABLE IF NOT EXISTS bucket_to_contents (tenantID varchar,
    key varchar,
    type varchar,
    bucket varchar,
    owner varchar,
    workspace varchar,
    public_read boolean,
    PRIMARY KEY ((bucket, tenantID), key));

Under the hood, what is the difference between the two solutions?

According to my understanding, the result will be the same: both will have the
row key equal to the bucketname and the tenantID, except that the secondary
index can have a replication delay...

Can you help me on this point?

Regards,
Eric








