Re: Stress Test

2018-09-09 Thread Swen Moczarski
Hi,
I found this blog quite helpful:
https://www.instaclustr.com/deep-diving-into-cassandra-stress-part-1/

on 1: not sure if I understand your question correctly, but I would not run
the stress test process on a Cassandra node that is itself under test.
on 3: the tool already ships with an option to generate nice graphs:
http://cassandra.apache.org/doc/latest/tools/cassandra_stress.html#graphing

Hope that helps.
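For question 3 (analyzing the output), besides the built-in graphing above, the summary block the tool prints at the end can also be scraped with a few lines of script. A minimal sketch; the SAMPLE text below is an assumed, abbreviated stand-in for the tool's "Results:" section, not output captured from a real run:

```python
# Hedged sketch: extract headline metrics from a cassandra-stress summary.
# SAMPLE is an assumed, abbreviated example of the output format.
import re

SAMPLE = """\
Results:
Op rate                   :   14,450 op/s  [WRITE: 14,450 op/s]
Latency mean              :    3.5 ms [WRITE: 3.5 ms]
Latency 99th percentile   :   12.6 ms [WRITE: 12.6 ms]
"""

def parse_stress_summary(text):
    """Return {metric_name: float} for 'name : number ...' summary lines."""
    metrics = {}
    for line in text.splitlines():
        m = re.match(r"\s*([A-Za-z0-9 .]+?)\s*:\s*([\d,.]+)", line)
        if m:
            metrics[m.group(1)] = float(m.group(2).replace(",", ""))
    return metrics

print(parse_stress_summary(SAMPLE))
```

Once parsed, runs can be compared across code revisions, much like the -graph option does for you.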

Am Do., 6. Sep. 2018 um 20:14 Uhr schrieb rajasekhar kommineni <
rajaco...@gmail.com>:

> Hello Folks,
>
> Can anybody recommend good documentation on Cassandra stress testing?
>
> I have the below questions:
>
> 1) Which server is better for running the test: the Cassandra server or the
> application server?
> 2) I am using the DataStax Java driver; is there any good documentation for
> stress testing specific to this driver?
> 3) How do I analyze the stress test output?
>
> Thanks,
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Stress Test

2018-09-06 Thread rajasekhar kommineni
Hello Folks,

Can anybody recommend good documentation on Cassandra stress testing?

I have the below questions:

1) Which server is better for running the test: the Cassandra server or the
application server?
2) I am using the DataStax Java driver; is there any good documentation for
stress testing specific to this driver?
3) How do I analyze the stress test output?

Thanks,





Re: Stress test cassandr

2017-11-26 Thread Jonathan Haddad
Have you read through the docs for stress? You can have it use your own
queries and data model.

http://cassandra.apache.org/doc/latest/tools/cassandra_stress.html
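For reference, the user mode described in those docs takes a YAML profile along these lines. This is a hypothetical sketch: the keyspace, table, column, and query names are all illustrative, not taken from the docs:

```yaml
# Hypothetical profile (run with:
#   cassandra-stress user profile=eventlog.yaml "ops(insert=3,byid=1)")
keyspace: stress_ks
table: eventlog
table_definition: |
  CREATE TABLE eventlog (
    id uuid,
    ts timestamp,
    payload text,
    PRIMARY KEY (id, ts));
columnspec:
  - name: payload
    size: uniform(50..200)
queries:
  byid:
    cql: select * from eventlog where id = ? limit 10
    fields: samerow
```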
On Sun, Nov 26, 2017 at 1:02 AM Akshit Jain <akshit13...@iiitd.ac.in> wrote:

> Hi,
> What is the best way to stress test a Cassandra cluster with the real-life
> workloads that are currently in use?
> Currently I am using the cassandra-stress tool, but it generates blob data;
> the YAML files provide the option to use a custom keyspace.
>
> But what are the different parameter values that can be set to test the
> Cassandra cluster in an extreme environment?
>
>


Stress test cassandr

2017-11-26 Thread Akshit Jain
Hi,
What is the best way to stress test a Cassandra cluster with the real-life
workloads that are currently in use?
Currently I am using the cassandra-stress tool, but it generates blob data;
the YAML files provide the option to use a custom keyspace.

But what are the different parameter values that can be set to test the
Cassandra cluster in an extreme environment?


Re: Stress test

2017-07-27 Thread Jay Zhuang
The user and password should be in the -mode section, for example:

./cassandra-stress user profile=table.yaml ops\(insert=1\) \
    -mode native cql3 user=** password=**

http://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsCStress.html

/Jay

On 7/27/17 2:46 PM, Greg Lloyd wrote:
> I am trying to use the cassandra-stress tool with the user
> profile=table.yaml argument specified and do authentication at the same
> time. If I use the user profile, I get an 'Invalid parameter user=*' error
> when I specify a user and password.
> 
> Is it not possible to specify a yaml and use authentication?




Stress test

2017-07-27 Thread Greg Lloyd
I am trying to use the cassandra-stress tool with the user
profile=table.yaml argument specified and do authentication at the same
time. If I use the user profile, I get an 'Invalid parameter user=*' error
when I specify a user and password.

Is it not possible to specify a yaml and use authentication?


Re: How to stress test collections in Cassandra Stress

2017-04-25 Thread Alain RODRIGUEZ
Hi 'luckiboy'.

You have been trying to unsubscribe from the Cassandra dev and user lists
lately.

Sending "unsubscribe" in a message is not the way to do it, as you have
probably noticed by now; it just spams people on those lists.

As written here, http://cassandra.apache.org/community/, you actually have
to send an email to both user-unsubscr...@cassandra.apache.org and
dev-unsubscr...@cassandra.apache.org.

Cheers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2017-04-24 15:08 GMT+02:00 LuckyBoy <luckibo...@gmail.com>:

> unsubscribe
>
> On Thu, Apr 13, 2017 at 7:26 AM, eugene miretsky <
> eugene.miret...@gmail.com> wrote:
>
>> [...]
>


Re: How to stress test collections in Cassandra Stress

2017-04-24 Thread LuckyBoy
unsubscribe

On Thu, Apr 13, 2017 at 7:26 AM, eugene miretsky <eugene.miret...@gmail.com>
wrote:

> [...]


Re: How to stress test collections in Cassandra Stress

2017-04-24 Thread Ahmed Eljami
Hi,

Collections are not supported in the cassandra-stress tool.

I suggest using JMeter with the Cassandra Java driver, or Spark, to run your
stress test with collections.
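If JMeter or Spark isn't an option, one possible workaround inside cassandra-stress itself is to model the list as extra clustering rows, since the profile's columnspec cannot generate collection values. A hypothetical sketch (table and column names are illustrative):

```yaml
# Hypothetical workaround: one clustering row per list element.
table_definition: |
  CREATE TABLE customer_items (
    customer_id bigint,
    item text,
    PRIMARY KEY (customer_id, item));
columnspec:
  - name: customer_id
    population: norm(1..40M)
  - name: item
    size: fixed(64)
    cluster: fixed(40)   # roughly 40 items per customer
```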


2017-04-13 16:26 GMT+02:00 eugene miretsky <eugene.miret...@gmail.com>:

> [...]


-- 
Cordialement;

Ahmed ELJAMI


How to stress test collections in Cassandra Stress

2017-04-13 Thread eugene miretsky
Hi,

I'm trying to do a stress test on a table with a collection column, but
cannot figure out how to do that.

I tried

table_definition: |
  CREATE TABLE list (
customer_id bigint,
items list,
PRIMARY KEY (customer_id));

columnspec:
  - name: customer_id
size: fixed(64)
population: norm(0..40M)
  - name: items
cluster: fixed(40)

When running the benchmark, I get: java.io.IOException: Operation x10 on
key(s) [27056313]: Error executing: (NoSuchElementException)


Fwd: Cassandra Stress Test Result Evaluation

2015-03-09 Thread Nisha Menon
I have been using the cassandra-stress tool to evaluate my cassandra
cluster for quite some time now. My problem is that I am not able to
comprehend the results generated for my specific use case.

My schema looks something like this:

CREATE TABLE Table_test(
  ID uuid,
  Time timestamp,
  Value double,
  Date timestamp,
  PRIMARY KEY ((ID,Date), Time)
) WITH COMPACT STORAGE;

I have parsed this information in a custom yaml file and used parameters
n=1, threads=100 and the rest are default options (cl=one, mode=native
cql3 etc). The Cassandra cluster is a 3 node CentOS VM setup.

A few specifics of the custom yaml file are as follows:

insert:
partitions: fixed(100)
select: fixed(1)/2
batchtype: UNLOGGED

columnspecs:
-name: Time
 size: fixed(1000)
-name: ID
 size: uniform(1..100)
-name: Date
 size: uniform(1..10)
-name: Value
 size: uniform(-100..100)

My observations so far are as follows (Please correct me if I am wrong):

   1. With n=1 and time: fixed(1000), the number of rows getting
   inserted is 10 million. (1*1000=1000)
   2. The number of row-keys/partitions is 1(i.e n), within which 100
   partitions are taken at a time (which means 100 *1000 = 10 key-value
   pairs) out of which 5 key-value pairs are processed at a time. (This is
   because of select: fixed(1)/2 ~ 50%)

The output message also confirms the same:

Generating batches with [100..100] partitions and [5..5] rows
(of[10..10] total rows in the partitions)

The results that I get are the following for consecutive runs with the same
configuration as above:

Run Total_ops   Op_rate Partition_rate  Row_Rate   Time
1 56   19 1885   943246 3.0
2 46   46 4648  2325498 1.0
3 27   30 2982  1489870 0.9
4 59   19 1932   966034 3.1
5 100  17 1730   865182 5.8

Now what I need to understand are as follows:

   1. Which among these metrics is the throughput i.e, No. of records
   inserted per second? Is it the Row_rate, Op_rate or Partition_rate? If it’s
   the Row_rate, can I safely conclude here that I am able to insert close to
   1 million records per second? Any thoughts on what the Op_rate and
   Partition_rate mean in this case?
   2. Why is it that the Total_ops vary so drastically in every run ? Has
   the number of threads got anything to do with this variation? What can I
   conclude here about the stability of my Cassandra setup?
   3. How do I determine the batch size per thread here? In my example, is
   the batch size 5?
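One way to sanity-check which number is which: each op writes a batch of partitions, and each selected partition contributes some rows, so op_rate × partitions_per_op ≈ partition_rate, and partition_rate × rows_per_partition ≈ row_rate. A sketch using run 1's figures, and assuming roughly 500 visited rows per partition (fixed(1)/2 of 1000 clustering rows):

```python
# Illustrative relationship between the cassandra-stress rate columns.
# partitions_per_op and rows_per_partition are assumptions taken from the
# profile above (partitions: fixed(100); select: fixed(1)/2 of 1000 rows).
def rates(op_rate, partitions_per_op, rows_per_partition):
    partition_rate = op_rate * partitions_per_op
    row_rate = partition_rate * rows_per_partition
    return partition_rate, row_rate

# Run 1 reported Op_rate=19, Partition_rate=1885, Row_Rate=943246;
# the model reproduces the same order of magnitude.
print(rates(19, 100, 500))  # (1900, 950000)
```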

Thanks in advance.



-- 
Nisha Menon
BTech (CS) Sahrdaya CET,
MTech (CS) IIIT Banglore.


Re: Cassandra Stress Test Result Evaluation

2015-03-09 Thread Jake Luciani
Your insert settings look unrealistic since I doubt you would be
writing 50k rows at a time.  Try to set this to 1 per partition and
you should get much more consistent numbers across runs I would think.
select: fixed(1)/10

On Wed, Mar 4, 2015 at 7:53 AM, Nisha Menon nisha.meno...@gmail.com wrote:
 I have been using the cassandra-stress tool to evaluate my cassandra cluster
 for quite some time now. [...]



-- 
http://twitter.com/tjake


Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-08 Thread 孔嘉林
Thanks Chris.
I run the client on a separate AWS instance from the Cassandra cluster
servers. At the client side, I create 40 or 50 threads for sending requests
to each Cassandra node, and I create one thrift client for each of the
threads. At the beginning, all the created thrift clients connect to the
corresponding Cassandra nodes and stay connected during the whole process (I
do not close the transports until the end of the test). So I use very simple
load balancing, since the same number of thrift clients connects to each
node. My source code is here:
https://github.com/kongjialin/Cassandra/blob/master/cassandra_client.cpp
It's very nice of you to help me improve my code.

As I increase the number of threads, the latency gets longer.

I'm using C++, so if I want to use the native binary protocol + prepared
statements, is the only way to use the C++ driver?
Thanks very much.




2014-12-08 12:51 GMT+08:00 Chris Lohfink clohfin...@gmail.com:

 I think your client could use improvements.  How many threads do you have
 running in your test?  With a thrift call like that you only can do one
 request at a time per connection.   For example, assuming C* takes 0ms, a
 10ms network latency/driver overhead will mean 20ms RTT and a max
 throughput of ~50 QPS per thread (native binary doesn't behave like this).
 Are you running the client on its own system or shared with a node? How are
 you load balancing your requests? Source code would help, since there's a
 lot that can become a bottleneck.
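Chris's round-trip arithmetic above can be sketched directly (an illustrative calculation, not part of any driver API):

```python
# With synchronous thrift calls, a connection has one request in flight,
# so per-thread throughput is capped by the round-trip time (RTT).
def max_qps_per_thread(rtt_ms):
    return 1000.0 / rtt_ms

def threads_needed(target_qps, rtt_ms):
    per_thread = max_qps_per_thread(rtt_ms)
    return int(-(-target_qps // per_thread))  # ceiling division

rtt = 20.0                       # 10 ms each way, as in the example above
print(max_qps_per_thread(rtt))   # 50.0 QPS per thread
print(threads_needed(20000, rtt))
```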

 Generally you will see a bit of a dip in latency from N=RF=1 and N=2, RF=2
 etc since there are optimizations on the coordinator node when it doesn't
 need to send the request to the replicas.  The impact of the network
 overhead decreases in significance as cluster grows.  Typically; latency
 wise, RF=N=1 is going to be fastest possible for smaller loads (ie when a
 client cannot fully saturate a single node).

 Main thing to expect is that latency will plateau and remain fairly
 constant as load/nodes increase while throughput potential will linearly
 (empirically at least) increase.

 You should really attempt it with the native binary + prepared statements,
 running cql over thrift is far from optimal.  I would recommend using the
 cassandra-stress tool if you want to stress test Cassandra (and not your
 code)
 http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema

 ===
 Chris Lohfink

 On Sun, Dec 7, 2014 at 9:48 PM, 孔嘉林 kongjiali...@gmail.com wrote:

 Hi Eric,
 Thank you very much for your reply!
 Do you mean that I should clear my table after each run? Indeed, I can
 see several times of compaction during my test, but could only a few times
 compaction affect the performance that much? Also, I can see from the
 OpsCenter some ParNew GC happen but no CMS GC happen.

 I run my test on EC2 cluster, I think the network could be of high speed
 with in it. Each Cassandra server has 4 units CPU, 15 GiB memory and 80 SSD
 storage, which is of m3.xlarge type.

 As for latency, which latency should I care about most? p(99) or p(999)?
 I want to get the max QPS under a certain limited latency.

 I know my testing scenario are not the common case in production, I just
 want to know how much burden my cluster can bear under stress.

 So, how did you test your cluster that can get 86k writes/sec? How many
 requests did you send to your cluster? Was it also 1 million? Did you also
 use OpsCenter to monitor the real time performance? I also wonder why the
 write and read QPS OpsCenter provide are much lower than what I calculate.
 Could you please describe in detail about your test deployment?

 Thank you very much,
 Joy

 2014-12-07 23:55 GMT+08:00 Eric Stevens migh...@gmail.com:

 Hi Joy,

  [...]

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-08 Thread Chris Lohfink
So I would -expect- an increase of ~20k qps per node with m3.xlarge so
there may be something up with your client (I am not a c++ person however
but hopefully someone on list will take notice).

Latency does not decrease linearly as you add nodes. What you are likely
seeing with latency, with so few nodes, is a side effect of an optimization.
When you read/write from a table the node you request will act as the
coordinator.  If the data exists on the coordinator and using rf=1 or cl=1
it will not have to send the request to another node, just service it
locally:

  +---------+        +--------------+
  |  node0  |        |    node1     |
  |---------|  --->  |--------------|
  |  client |        | coordinator  |
  +---------+        +--------------+

In this case the write latency is dominated by the network between
coordinator and client.  A second case is where the coordinator actually
has to send the request to another node:

  +---------+        +--------------+        +---------------+
  |  node0  |        |    node1     |        |     node2     |
  |---------|  --->  |--------------|  --->  |---------------|
  |  client |        | coordinator  |        | data replica  |
  +---------+        +--------------+        +---------------+

As you add nodes you increase the probability of hitting this second
scenario, where the coordinator has to make an additional network hop. This
is possibly why you are seeing an increase (aside from client issues). To
get an idea of how latency is affected as you increase nodes, you really
need to go higher than 4 nodes (i.e. graph the same RF for 5, 10, 15, 25
nodes; below 5 isn't really the recommended way to run Cassandra anyway),
since the latency will approach that of the 2nd scenario (plus some spike
outliers for GCs) and then it should settle down until you overwork the node.

You may want to give https://github.com/datastax/cpp-driver a go (I'm not a
C++ guy, so take that with a grain of salt). I would still highly recommend
using cassandra-stress instead of your own tool if you want to test
Cassandra and not your code.

===
Chris Lohfink

On Mon, Dec 8, 2014 at 4:57 AM, 孔嘉林 kongjiali...@gmail.com wrote:

  Thanks Chris.
  [...]

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-08 Thread Eric Stevens
 [...]
Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread Eric Stevens
Hi Joy,

Are you resetting your data after each test run?  I wonder if your tests
are actually causing you to fall behind on data grooming tasks such as
compaction, and so performance suffers for your later tests.

There are *so many* factors which can affect performance; without reviewing
test methodology in great detail, it's really hard to say whether there are
flaws which might uncover an antipattern, cause an atypical number of cache
hits or misses, and so forth. You may also be producing GC pressure in the
write path, and so forth.

I *can* say that 28k writes per second looks just a little low, but it
depends a lot on your network, hardware, and write patterns (eg, data
size).  For a little performance test suite I wrote, with parallel batched
writes, on a 3 node rf=3 cluster test cluster, I got about 86k writes per
second.

Also focusing exclusively on max latency is going to cause you some
troubles especially in the case of magnetic media as you're using.  Between
ill-timed GC and inconsistent performance characteristics from magnetic
media, your max numbers will often look significantly worse than your p(99)
or p(999) numbers.

All this said, one node will often look better than several nodes for
certain patterns because it completely eliminates proxy (coordinator) write
times.  All writes are local writes.  It's an over-simple case that doesn't
reflect any practical production use of Cassandra, so it's probably not
worth even including in your tests.  I would recommend start at 3 nodes
rf=3, and compare against 6 nodes rf=6.  Make sure you're staying on top of
compaction and aren't seeing garbage collections in the logs (either of
those will be polluting your results with variability you can't account for
with small sample sizes of ~1 million).

If you expect to sustain write volumes like this, you'll find these
clusters are sized too small (on that hardware you won't keep up with
compaction), and your tests are again testing scenarios you wouldn't
actually see in production.

On Sat Dec 06 2014 at 7:09:18 AM kong kongjiali...@gmail.com wrote:

 Hi,

 I am doing a stress test on DataStax Cassandra Community 2.1.2, not using
 the provided stress test tool, but my own stress-test client code instead
 (I wrote some C++ stress test code). My Cassandra cluster is deployed on
 Amazon EC2, using the DataStax Community AMI (HVM instances) from the
 DataStax documentation, and I am not using EBS, just the ephemeral storage
 by default. The EC2 type of the Cassandra servers is m3.xlarge. I use
 another EC2 instance for my stress test client, which is of type r3.8xlarge.
 Both the Cassandra server nodes and the stress test client node are in
 us-east. I test a Cassandra cluster made up of 1 node, 2 nodes, and 4 nodes
 separately. I run the INSERT test and the SELECT test separately, but the
 performance doesn't increase linearly when new nodes are added. I also get
 some weird results. My test results are as follows (I do 1 million
 operations and I try to get the best QPS while the max latency is no more
 than 200 ms; latencies are measured from the client side, and QPS is
 calculated by total_operations/total_time).
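The metrics described above (QPS as total_operations/total_time, and client-side latency percentiles) can be sketched as follows; the latency samples are made-up illustrative numbers, not measured data:

```python
# Sketch of the reported metrics; input numbers are illustrative only.
def qps(total_ops, total_seconds):
    return total_ops / total_seconds

def percentile(samples_ms, p):
    """Nearest-rank percentile, p in (0, 1]."""
    s = sorted(samples_ms)
    k = max(0, int(round(p * len(s))) - 1)
    return s[k]

lat = [1.5, 2.0, 2.1, 2.3, 2.4, 2.8, 3.1, 5.7, 52.8, 205.4]
print(qps(1_000_000, 53.5))   # ~18691 ops/s
print(percentile(lat, 0.9))   # 52.8
print(percentile(lat, 0.999)) # 205.4
```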



 *INSERT(write):*

 Node_count  RF  QPS    Avg_lat(ms)  Min_lat(ms)  .95_lat(ms)  .99_lat(ms)  .999_lat(ms)  Max_lat(ms)
 1           1   18687  2.08         1.48         2.95         5.74         52.8          205.4
 2           1   20793  3.15         0.84         7.71         41.35        88.7          232.7
 2           2   22498  3.37         0.86         6.04         36.1         221.5         649.3
 4           1   28348  4.38         0.85         8.19         64.51        169.4         251.9
 4           3   28631  5.22         0.87         18.68        68.35        167.2         288



 *SELECT(read):*

 Node_count  RF  QPS    Avg_lat(ms)  Min_lat(ms)  .95_lat(ms)  .99_lat(ms)  .999_lat(ms)  Max_lat(ms)
 1           1   24498  4.01         1.51         7.6          12.51        31.5          129.6
 2           1   28219  3.38         0.85         9.5          17.71        39.2          152.2
 2           2   35383  4.06         0.87         9.71         21.25        70.3          215.9
 4           1   34648  2.78         0.86         6.07         14.94        30.8          134.6
 4           3   52932  3.45         0.86         10.81        21.05        37.4          189.1



 The test data I use is generated randomly, and the schema I use is like (I
 use the cqlsh to create the columnfamily/table):

 CREATE TABLE table(
     id1  varchar,
     ts   varchar,
     id2  varchar,
     msg  varchar,
     PRIMARY KEY(id1, ts, id2));

 So the fields are all string and I generate each character of the string
 randomly, using srand(time(0)) and rand() in C++, so I think my test data
 could be uniformly distributed into the Cassandra cluster. And, in my
 client stress test code, I use thrift C++ interface, and the basic
 operation I do is like:

 thrift_client.execute_cql3_query(“INSERT INTO table WHERE id1=xxx, ts=xxx,
 id2=xxx, msg=xxx”); and thrift_client.execute_cql3_query(“SELECT FROM table
 WHERE id1=xxx”);

 Each data entry I INSERT of SELECT is of around 100 characters.

 On my stress test client, I create several threads to send

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread Eric Stevens
I'm sorry, I meant to say 6 nodes rf=3.

Also look at this performance over sustained periods of times, not burst
writing.  Run your test for several hours and watch memory and especially
compaction stats.  See if you can find what data volume you can write
while keeping outstanding compaction tasks < 5 (preferably 0 or 1) for
sustained periods.  Measuring just burst writes will definitely mask real
world conditions, and Cassandra actually absorbs bursted writes really well
(which in turn masks performance problems since by the time your write
times suffer from overwhelming a cluster, you're probably already in insane
and difficult to recover crisis mode).

On Sun Dec 07 2014 at 8:55:47 AM Eric Stevens migh...@gmail.com wrote:

 Hi Joy,

 Are you resetting your data after each test run?  I wonder if your tests
 are actually causing you to fall behind on data grooming tasks such as
 compaction, and so performance suffers for your later tests.

 There are *so many* factors which can affect performance that, without
 reviewing the test methodology in great detail, it's really hard to say whether
 there are flaws which might uncover an antipattern, cause an atypical number of
 cache hits or misses, and so forth. You may also be producing gc pressure
 in the write path.

 I *can* say that 28k writes per second looks just a little low, but it
 depends a lot on your network, hardware, and write patterns (eg, data
 size).  For a little performance test suite I wrote, with parallel batched
 writes, on a 3-node rf=3 test cluster, I got about 86k writes per
 second.

 Also focusing exclusively on max latency is going to cause you some
 troubles especially in the case of magnetic media as you're using.  Between
 ill-timed GC and inconsistent performance characteristics from magnetic
 media, your max numbers will often look significantly worse than your p(99)
 or p(999) numbers.

 All this said, one node will often look better than several nodes for
 certain patterns because it completely eliminates proxy (coordinator) write
 times.  All writes are local writes.  It's an over-simple case that doesn't
 reflect any practical production use of Cassandra, so it's probably not
 worth even including in your tests.  I would recommend starting at 3 nodes
 rf=3, and comparing against 6 nodes rf=6.  Make sure you're staying on top of
 compaction and aren't seeing garbage collections in the logs (either of
 those will be polluting your results with variability you can't account for
 with small sample sizes of ~1 million).

 If you expect to sustain write volumes like this, you'll find these
 clusters are sized too small (on that hardware you won't keep up with
 compaction), and your tests are again testing scenarios you wouldn't
 actually see in production.

 On Sat Dec 06 2014 at 7:09:18 AM kong kongjiali...@gmail.com wrote:

 Hi,

 I am doing stress test on Datastax Cassandra Community 2.1.2, not using
 the provided stress test tool, but use my own stress-test client code
 instead(I write some C++ stress test code). My Cassandra cluster is
 deployed on Amazon EC2, using the provided Datastax Community AMI( HVM
 instances ) in the Datastax document, and I am not using EBS, just using
 the ephemeral storage by default. The EC2 type of Cassandra servers are
 m3.xlarge. I use another EC2 instance for my stress test client, which is
 of type r3.8xlarge. Both the Cassandra sever nodes and stress test client
 node are in us-east. I test the Cassandra cluster which is made up of 1
 node, 2 nodes, and 4 nodes separately. I just do INSERT test and SELECT
 test separately, but the performance doesn’t get linear increment when new
 nodes are added. Also I get some weird results. My test results are as
 follows(*I do 1 million operations and I try to get the best QPS when
 the max latency is no more than 200ms, and the latencies are measured from
 the client side. The QPS is calculated by total_operations/total_time).*



 *INSERT(write):*

 Node count | RF | QPS   | Avg (ms) | Min (ms) | .95 (ms) | .99 (ms) | .999 (ms) | Max (ms)
 -----------|----|-------|----------|----------|----------|----------|-----------|---------
 1          | 1  | 18687 | 2.08     | 1.48     | 2.95     | 5.74     | 52.8      | 205.4
 2          | 1  | 20793 | 3.15     | 0.84     | 7.71     | 41.35    | 88.7      | 232.7
 2          | 2  | 22498 | 3.37     | 0.86     | 6.04     | 36.1     | 221.5     | 649.3
 4          | 1  | 28348 | 4.38     | 0.85     | 8.19     | 64.51    | 169.4     | 251.9
 4          | 3  | 28631 | 5.22     | 0.87     | 18.68    | 68.35    | 167.2     | 288

 *SELECT(read):*

 Node count | RF | QPS   | Avg (ms) | Min (ms) | .95 (ms) | .99 (ms) | .999 (ms) | Max (ms)
 -----------|----|-------|----------|----------|----------|----------|-----------|---------
 1          | 1  | 24498 | 4.01     | 1.51     | 7.6      | 12.51    | 31.5      | 129.6
 2          | 1  | 28219 | 3.38     | 0.85     | 9.5      | 17.71    | 39.2      | 152.2
 2          | 2  | 35383 | 4.06     | 0.87     | 9.71     | 21.25    | 70.3      | 215.9
 4          | 1  | 34648 | 2.78     | 0.86     | 6.07     | 14.94    | 30.8      | 134.6
 4          | 3  | 52932 | 3.45     | 0.86     | 10.81    | 21.05    | 37.4      | 189.1

 (RF = replication factor; latencies are measured client-side.)



 The test data I use is generated randomly, and the schema I use is like
 (I use

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread 孔嘉林
Hi Eric,
Thank you very much for your reply!
Do you mean that I should clear my table after each run? Indeed, I can see
compaction run several times during my test, but could only a few
compactions affect the performance that much? Also, I can see from
OpsCenter that some ParNew GCs happen but no CMS GCs.

I run my test on an EC2 cluster, so I think the network within it should be
fast. Each Cassandra server has 4 vCPUs, 15 GiB memory and 80 GB of SSD
storage, which is the m3.xlarge type.

As for latency, which latency should I care about most? p(99) or p(999)? I
want to get the max QPS under a certain limited latency.

I know my testing scenario is not the common case in production; I just
want to know how much load my cluster can bear under stress.

So, how did you test your cluster to get 86k writes/sec? How many
requests did you send to your cluster? Was it also 1 million? Did you also
use OpsCenter to monitor the real-time performance? I also wonder why the
write and read QPS OpsCenter provides are much lower than what I calculate.
Could you please describe your test deployment in detail?

Thank you very much,
Joy

2014-12-07 23:55 GMT+08:00 Eric Stevens migh...@gmail.com:

 Hi Joy,

 Are you resetting your data after each test run?  I wonder if your tests
 are actually causing you to fall behind on data grooming tasks such as
 compaction, and so performance suffers for your later tests.

 There are *so many* factors which can affect performance, without
 reviewing test methodology in great detail, it's really hard to say whether
 there are flaws which might uncover an antipattern cause atypical number of
 cache hits or misses, and so forth. You may also be producing gc pressure
 in the write path, and so forth.

 I *can* say that 28k writes per second looks just a little low, but it
 depends a lot on your network, hardware, and write patterns (eg, data
 size).  For a little performance test suite I wrote, with parallel batched
 writes, on a 3 node rf=3 cluster test cluster, I got about 86k writes per
 second.

 Also focusing exclusively on max latency is going to cause you some
 troubles especially in the case of magnetic media as you're using.  Between
 ill-timed GC and inconsistent performance characteristics from magnetic
 media, your max numbers will often look significantly worse than your p(99)
 or p(999) numbers.

 All this said, one node will often look better than several nodes for
 certain patterns because it completely eliminates proxy (coordinator) write
 times.  All writes are local writes.  It's an over-simple case that doesn't
 reflect any practical production use of Cassandra, so it's probably not
 worth even including in your tests.  I would recommend start at 3 nodes
 rf=3, and compare against 6 nodes rf=6.  Make sure you're staying on top of
 compaction and aren't seeing garbage collections in the logs (either of
 those will be polluting your results with variability you can't account for
 with small sample sizes of ~1 million).

 If you expect to sustain write volumes like this, you'll find these
 clusters are sized too small (on that hardware you won't keep up with
 compaction), and your tests are again testing scenarios you wouldn't
 actually see in production.

 On Sat Dec 06 2014 at 7:09:18 AM kong kongjiali...@gmail.com wrote:

 Hi,

 I am doing stress test on Datastax Cassandra Community 2.1.2, not using
 the provided stress test tool, but use my own stress-test client code
 instead(I write some C++ stress test code). My Cassandra cluster is
 deployed on Amazon EC2, using the provided Datastax Community AMI( HVM
 instances ) in the Datastax document, and I am not using EBS, just using
 the ephemeral storage by default. The EC2 type of Cassandra servers are
 m3.xlarge. I use another EC2 instance for my stress test client, which is
 of type r3.8xlarge. Both the Cassandra sever nodes and stress test client
 node are in us-east. I test the Cassandra cluster which is made up of 1
 node, 2 nodes, and 4 nodes separately. I just do INSERT test and SELECT
 test separately, but the performance doesn’t get linear increment when new
 nodes are added. Also I get some weird results. My test results are as
 follows(*I do 1 million operations and I try to get the best QPS when
 the max latency is no more than 200ms, and the latencies are measured from
 the client side. The QPS is calculated by total_operations/total_time).*



 *INSERT(write):*

 Node count | RF | QPS   | Avg (ms) | Min (ms) | .95 (ms) | .99 (ms) | .999 (ms) | Max (ms)
 -----------|----|-------|----------|----------|----------|----------|-----------|---------
 1          | 1  | 18687 | 2.08     | 1.48     | 2.95     | 5.74     | 52.8      | 205.4
 2          | 1  | 20793 | 3.15     | 0.84     | 7.71     | 41.35    | 88.7      | 232.7
 2          | 2  | 22498 | 3.37     | 0.86     | 6.04     | 36.1     | 221.5     | 649.3
 4          | 1  | 28348 | 4.38     | 0.85     | 8.19     | 64.51    | 169.4     | 251.9
 4          | 3  | 28631 | 5.22     | 0.87     | 18.68    | 68.35    | 167.2     | 288

 *SELECT(read):*

 Node count | RF | QPS | Average latency

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread Chris Lohfink
I think your client could use improvements.  How many threads do you have
running in your test?  With a thrift call like that you can only do one
request at a time per connection.   For example, assuming C* takes 0ms, a
10ms network latency/driver overhead will mean 20ms RTT and a max
throughput of ~50 QPS per thread (the native binary protocol doesn't behave
like this).  Are you running the client on its own system or shared with a
node?  How are you load balancing your requests?  Source code would help,
since there's a lot that can become a bottleneck.

Generally you will see a bit of a dip in latency between N=RF=1 and N=2, RF=2,
etc., since there are optimizations on the coordinator node when it doesn't
need to send the request to the replicas.  The impact of the network
overhead decreases in significance as the cluster grows.  Typically, latency
wise, RF=N=1 is going to be the fastest possible for smaller loads (i.e. when a
client cannot fully saturate a single node).

Main thing to expect is that latency will plateau and remain fairly
constant as load/nodes increase while throughput potential will linearly
(empirically at least) increase.

You should really attempt it with the native binary protocol + prepared
statements; running CQL over Thrift is far from optimal.  I would recommend
using the cassandra-stress tool if you want to stress test Cassandra (and not
your own code):
http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema

===
Chris Lohfink

On Sun, Dec 7, 2014 at 9:48 PM, 孔嘉林 kongjiali...@gmail.com wrote:

 Hi Eric,
 Thank you very much for your reply!
 Do you mean that I should clear my table after each run? Indeed, I can see
 several times of compaction during my test, but could only a few times
 compaction affect the performance that much? Also, I can see from the
 OpsCenter some ParNew GC happen but no CMS GC happen.

 I run my test on EC2 cluster, I think the network could be of high speed
 with in it. Each Cassandra server has 4 units CPU, 15 GiB memory and 80 SSD
 storage, which is of m3.xlarge type.

 As for latency, which latency should I care about most? p(99) or p(999)? I
 want to get the max QPS under a certain limited latency.

 I know my testing scenario are not the common case in production, I just
 want to know how much burden my cluster can bear under stress.

 So, how did you test your cluster that can get 86k writes/sec? How many
 requests did you send to your cluster? Was it also 1 million? Did you also
 use OpsCenter to monitor the real time performance? I also wonder why the
 write and read QPS OpsCenter provide are much lower than what I calculate.
 Could you please describe in detail about your test deployment?

 Thank you very much,
 Joy

 2014-12-07 23:55 GMT+08:00 Eric Stevens migh...@gmail.com:

 Hi Joy,

 Are you resetting your data after each test run?  I wonder if your tests
 are actually causing you to fall behind on data grooming tasks such as
 compaction, and so performance suffers for your later tests.

 There are *so many* factors which can affect performance, without
 reviewing test methodology in great detail, it's really hard to say whether
 there are flaws which might uncover an antipattern cause atypical number of
 cache hits or misses, and so forth. You may also be producing gc pressure
 in the write path, and so forth.

 I *can* say that 28k writes per second looks just a little low, but it
 depends a lot on your network, hardware, and write patterns (eg, data
 size).  For a little performance test suite I wrote, with parallel batched
 writes, on a 3 node rf=3 cluster test cluster, I got about 86k writes per
 second.

 Also focusing exclusively on max latency is going to cause you some
 troubles especially in the case of magnetic media as you're using.  Between
 ill-timed GC and inconsistent performance characteristics from magnetic
 media, your max numbers will often look significantly worse than your p(99)
 or p(999) numbers.

 All this said, one node will often look better than several nodes for
 certain patterns because it completely eliminates proxy (coordinator) write
 times.  All writes are local writes.  It's an over-simple case that doesn't
 reflect any practical production use of Cassandra, so it's probably not
 worth even including in your tests.  I would recommend start at 3 nodes
 rf=3, and compare against 6 nodes rf=6.  Make sure you're staying on top of
 compaction and aren't seeing garbage collections in the logs (either of
 those will be polluting your results with variability you can't account for
 with small sample sizes of ~1 million).

 If you expect to sustain write volumes like this, you'll find these
 clusters are sized too small (on that hardware you won't keep up with
 compaction), and your tests are again testing scenarios you wouldn't
 actually see in production.

 On Sat Dec 06 2014 at 7:09:18 AM kong kongjiali...@gmail.com wrote:

 Hi,

 I am doing stress test on Datastax Cassandra Community 2.1.2, not using
 the provided stress test

Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-06 Thread kong
Hi,

I am doing a stress test on DataStax Cassandra Community 2.1.2, not using the
provided stress test tool but my own stress-test client code instead (I
wrote some C++ stress test code). My Cassandra cluster is deployed on Amazon
EC2, using the DataStax Community AMI (HVM instances) from the
DataStax documentation, and I am not using EBS, just the ephemeral storage
by default. The EC2 type of the Cassandra servers is m3.xlarge. I use another
EC2 instance for my stress test client, which is of type r3.8xlarge. Both
the Cassandra server nodes and the stress test client node are in us-east. I
test a Cassandra cluster made up of 1 node, 2 nodes, and 4 nodes
separately. I run the INSERT test and SELECT test separately, but the
performance doesn't increase linearly when new nodes are added. I also
get some weird results. My test results are as follows (I do 1 million
operations and I try to get the best QPS while the max latency is no more
than 200ms, and the latencies are measured from the client side. The QPS is
calculated as total_operations/total_time).



INSERT(write):

Node count | RF | QPS   | Avg (ms) | Min (ms) | .95 (ms) | .99 (ms) | .999 (ms) | Max (ms)
-----------|----|-------|----------|----------|----------|----------|-----------|---------
1          | 1  | 18687 | 2.08     | 1.48     | 2.95     | 5.74     | 52.8      | 205.4
2          | 1  | 20793 | 3.15     | 0.84     | 7.71     | 41.35    | 88.7      | 232.7
2          | 2  | 22498 | 3.37     | 0.86     | 6.04     | 36.1     | 221.5     | 649.3
4          | 1  | 28348 | 4.38     | 0.85     | 8.19     | 64.51    | 169.4     | 251.9
4          | 3  | 28631 | 5.22     | 0.87     | 18.68    | 68.35    | 167.2     | 288

SELECT(read):

Node count | RF | QPS   | Avg (ms) | Min (ms) | .95 (ms) | .99 (ms) | .999 (ms) | Max (ms)
-----------|----|-------|----------|----------|----------|----------|-----------|---------
1          | 1  | 24498 | 4.01     | 1.51     | 7.6      | 12.51    | 31.5      | 129.6
2          | 1  | 28219 | 3.38     | 0.85     | 9.5      | 17.71    | 39.2      | 152.2
2          | 2  | 35383 | 4.06     | 0.87     | 9.71     | 21.25    | 70.3      | 215.9
4          | 1  | 34648 | 2.78     | 0.86     | 6.07     | 14.94    | 30.8      | 134.6
4          | 3  | 52932 | 3.45     | 0.86     | 10.81    | 21.05    | 37.4      | 189.1

(RF = replication factor; latencies are measured client-side.)

 

The test data I use is generated randomly, and the schema I use is like (I
use the cqlsh to create the columnfamily/table):

CREATE TABLE table(
    id1  varchar,
    ts   varchar,
    id2  varchar,
    msg  varchar,
    PRIMARY KEY(id1, ts, id2));

So the fields are all strings, and I generate each character of the string
randomly using srand(time(0)) and rand() in C++, so I think my test data
should be uniformly distributed across the Cassandra cluster. In my client
stress test code, I use the Thrift C++ interface, and the basic operations I do
are like:

thrift_client.execute_cql3_query(INSERT INTO table WHERE id1=xxx, ts=xxx,
id2=xxx, msg=xxx); and thrift_client.execute_cql3_query(SELECT FROM table
WHERE id1=xxx);

Each data entry I INSERT or SELECT is around 100 characters.

On my stress test client, I create several threads to send the read and
write requests, each thread having its own thrift client, and at the
beginning all the thrift clients connect to the Cassandra servers evenly.
For example, I create 160 thrift clients, and 40 of them connect to each
server node in a 4-node cluster.

 

So, 

1.   Could anyone help me explain my test results? Why does the
performance (QPS) increase only a little when new nodes are added?

2.   I learned from the materials that Cassandra has better write
performance than read. But why in my case is the read performance better?

3.   I also use OpsCenter to monitor the real-time performance of my
cluster. But when I get the average QPS above, the operations/s provided by
OpsCenter is around 1+ for the write peak and 5000+ for the read peak.  Why is
my result inconsistent with that from OpsCenter?

4.   Are there any unreasonable things in my test method, such as the test
data and QPS calculation?

 

Thank you very much,

Joy



Re: Cassandra stress test and max vs. average read/write latency.

2011-12-23 Thread Peter Fales
Peter,

Thanks for your response. I'm looking into some of the ideas in your
other recent mail, but I had another followup question on this one...

Is there any way to control the CPU load when using the stress benchmark?
I have some control over that with our home-grown benchmark, but I
thought it made sense to use the official benchmark tool, as people might
more readily believe those results and/or be able to reproduce them.  But
offhand, I don't see any way to throttle back the load created by the
stress test.

On Mon, Dec 19, 2011 at 09:47:32PM -0800, Peter Schuller wrote:
  I'm trying to understand if this is expected or not, and if there is
 
 Without careful tuning, outliers around a couple of hundred ms are
 definitely expected in general (not *necessarily*, depending on
 workload) as a result of garbage collection pauses. The impact will be
 worsened a bit if you are running under high CPU load (or even maxing
 it out with stress) because post-pause, if you are close to max CPU
 usage you will take considerably longer to catch up.
 
 Personally, I would just log each response time and feed it to gnuplot
 or something. It should be pretty obvious whether or not the latencies
 are due to periodic pauses.
 
 If you are concerned with eliminating or reducing outliers, I would:
 
 (1) Make sure that when you're benchmarking, that you're putting
 Cassandra under a reasonable amount of load. Latency benchmarks are
 usually useless if you're benchmarking against a saturated system. At
 least, start by achieving your latency goals at 25% or less CPU usage,
 and then go from there if you want to up it.
 
 (2) One can affect GC pauses, but it's non-trivial to eliminate the
 problem completely. For example, the length of frequent young-gen
 pauses can typically be decreased by decreasing the size of the young
 generation, leading to more frequent shorter GC pauses. But that
 instead causes more promotion into the old generation, which will
 result in more frequent very long pauses (relative to normal; they
 would still be infrequent relative to young gen pauses) - IF your
 workload is such that you are suffering from fragmentation and
 eventually seeing Cassandra fall back to full compacting GC:s
 (stop-the-world) for the old generation.
 
 I would start by adjusting young gen so that your frequent pauses are
 at an acceptable level, and then see whether or not you can sustain
 that in terms of old-gen.
 
 Start with this in any case: Run Cassandra with -XX:+PrintGC
 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
 
 -- 
 / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: peter.fa...@alcatel-lucent.com
Phone: 630 979 8031


Re: Cassandra stress test and max vs. average read/write latency.

2011-12-22 Thread Peter Fales
Peter,

Thanks for your input.  Can you tell me more about what we should be
looking for in the gc log?   We've already got gc logging turned
on, and we've already done the plotting to show that in most
cases the outliers are happening periodically (with a period of
tens of seconds to a few minutes, depending on load and tuning).

I've tried to correlate the times of the outliers with messages either
in the system log or the gc log.   There seems to be some (but not
complete) correlation between the outliers and system log messages about
memtable flushing.   I cannot find anything in the gc log that
seems to be an obvious problem, or that matches up with the
times of the outliers.


On Mon, Dec 19, 2011 at 09:47:32PM -0800, Peter Schuller wrote:
  I'm trying to understand if this is expected or not, and if there is
 
 Without careful tuning, outliers around a couple of hundred ms are
 definitely expected in general (not *necessarily*, depending on
 workload) as a result of garbage collection pauses. The impact will be
 worsened a bit if you are running under high CPU load (or even maxing
 it out with stress) because post-pause, if you are close to max CPU
 usage you will take considerably longer to catch up.
 
 Personally, I would just log each response time and feed it to gnuplot
 or something. It should be pretty obvious whether or not the latencies
 are due to periodic pauses.
 
 If you are concerned with eliminating or reducing outliers, I would:
 
 (1) Make sure that when you're benchmarking, that you're putting
 Cassandra under a reasonable amount of load. Latency benchmarks are
 usually useless if you're benchmarking against a saturated system. At
 least, start by achieving your latency goals at 25% or less CPU usage,
 and then go from there if you want to up it.
 
 (2) One can affect GC pauses, but it's non-trivial to eliminate the
 problem completely. For example, the length of frequent young-gen
 pauses can typically be decreased by decreasing the size of the young
 generation, leading to more frequent shorter GC pauses. But that
 instead causes more promotion into the old generation, which will
 result in more frequent very long pauses (relative to normal; they
 would still be infrequent relative to young gen pauses) - IF your
 workload is such that you are suffering from fragmentation and
 eventually seeing Cassandra fall back to full compacting GC:s
 (stop-the-world) for the old generation.
 
 I would start by adjusting young gen so that your frequent pauses are
 at an acceptable level, and then see whether or not you can sustain
 that in terms of old-gen.
 
 Start with this in any case: Run Cassandra with -XX:+PrintGC
 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
 
 -- 
 / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: peter.fa...@alcatel-lucent.com
Phone: 630 979 8031


Re: Cassandra stress test and max vs. average read/write latency.

2011-12-22 Thread Peter Schuller
 Thanks for your input.  Can you tell me more about what we should be
 looking for in the gc log?   We've already got the gc logging turned
 on and, and we've already done the plotting to show that in most
 cases the outliers are happening periodically (with a period of
 10s of seconds to a few minutes, depnding on load and tuning)

Are you measuring writes or reads? If writes,
https://issues.apache.org/jira/browse/CASSANDRA-1991 is still relevant
I think (sorry no progress from my end on that one). Also, I/O
scheduling issues can easily cause problems with the commit log
latency (on fsync()). Try switching to periodic commit log mode and
see if it helps, just to eliminate that (if you're not already in
periodic; if so, try upping the interval).

For reads, I am generally unaware of much aside from GC and legitimate
jitter (scheduling/disk I/O etc) that would generate outliers. At
least that I can think of off hand...

And w.r.t. the GC log - yeah, correlating in time is one thing.
Another thing is to confirm what kind of GC pauses you're seeing.
Generally you want to be seeing lots of ParNew:s of shorter duration,
and those are tweakable by changing the young generation size. The
other thing is to make sure CMS is not failing (promotion
failure/concurrent mode failure) and falling back to a stop-the-world
serial compacting GC of the entire heap.

You might also use -XX:+PrintApplicationPauseTime (I think; I am
probably not spelling it entirely correctly) to get a more obvious and
greppable report for each pause, regardless of type/cause.

 I've tried to correlate the times of the outliers with messages either
 in the system log or the gc log.   There seemms to be some (but not
 complete) correlation between the outliers and system log messages about
 memtable flushing.   I can not find anything in the gc log that
 seems to be an obvious problem, or that matches up with the time
 times of the outliers.

And these are still the very extreme (500+ ms and such) outliers that
you're seeing w/o GC correlation? Off the top of my head, that seems
very unexpected (assuming a non-saturated system) and would definitely
invite investigation IMO.

If you're willing to start iterating with the source code, I'd start
bisecting down the call stack and see where it's happening.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: Cassandra stress test and max vs. average read/write latency.

2011-12-19 Thread Peter Schuller
 I'm trying to understand if this is expected or not, and if there is

Without careful tuning, outliers around a couple of hundred ms are
definitely expected in general (not *necessarily*, depending on
workload) as a result of garbage collection pauses. The impact will be
worsened a bit if you are running under high CPU load (or even maxing
it out with stress) because post-pause, if you are close to max CPU
usage you will take considerably longer to catch up.

Personally, I would just log each response time and feed it to gnuplot
or something. It should be pretty obvious whether or not the latencies
are due to periodic pauses.

If you are concerned with eliminating or reducing outliers, I would:

(1) Make sure that when you're benchmarking, that you're putting
Cassandra under a reasonable amount of load. Latency benchmarks are
usually useless if you're benchmarking against a saturated system. At
least, start by achieving your latency goals at 25% or less CPU usage,
and then go from there if you want to up it.

(2) One can affect GC pauses, but it's non-trivial to eliminate the
problem completely. For example, the length of frequent young-gen
pauses can typically be decreased by decreasing the size of the young
generation, leading to more frequent shorter GC pauses. But that
instead causes more promotion into the old generation, which will
result in more frequent very long pauses (relative to normal; they
would still be infrequent relative to young gen pauses) - IF your
workload is such that you are suffering from fragmentation and
eventually seeing Cassandra fall back to full compacting GC:s
(stop-the-world) for the old generation.

I would start by adjusting young gen so that your frequent pauses are
at an acceptable level, and then see whether or not you can sustain
that in terms of old-gen.

Start with this in any case: Run Cassandra with -XX:+PrintGC
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Different Load values after stress test runs....

2011-08-23 Thread Chris Marino
Hi, we're running some performance tests against some clusters and I'm
curious about some of the numbers I see.

I'm running the stress test against two identically configured clusters, but
after I run a stress test, I get different Load values across the
clusters?

The difference between the two clusters is that one uses standard EC2
interfaces, but the other runs on a virtual network. Are these differences
indicating something that I should be aware of??

Here is a sample of the kinds of results I'm seeing.

Address         DC   Rack  Status  State   Load       Owns     Token
                                                               12760588759xxx
10.0.0.17       DC1  RAC1  Up      Normal  94 MB      25.00%   0
10.0.0.18       DC1  RAC1  Up      Normal  104.52 MB  25.00%   42535295865xxx
10.0.0.19       DC1  RAC1  Up      Normal  78.58 MB   25.00%   85070591730xxx
10.0.0.20       DC1  RAC1  Up      Normal  78.58 MB   25.00%   12760588759xxx

Address         DC   Rack  Status  State   Load       Owns     Token
                                                               12760588759xxx
10.120.35.52    DC1  RAC1  Up      Normal  103.74 MB  25.00%   0
10.120.6.124    DC1  RAC1  Up      Normal  118.99 MB  25.00%   42535295865xxx
10.127.90.142   DC1  RAC1  Up      Normal  104.26 MB  25.00%   85070591730xxx
10.94.69.237    DC1  RAC1  Up      Normal  75.74 MB   25.00%   12760588759xxx

The first cluster, with the vNet (10.0.0.0/28 addresses), consistently shows
smaller Load values: a total Load of 355 MB vs. 402 MB with the native EC2
interfaces?? Is a total Load value even meaningful?? The stress test is the
very first thing that's run against the clusters.

[I'm also a little puzzled that these numbers are not uniform within the
clusters, but I suspect that's because the stress test is using a key
distribution that is Gaussian.  I'm not 100% sure of this either since I've
seen conflicting documentation. Haven't tried 'random' keys, but I presume
that would change them to be uniform]
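
One way to sanity-check that hunch: with a Gaussian key distribution many inserts hit the same keys and overwrite each other, so the cluster ends up storing fewer distinct rows than with uniformly random keys. A small simulation (illustrative only, not the stress tool's actual key generator):

```python
import random

random.seed(1)
N = 100_000  # number of insert operations

# Gaussian keys cluster near the mean, so repeats (overwrites) are common
gaussian_keys = {min(max(int(random.gauss(N / 2, N / 10)), 0), N - 1)
                 for _ in range(N)}
# Uniform keys spread repeats evenly and leave more distinct rows behind
uniform_keys = {random.randrange(N) for _ in range(N)}

# Fewer distinct rows means less data on disk, i.e. a smaller Load
print(len(gaussian_keys) < len(uniform_keys))  # → True
```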

Except for these curious Load numbers, things seem to be running just fine.
Getting good fast results. Over 10 iterations I'm getting more than 10-12K
inserts per sec. (default values for the stress test).

Should I expect the Load to be the same across different clusters?? What
might explain the differences I'm seeing???

Thanks in advance.
CM
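
One low-tech way to compare total Load between clusters is to sum the Load column from saved `nodetool ring` output. A sketch, assuming the output is saved to a file (the ring.txt name and the MB-only units are assumptions; real output may mix KB/MB/GB):

```shell
# Save a nodetool ring listing (sample data shown), then sum the "N MB" loads.
cat > ring.txt <<'EOF'
10.0.0.17   DC1 RAC1 Up Normal 94 MB     25.00% 0
10.0.0.18   DC1 RAC1 Up Normal 104.52 MB 25.00% 42535295865
10.0.0.19   DC1 RAC1 Up Normal 78.58 MB  25.00% 85070591730
10.0.0.20   DC1 RAC1 Up Normal 78.58 MB  25.00% 127605887595
EOF
# For each field followed by the literal "MB", add it to the running total.
awk '{for (i = 1; i < NF; i++) if ($(i+1) == "MB") sum += $i}
     END {printf "%.2f MB total\n", sum}' ring.txt
# → 355.68 MB total
```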


Re: Different Load values after stress test runs....

2011-08-23 Thread Philippe
Have you run repair on the nodes ? Maybe some data was lost and not repaired
yet ?

Philippe

2011/8/23 Chris Marino ch...@vcider.com




Re: Stress test using Java-based stress utility

2011-07-26 Thread Nilabja Banerjee
Thank you every one it is working fine.

I was watching jconsole behavior... can you tell me where exactly I can
find RecentHitRates?

Tuning for optimal caching: they give one example of that here:
http://www.datastax.com/docs/0.8/operations/cache_tuning#configuring-key-and-row-caches
but within the MBeans in my jconsole I am unable to find RecentHitRates.
Also, what are the values long[36] and long[90]? From the jconsole
attributes, how can I find the performance of Cassandra while stress
testing?
Thank You

On 26 July 2011 14:33, aaron morton aa...@thelastpickle.com wrote:

 It's in the source distribution under tools/stress see the instructions in
 the README file and then look at the command line help (bin/stress --help).

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 26 Jul 2011, at 19:40, CASSANDRA learner wrote:

 Hi,
 I too want to know what this stress tool does. What is the usage of this
 tool? Please explain.






Re: Stress test using Java-based stress utility

2011-07-26 Thread Jonathan Ellis
cassandra.db.Caches

On Tue, Jul 26, 2011 at 2:11 AM, Nilabja Banerjee
nilabja.baner...@gmail.com wrote:

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Stress test using Java-based stress utility

2011-07-26 Thread Nilabja Banerjee
Thank you Jonathan.. :)




On 26 July 2011 20:08, Jonathan Ellis jbel...@gmail.com wrote:

 cassandra.db.Caches




Re: Stress test using Java-based stress utility

2011-07-22 Thread Kirk True

  
  
Have you checked the logs on the nodes to see if there are any
errors?

On 7/21/11 10:43 PM, Nilabja Banerjee wrote:
 Hi All,

 I am following this link "http://www.datastax.com/docs/0.7/utilities/stress_java"
 for a stress test. I am getting this notification after
 running this command:

 xxx.xxx.xxx.xx = my ip
 contrib/stress/bin/stress -d xxx.xxx.xxx.xx

 Created keyspaces. Sleeping 1s for propagation.
 total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
 Operation [44] retried 10 times - error inserting key 044 ((UnavailableException))
 Operation [49] retried 10 times - error inserting key 049 ((UnavailableException))
 Operation [7] retried 10 times - error inserting key 007 ((UnavailableException))
 Operation [6] retried 10 times - error inserting key 006 ((UnavailableException))

 Any idea why I am getting these things?

 Thank You

-- 
Kirk True
Founder, Principal Engineer

Expert Engineering Firepower


Re: Stress test using Java-based stress utility

2011-07-22 Thread aaron morton
UnavailableException is raised server side when fewer than CL nodes are UP 
when the request starts. 

It seems odd to get it in this case because the default replication factor used 
by stress test is 1. How many nodes do you have and have you made any changes 
to the RF ?

Also check the server side logs as Kirk says. 
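
The gist of that availability check, as a sketch (simplified; the real coordinator logic also handles hinted handoff and datacenter-aware consistency levels):

```python
def can_serve(live_replicas: int, rf: int, cl: str) -> bool:
    """Sketch of the coordinator's availability check: an
    UnavailableException-style failure means live_replicas < required."""
    required = {"ONE": 1, "TWO": 2, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]
    return live_replicas >= required

# With the stress tool's default RF=1, a single node being seen as down
# is enough to fail every request, even at CL.ONE:
print(can_serve(0, 1, "ONE"))     # → False
print(can_serve(2, 3, "QUORUM"))  # → True
```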

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 18:37, Kirk True wrote:

 Have you checked the logs on the nodes to see if there are any errors?
 
 On 7/21/11 10:43 PM, Nilabja Banerjee wrote:
 
 Hi All,
 
 I am following this following link  
 http://www.datastax.com/docs/0.7/utilities/stress_java  for a stress test. 
 I am getting this notification after running this command 
 
 xxx.xxx.xxx.xx= my ip
 contrib/stress/bin/stress -d xxx.xxx.xxx.xx
 
 Created keyspaces. Sleeping 1s for propagation.
 total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
 Operation [44] retried 10 times - error inserting key 044 
 ((UnavailableException))
 
 Operation [49] retried 10 times - error inserting key 049 
 ((UnavailableException))
 
 Operation [7] retried 10 times - error inserting key 007 
 ((UnavailableException))
 
 Operation [6] retried 10 times - error inserting key 006 
 ((UnavailableException))
 
 
 Any idea why I am getting these things?
 
 
 Thank You
 
 
 
 
 
 -- 
 Kirk True 
 Founder, Principal Engineer 
 
 
 Expert Engineering Firepower 
 



Re: Stress test using Java-based stress utility

2011-07-22 Thread Nilabja Banerjee
Running only one node. I don't think it is coming from the replication
factor... I will try to sort this out. Any other suggestions from your
side are always helpful.

:) Thank you



On 22 July 2011 14:36, aaron morton aa...@thelastpickle.com wrote:

 UnavailableException is raised server side when there is less than CL nodes
 UP when the request starts.

 It seems odd to get it in this case because the default replication factor
 used by stress test is 1. How many nodes do you have and have you made any
 changes to the RF ?

 Also check the server side logs as Kirk says.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 22 Jul 2011, at 18:37, Kirk True wrote:

  Have you checked the logs on the nodes to see if there are any errors?

 On 7/21/11 10:43 PM, Nilabja Banerjee wrote:

 Hi All,

 I am following this link "http://www.datastax.com/docs/0.7/utilities/stress_java"
 for a stress test. I am getting this notification after running this command:

 xxx.xxx.xxx.xx = my ip

 contrib/stress/bin/stress -d xxx.xxx.xxx.xx

 Created keyspaces. Sleeping 1s for propagation.
 total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
 Operation [44] retried 10 times - error inserting key 044
 ((UnavailableException))

 Operation [49] retried 10 times - error inserting key 049
 ((UnavailableException))

 Operation [7] retried 10 times - error inserting key 007
 ((UnavailableException))

 Operation [6] retried 10 times - error inserting key 006
 ((UnavailableException))

 Any idea why I am getting these things?

 Thank You

 --
 Kirk True
 Founder, Principal Engineer

 Expert Engineering Firepower





Re: Stress test using Java-based stress utility

2011-07-22 Thread Jonathan Ellis
What does nodetool ring say?

On Fri, Jul 22, 2011 at 12:43 AM, Nilabja Banerjee
nilabja.baner...@gmail.com wrote:
 Hi All,

 I am following this following link 
 http://www.datastax.com/docs/0.7/utilities/stress_java  for a stress test.
 I am getting this notification after running this command

 xxx.xxx.xxx.xx= my ip

 contrib/stress/bin/stress -d xxx.xxx.xxx.xx

 Created keyspaces. Sleeping 1s for propagation.
 total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
 Operation [44] retried 10 times - error inserting key 044
 ((UnavailableException))

 Operation [49] retried 10 times - error inserting key 049
 ((UnavailableException))

 Operation [7] retried 10 times - error inserting key 007
 ((UnavailableException))

 Operation [6] retried 10 times - error inserting key 006
 ((UnavailableException))

 Any idea why I am getting these things?

 Thank You






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Stress test using Java-based stress utility

2011-07-21 Thread Nilabja Banerjee
Hi All,

I am following this link "http://www.datastax.com/docs/0.7/utilities/stress_java"
for a stress test. I am getting this notification after running this command:

xxx.xxx.xxx.xx = my ip

contrib/stress/bin/stress -d xxx.xxx.xxx.xx

Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
Operation [44] retried 10 times - error inserting key 044
((UnavailableException))

Operation [49] retried 10 times - error inserting key 049
((UnavailableException))

Operation [7] retried 10 times - error inserting key 007
((UnavailableException))

Operation [6] retried 10 times - error inserting key 006
((UnavailableException))

Any idea why I am getting these things?

Thank You


Re: Timeout during stress test

2011-04-12 Thread aaron morton
Couple of hits here, one from jonathan and some previous discussions on the 
user list http://www.google.co.nz/search?q=cassandra+iostat

Same here for cfhistograms 
http://www.google.co.nz/search?q=cassandra+cfhistograms 
cfhistograms includes information on the number of sstables read during recent 
requests. As your initial cfstats showed 286 sstables I thought it may be 
useful to see if there is a high number of sstables being accessed per read. 

70 requests per second is slow against a 6 node cluster where each node has 12 
cores and 96GB of ram. Something is not right.
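
Back-of-envelope, 70 reads/sec of ~200 KB rows is only about 14 MB/s across the whole cluster, a tiny fraction of what six such machines should sustain:

```python
rps = 70                 # observed requests per second
row_bytes = 200 * 1024   # ~200 KB average column size (from the test)
nodes = 6

cluster_mb_s = rps * row_bytes / 1e6
# Whole-cluster and per-node read throughput, in MB/s
print(round(cluster_mb_s, 1), round(cluster_mb_s / nodes, 1))  # → 14.3 2.4
```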

Aaron 
On 12 Apr 2011, at 17:11, mcasandra wrote:

 
 aaron morton wrote:
 
 You'll need to provide more information, from the TP stats the read stage
 could not keep up. If the node is not CPU bound then it is probably IO
 bound. 
 
 
 What sort of read?
 How many columns was it asking for ? 
 How many columns do the rows have ?
 Was the test asking for different rows ?
 How many ops requests per second did it get up to?
 What do the io stats look like ? 
 What does nodetool cfhistograms say ?
 
 It's a simple read of 1M rows with one column of avg size 200K. Got around
 70 req per sec.
 
 Not sure how to interpret the iostat output with things happening async in
 cassandra. Can you give a little description of how to interpret it?
 
 I have posted the output of cfstats. Does cfhistograms provide better info?
 
 
 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263859.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.



Re: Timeout during stress test

2011-04-12 Thread mcasandra
17084        690   0        0   0
20501        960   0        0   0
24601       1272   0        0   0
29521       1734   0        0   0
35425       2262   0        0   0
42510       2734   0        0   0
51012       3098   0        0   0
61214       3426   0        0   0
73457       3879   0        0   0
88148       4157   0        0   0
105778      4065   0        0   0
126934      3804   0        0   0
152321      2828   0        0   0
182785      1699   0        0   0
219342       821   0        0   0
263210       300   0   249214   0
315852        88   0   149731   0
379022        12   0        0   0
454826         3   0        0   0
545791         0   0        0   0
654949         0   0        0   0
785939         0   0        0   0
943127         0   0    74915   0
1131752        0   0        0   0
1358102        0   0        0   0
1629722        0   0        0   0
1955666        0   0        0   0
2346799        0   0        0   0
2816159        0   0        0   0
3379391        0   0    22438   0
4055269        0   0        0   0
4866323        0   0        0   0
5839588        0   0     2559   0
7007506        0   0        0   0
8409007        0   0        0   0
10090808       0   0        0   0
12108970       0   0        0   0
14530764       0   0        0   0
17436917       0   0        0   0
20924300       0   0        0   0
25109160       0   0        0   0


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6265925.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.
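
Raw cfhistograms buckets like the ones above are easier to read as percentiles. A small helper, as a sketch (assumes (offset, count) pairs transcribed from the tool's output, with offsets ascending):

```python
def histogram_percentile(buckets, p):
    """Return the bucket offset at or below which fraction p of samples fall.

    buckets: list of (offset, count) pairs as printed by nodetool
    cfhistograms, with offsets in ascending order."""
    total = sum(count for _, count in buckets)
    running = 0
    for offset, count in buckets:
        running += count
        if running >= p * total:
            return offset
    return buckets[-1][0]

# Hypothetical example: the non-zero fourth-column buckets from the output
# above, read as a row-size distribution (in bytes):
rows = [(263210, 249214), (315852, 149731), (943127, 74915),
        (3379391, 22438), (5839588, 2559)]
print(histogram_percentile(rows, 0.5))   # → 315852
print(histogram_percentile(rows, 0.99))  # → 3379391
```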


Timeout during stress test

2011-04-11 Thread mcasandra
I am running stress test using hector. In the client logs I see:

me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
at
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:32)
at
me.prettyprint.cassandra.service.HColumnFamilyImpl$1.execute(HColumnFamilyImpl.java:256)
at
me.prettyprint.cassandra.service.HColumnFamilyImpl$1.execute(HColumnFamilyImpl.java:227)
at
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
at
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:221)
at
me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
at
me.prettyprint.cassandra.service.HColumnFamilyImpl.doExecuteSlice(HColumnFamilyImpl.java:227)
at
me.prettyprint.cassandra.service.HColumnFamilyImpl.getColumns(HColumnFamilyImpl.java:139)
at
com.riptano.cassandra.stress.SliceCommand.call(SliceCommand.java:48)
at
com.riptano.cassandra.stress.SliceCommand.call(SliceCommand.java:20)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: TimedOutException()
at
org.apache.cassandra.thrift.Cassandra$get_slice_result.read(Cassandra.java:7174)
at
org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:540)
at
org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:512)
at
me.prettyprint.cassandra.service.HColumnFamilyImpl$1.execute(HColumnFamilyImpl.java:236)


But I don't see anything in cassandra logs.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6262430.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Timeout during stress test

2011-04-11 Thread mcasandra
I see this occurring often when all cassandra nodes all of a sudden show a CPU
spike. All reads fail for about 2 minutes. GC.log and system.log don't reveal
much.

The only thing I notice is that when I restart nodes there are tons of files
that get deleted. cfstats from one of the nodes looks like this:

nodetool -h `hostname` tpstats
Pool Name               Active   Pending   Completed
ReadStage                   27        27       21491
RequestResponseStage         0         0      201641
MutationStage                0         0      236513
ReadRepairStage              0         0        7222
GossipStage                  0         0       31498
AntiEntropyStage             0         0           0
MigrationStage               0         0           0
MemtablePostFlusher          0         0         324
StreamStage                  0         0           0
FlushWriter                  0         0         324
FILEUTILS-DELETE-POOL        0         0        1220
MiscStage                    0         0           0
FlushSorter                  0         0           0
InternalResponseStage        0         0           0
HintedHandoff                1         3           9

--


Keyspace: StressKeyspace
Read Count: 21957
Read Latency: 46.91765058978913 ms.
Write Count: 222104
Write Latency: 0.008302124230090408 ms.
Pending Tasks: 0
Column Family: StressStandard
SSTable count: 286
Space used (live): 377916657941
Space used (total): 377916657941
Memtable Columns Count: 362
Memtable Data Size: 164403613
Memtable Switch Count: 326
Read Count: 21958
Read Latency: 631.464 ms.
Write Count: 222104
Write Latency: 0.007 ms.
Pending Tasks: 0
Key cache capacity: 100
Key cache size: 22007
Key cache hit rate: 0.002453626459907744
Row cache: disabled
Compacted row minimum size: 87
Compacted row maximum size: 5839588
Compacted row mean size: 552698




--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263087.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Timeout during stress test

2011-04-11 Thread aaron morton
TimedOutException means the cluster could not perform the request in 
rpc_timeout time. The client should retry as the problem may be transitory. 
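
Client-side, that retry can be as simple as a bounded loop with exponential backoff. A generic sketch (Hector ships its own failover and retry policy, so treat this as illustration only; TimeoutError here stands in for HTimedOutException):

```python
import time

def with_retries(op, attempts=3, base_delay=0.1):
    """Retry an operation that may fail transiently, backing off between tries."""
    for attempt in range(attempts):
        try:
            return op()
        except TimeoutError:            # stand-in for a client timeout exception
            if attempt == attempts - 1:
                raise                   # retries exhausted: surface the timeout
            time.sleep(base_delay * 2 ** attempt)

# Toy operation that times out twice, then succeeds:
calls = {"n": 0}
def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError
    return "row-data"

print(with_retries(flaky_read))  # → row-data
```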

In this case read performance may have slowed down due to the number of 
sstables (286). It's hard to tell without knowing what the workload is.

Aaron

On 12 Apr 2011, at 09:56, mcasandra wrote:

 I see this occurring often when all cassandra nodes all of a sudden show CPU
 spike. All reads fail for about 2 mts. GC.log and system.log doesn't reveal
 much.
 
 Only think I notice is that when I restart nodes there are tons of files
 that gets deleted. cfstats from one of the nodes looks like this:
 
 nodetool -h `hostname` tpstats
 Pool NameActive   Pending  Completed
 ReadStage2727  21491
 RequestResponseStage  0 0 201641
 MutationStage 0 0 236513
 ReadRepairStage   0 0   7222
 GossipStage   0 0  31498
 AntiEntropyStage  0 0  0
 MigrationStage0 0  0
 MemtablePostFlusher   0 0324
 StreamStage   0 0  0
 FlushWriter   0 0324
 FILEUTILS-DELETE-POOL 0 0   1220
 MiscStage 0 0  0
 FlushSorter   0 0  0
 InternalResponseStage 0 0  0
 HintedHandoff 1 3  9
 
 --
 
 
 Keyspace: StressKeyspace
Read Count: 21957
Read Latency: 46.91765058978913 ms.
Write Count: 222104
Write Latency: 0.008302124230090408 ms.
Pending Tasks: 0
Column Family: StressStandard
SSTable count: 286
Space used (live): 377916657941
Space used (total): 377916657941
Memtable Columns Count: 362
Memtable Data Size: 164403613
Memtable Switch Count: 326
Read Count: 21958
Read Latency: 631.464 ms.
Write Count: 222104
Write Latency: 0.007 ms.
Pending Tasks: 0
Key cache capacity: 100
Key cache size: 22007
Key cache hit rate: 0.002453626459907744
Row cache: disabled
Compacted row minimum size: 87
Compacted row maximum size: 5839588
Compacted row mean size: 552698
 
 
 
 
 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263087.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.



Re: Timeout during stress test

2011-04-11 Thread mcasandra
It looks like hector did retry on all the nodes and failed. Does this then
mean cassandra is down for clients in this scenario? That would be bad.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263270.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Timeout during stress test

2011-04-11 Thread aaron morton
It means the cluster is currently overloaded and unable to complete requests in 
time at the CL specified. 

Aaron

On 12 Apr 2011, at 11:18, mcasandra wrote:

 It looks like hector did retry on all the nodes and failed. Does this then
 mean cassandra is down for clients in this scenario? That would be bad.
 
 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263270.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.



Re: Timeout during stress test

2011-04-11 Thread mcasandra
But I don't understand the reason for the overload. It was doing a simple read
with 12 threads reading 5 rows. Avg CPU only 20%, no GC issues that I see. I
would expect cassandra to be able to process more with 6 nodes, 12 cores, 96
GB RAM and a 4 GB heap.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263470.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Timeout during stress test

2011-04-11 Thread aaron morton
You'll need to provide more information, from the TP stats the read stage could 
not keep up. If the node is not CPU bound then it is probably IO bound. 


What sort of read?
How many columns was it asking for ? 
How many columns do the rows have ?
Was the test asking for different rows ?
How many ops requests per second did it get up to?
What do the io stats look like ? 
What does nodetool cfhistograms say ?

Aaron

On 12 Apr 2011, at 13:02, mcasandra wrote:

 But I don't understand the reason for oveload. It was doing simple read of 12
 threads and reasing 5 rows. Avg CPU only 20%, No GC issues that I see. I
 would expect cassandra to be able to process more with 6 nodes, 12 core, 96
 GB RAM and 4 GB heap.
 
 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263470.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.



Re: Timeout during stress test

2011-04-11 Thread Terje Marthinussen
I notice you have pending hinted handoffs?

Look for errors related to that. We have seen occasional corruptions in the
hinted handoff sstables,

If you are stressing the system to its limits, you may also consider playing
more with the number of read/write threads (concurrent_reads/writes)
as well as rate limiting the number of requests each node can get
(throttle limit).

We have seen similar issue when sending large number of requests to a
cluster (read/write threads running out, timeouts, nodes marked as down).

Terje


On Tue, Apr 12, 2011 at 9:56 AM, aaron morton aa...@thelastpickle.comwrote:

 It means the cluster is currently overloaded and unable to complete
 requests in time at the CL specified.

 Aaron

 On 12 Apr 2011, at 11:18, mcasandra wrote:

  It looks like hector did retry on all the nodes and failed. Does this
 then
  mean cassandra is down for clients in this scenario? That would be bad.
 
  --
  View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263270.html
  Sent from the cassandra-u...@incubator.apache.org mailing list archive
 at Nabble.com.




Re: Timeout during stress test

2011-04-11 Thread mcasandra

aaron morton wrote:
 
 You'll need to provide more information, from the TP stats the read stage
 could not keep up. If the node is not CPU bound then it is probably IO
 bound. 
 
 
 What sort of read?
 How many columns was it asking for ? 
 How many columns do the rows have ?
 Was the test asking for different rows ?
 How many ops requests per second did it get up to?
 What do the io stats look like ? 
 What does nodetool cfhistograms say ?
 
It's a simple read of 1M rows with one column of avg size 200K. Got around
70 req per sec.

Not sure how to interpret the iostat output with things happening async in
cassandra. Can you give a little description of how to interpret it?

I have posted the output of cfstats. Does cfhistograms provide better info?


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263859.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: What need to be monitored while running stress test

2011-04-09 Thread Stu Hood
The storage proxy latencies are the primary metric: in particular, the
latency histograms show the distribution of query times.


On Fri, Apr 8, 2011 at 5:27 PM, mcasandra mohitanch...@gmail.com wrote:

 What are the key things to monitor while running a stress test? There is
 tons
 of details in nodetoll tpstats/netstats/cfstats. What in particular should
 I
 be looking at?

 Also, I've been looking at iostat and await really goes high but cfstats
 shows low latency in microsecs. Is latency in cfstats calculated per
 operation?

 I am just trying to understand what I need to look just to make sure I
 don't
 overlook important points in process of evaluating cassandra.

 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-need-to-be-monitored-while-running-stress-test-tp6255765p6255765.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Re: What need to be monitored while running stress test

2011-04-09 Thread mcasandra
What is a storage proxy latency?

By query latency do you mean the one in cfstats and cfhistograms?



Re: CF config for Stress Test

2011-04-09 Thread aaron morton
If you just want to benchmark the cluster it won't matter too much, though I
would set keys_cached to 0 and increase memtable throughput to 64 or 128. If
you are testing to get a better idea for your app, then use settings similar
to your app. 

keys_cached is the number of keys

for concurrent_readers and concurrent_writers see the comments in 
conf/cassandra.yaml.

I could not find this KS definition in the Hector code base, so I'm not sure
why they chose those values. 

Aaron
 
On 9 Apr 2011, at 11:10, mcasandra wrote:

 I am starting a stress test using Hector on a 6-node cluster with a 4GB heap
 and 12 cores. In the Hector readme this is what I got by default:
 
 create keyspace StressKeyspace
with replication_factor = 3
and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
 
 use StressKeyspace;
 drop column family StressStandard;
 create column family StressStandard
with comparator = UTF8Type
and keys_cached = 1
and memtable_flush_after = 1440
and memtable_throughput = 32;
 
 Are these good values? I was thinking of higher keys_cached but not sure if
 it's in bytes or number of keys.
 
 Also not sure how to tune memtable values.
 
 I have set concurrent_readers to 32 and writers to 48.
 
 Can someone please help me with good values that I can start this test with?
 
 Also, any other suggested values that I need to change?
 
 Thanks
 
 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/CF-config-for-Stress-Test-tp6255608p6255608.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.
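Applying Aaron's suggested starting points to the Hector schema quoted above would look roughly like this (cassandra-cli syntax of that era; these are benchmarking starting values from his reply, not tuned settings):

```
create column family StressStandard
    with comparator = UTF8Type
    and keys_cached = 0
    and memtable_flush_after = 1440
    and memtable_throughput = 128;
```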



Re: What need to be monitored while running stress test

2011-04-09 Thread aaron morton
In jconsole, see the MBean org.apache.cassandra.db.StorageProxy.

It shows the latency for read and write operations overall, not just per CF. 

Aaron

On 10 Apr 2011, at 11:37, mcasandra wrote:

 What is a storage proxy latency?
 
 By query latency you mean the one in cfstats and cfhistorgrams?
 
 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-need-to-be-monitored-while-running-stress-test-tp6255765p6257932.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.



CF config for Stress Test

2011-04-08 Thread mcasandra
I am starting a stress test using Hector on a 6-node cluster with a 4GB heap
and 12 cores. In the Hector readme this is what I got by default:

create keyspace StressKeyspace
with replication_factor = 3
and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';

use StressKeyspace;
drop column family StressStandard;
create column family StressStandard
with comparator = UTF8Type
and keys_cached = 1
and memtable_flush_after = 1440
and memtable_throughput = 32;

Are these good values? I was thinking of higher keys_cached but not sure if
it's in bytes or number of keys.

Also not sure how to tune memtable values.

I have set concurrent_readers to 32 and writers to 48.

Can someone please help me with good values that I can start this test with?

Also, any other suggested values that I need to change?

Thanks



What need to be monitored while running stress test

2011-04-08 Thread mcasandra
What are the key things to monitor while running a stress test? There are tons
of details in nodetool tpstats/netstats/cfstats. What in particular should I
be looking at?

Also, I've been looking at iostat and await really goes high but cfstats
shows low latency in microsecs. Is latency in cfstats calculated per
operation?

I am just trying to understand what I need to look at, just to make sure I don't
overlook important points in the process of evaluating Cassandra.



Re: Problems with Python Stress Test

2011-02-04 Thread Sameer Farooqui
Brandon,

Thanks for the response. I have also noticed that stress.py's progress
interval gets thrown off in low memory situations.

What did you mean by "contrib/stress on 0.7 instead"? I don't see that dir
in the src version of 0.7.

- Sameer


On Thu, Feb 3, 2011 at 5:22 PM, Brandon Williams dri...@gmail.com wrote:

 On Thu, Feb 3, 2011 at 7:02 PM, Sameer Farooqui 
 cassandral...@gmail.comwrote:

 Hi guys,

 I was playing around with the stress.py test this week and noticed a few
 things.

 1) Progress-interval does not always work correctly. I set it to 5 in the
 example below, but am instead getting varying intervals:


 Generally indicates that the client machine is being overloaded in my
 experience.

 2) The key_rate and op_rate doesn't seem to be calculated correctly. Also,
 what is the difference between the interval_key_rate and the
 interval_op_rate? For example in the example above, the first row shows 6662
 keys inserted in 5 seconds and 6662 / 5 = 1332, which matches the
 interval_op_rate.


 There should be no difference unless you're doing range slices, but IPC
 timing makes them vary somewhat.

 3) If I write x KB to Cassandra with py_stress, the used disk space doesn't
 grow by x after the test. In the example below I tried to write 500,000 keys
 * 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I
 checked the amount of disk space used after the test it actually grew by
 2,684,920 - 2,515,864 = 169,056 kilobytes. Is this because perhaps the
 commit log got duplicate copies of the data as the SSTables?


 Commitlogs could be part of it, you're not factoring in the column names,
 and then there's index and bloom filter overhead.

 Use contrib/stress on 0.7 instead.

 -Brandon



Problems with Python Stress Test

2011-02-03 Thread Sameer Farooqui
Hi guys,

I was playing around with the stress.py test this week and noticed a few
things.

1) Progress-interval does not always work correctly. I set it to 5 in the
example below, but am instead getting varying intervals:

*techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ python
stress.py --num-keys=10 --columns=5 --column-size=32 --operation=insert
--progress-interval=5 --threads=4 --nodes=170.252.179.222
Keyspace already exists.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
6662,1332,1335,0.00307796342135,5
11607,989,988,0.00476862022199,12
20297,1738,1736,0.00273238550807,18
30631,2066,2068,0.00202261635614,24
37291,1332,1331,0.00325975901372,29
47514,2044,2044,0.00193106963725,35
56618,1820,1821,0.00276346638249,41
68652,2406,2406,0.00179436958884,47
77745,1818,1820,0.00220694060007,52
87351,1921,1918,0.00236015612201,58
97167,1963,1963,0.00230505042379,64
10,566,566,0.00223569174853,66*


2) The key_rate and op_rate don't seem to be calculated correctly. Also,
what is the difference between the interval_key_rate and the
interval_op_rate? For example in the example above, the first row shows 6662
keys inserted in 5 seconds and 6662 / 5 = 1332, which matches the
interval_op_rate.

The second row took 7 seconds to update instead of the requested 5. However,
the interval_op_rate and interval_key_rate are being calculated based on my
requested 5 seconds instead of the actual observed 7 seconds.

(11607-6662)/5=989
(11607-6662)/7 = 706

Shouldn't it be basing the calculations off the 7 seconds?
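The fix being suggested here, sketched in Python with the numbers from the output above:

```python
# Two consecutive stress.py rows as (total_keys, elapsed_seconds) pairs.
prev = (6662, 5)
cur = (11607, 12)

keys = cur[0] - prev[0]                   # 4945 keys inserted in the interval
observed_interval = cur[1] - prev[1]      # 7 s actually elapsed, not the requested 5

print(keys // 5)                  # 989 -- what stress.py reports (requested interval)
print(keys // observed_interval)  # 706 -- the rate over the observed interval
```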


3) If I write x KB to Cassandra with py_stress, the used disk space doesn't
grow by x after the test. In the example below I tried to write 500,000 keys
* 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I
checked the amount of disk space used after the test it actually grew by
2,684,920 - 2,515,864 = 169,056 kilobytes. Is this because perhaps the
commit log got duplicate copies of the data as the SSTables?

Also, notice how to progress interval got thrown off after 40 seconds.


techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ df
Filesystem                     1K-blocks     Used Available Use% Mounted on
/dev/mapper/cassandra7rc4-root   7583436  2515864   4682344  35% /
none                              633244      208    633036   1% /dev
none                              640368        0    640368   0% /dev/shm
none                              640368       56    640312   1% /var/run
none                              640368        0    640368   0% /var/lock
/dev/sda1                         233191    20601    200149  10% /boot

techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ python
stress.py --num-keys=50 --columns=5 --operation=insert
--progress-interval=5 --threads=1 --nodes=170.252.179.222
Keyspace already exists.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
15562,3112,3112,0.000300011955333,5
31643,3216,3216,0.000290757187504,10
42968,2265,2265,0.000423845265875,15
54071,2220,2220,0.000430288759747,20
66491,2484,2484,0.000382423304897,25
79891,2680,2680,0.000351728307667,30
91758,2373,2373,0.000402696775367,35
102179,2084,2084,0.000461982612291,40
114003,2364,2364,0.000403893998092,46
126509,2501,2501,0.000379724634489,51
138047,2307,2307,0.000414365229356,56
150261,2442,2442,0.000390332772296,61
164019,2751,2751,0.000343320345113,66
175390,2274,2274,0.000421584286756,71
186564,2234,2234,0.000429319251473,76
198292,2345,2345,0.00040838057315,81
210186,2378,2378,0.000400560030882,87
225144,2991,2991,0.000314564943345,92
236474,2266,2266,0.000422214746265,97
249940,2693,2693,0.000349487200297,102
264410,2894,2894,0.00030166366303,107
275429,2203,2203,0.000464002475276,112
286430,2200,2200,0.00043832517821,117
299217,2557,2557,0.000371891478764,122
313800,2916,2916,0.000322412596002,128
325252,2290,2290,0.000417413284343,133
336031,2155,2155,0.000445155976201,138
347257,2245,2245,0.000426658924816,143
357493,2047,2047,0.000472509730556,148
372151,2931,2931,0.000321278794594,153
384655,2500,2500,0.000381667455343,158
395604,2189,2189,0.000439286896144,163
409713,2821,2821,0.000334938358759,168
423162,2689,2689,0.000351835071877,174
434276,,,0.000432009316829,179
444809,2106,2106,0.00045844612893,184
458190,2676,2676,0.000353130326037,189
470852,2532,2532,0.000374360740552,194
481333,2096,2096,0.000462788910416,199
492458,2225,2225,0.000431290422932,204
50,1508,1508,0.000353647808408,207


techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ df
Filesystem                     1K-blocks     Used Available Use% Mounted on
/dev/mapper/cassandra7rc4-root   7583436  2684920   4513288  38% /
none                              633244      208    633036   1% /dev
none                              640368        0    640368   0% /dev/shm
none                              640368       56    640312   1% /var/run
none                              640368        0    640368   0% /var/lock
/dev/sda1                         233191  

Re: Problems with Python Stress Test

2011-02-03 Thread Brandon Williams
On Thu, Feb 3, 2011 at 7:02 PM, Sameer Farooqui cassandral...@gmail.comwrote:

 Hi guys,

 I was playing around with the stress.py test this week and noticed a few
 things.

 1) Progress-interval does not always work correctly. I set it to 5 in the
 example below, but am instead getting varying intervals:


Generally indicates that the client machine is being overloaded in my
experience.

2) The key_rate and op_rate doesn't seem to be calculated correctly. Also,
 what is the difference between the interval_key_rate and the
 interval_op_rate? For example in the example above, the first row shows 6662
 keys inserted in 5 seconds and 6662 / 5 = 1332, which matches the
 interval_op_rate.


There should be no difference unless you're doing range slices, but IPC
timing makes them vary somewhat.

3) If I write x KB to Cassandra with py_stress, the used disk space doesn't
 grow by x after the test. In the example below I tried to write 500,000 keys
 * 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I
 checked the amount of disk space used after the test it actually grew by
 2,684,920 - 2,515,864 = 169,056 kilobytes. Is this because perhaps the
 commit log got duplicate copies of the data as the SSTables?


Commitlogs could be part of it, you're not factoring in the column names,
and then there's index and bloom filter overhead.

Use contrib/stress on 0.7 instead.

-Brandon
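Putting rough numbers on that overhead (a sketch using the figures from Sameer's mail; the resulting factor is specific to this workload, not a general rule):

```python
# Figures from the mail: 500,000 keys x 5 columns x 32-byte values.
payload_kb = 500_000 * 5 * 32 / 1024
print(payload_kb)            # 78125.0 KB of raw column values

observed_growth_kb = 2_684_920 - 2_515_864
print(observed_growth_kb)    # 169056 KB of disk actually consumed

# Column names, per-column timestamps, SSTable index/bloom-filter structures,
# and commitlog copies account for the gap -- about 2.2x raw payload here.
print(round(observed_growth_kb / payload_kb, 1))
```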


Re: Stress test inconsistencies

2011-01-26 Thread Oleg Proudnikov
Hi All,

I was able to run contrib/stress at a very impressive throughput. Single
threaded client was able to pump 2,000 inserts per second with 0.4 ms latency.
Multithreaded client was able to pump 7,000 inserts per second with 7ms latency.

Thank you very much for your help!

Oleg




Re: Stress test inconsistencies

2011-01-26 Thread Jonathan Shook
Would you share with us the changes you made, or problems you found?

On Wed, Jan 26, 2011 at 10:41 AM, Oleg Proudnikov ol...@cloudorange.com wrote:
 Hi All,

 I was able to run contrib/stress at a very impressive throughput. Single
 threaded client was able to pump 2,000 inserts per second with 0.4 ms latency.
 Multithreaded client was able to pump 7,000 inserts per second with 7ms 
 latency.

 Thank you very much for your help!

 Oleg





Re: Stress test inconsistencies

2011-01-26 Thread Oleg Proudnikov
I returned to periodic commit log fsync.


Jonathan Shook jshook at gmail.com writes:

 
 Would you share with us the changes you made, or problems you found?
 




Re: Stress test inconsistencies

2011-01-25 Thread Tyler Hobbs
Try using something higher than -t 1, like -t 100.

- Tyler

On Mon, Jan 24, 2011 at 9:38 PM, Oleg Proudnikov ol...@cloudorange.comwrote:

 Hi All,

 I am struggling to make sense of a simple stress test I ran against the
 latest
 Cassandra 0.7. My server performs very poorly compared to a desktop and
 even a
 notebook.

 Here is the command I execute - a single-threaded insert that runs on the
 same host as Cassandra does (I am using the new contrib/stress, but the old
 py_stress produces similar results):

 ./stress -t 1 -o INSERT -c 30 -n 1 -i 1

 On a SUSE Linux server with a 4-core Intel Xeon I get a maximum of 30 inserts
 a second with 40ms latency. But on a Windows desktop I get an incredible
 200-260 inserts a second with 4ms latency!!! Even on the smallest MacBook Pro
 I get bursts of high throughput - 100+ inserts a second.

 Could you please help me figure out what is wrong with my server? I tried
 several servers actually with the same results. I would appreciate any help
 in
 tracing down the bottleneck. Configuration is the same in all tests with
 the
 server having the advantage of separate physical disks for commitlog and
 data.

 Could you also share with me what numbers you get or what is reasonable to
 expect from this test?

 Thank you very much,
 Oleg


 Here is the output for the Linux server, Windows desktop and MacBook Pro,
 one
 line per second:

 Linux server - Intel Xeon X3330 @ 2.66GHz, 4G RAM, 2G heap

 Created keyspaces. Sleeping 1s for propagation.
 total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
 19,19,19,0.05947368421052632,1
 46,27,27,0.04274074074074074,2
 70,24,24,0.04733,3
 95,25,25,0.04696,4
 119,24,24,0.048208333,5
 147,28,28,0.04189285714285714,7
 177,30,30,0.03904,8
 206,29,29,0.04006896551724138,9
 235,29,29,0.03903448275862069,10

 Windows desktop: Core2 Duo CPU E6550 @ 2.33GHz, 2G RAM, 1G heap

 Keyspace already exists.
 total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
 147,147,147,0.005292517006802721,1
 351,204,204,0.0042009803921568625,2
 527,176,176,0.006551136363636364,3
 718,191,191,0.005617801047120419,4
 980,262,262,0.00400763358778626,5
 1206,226,226,0.004150442477876107,6
 1416,210,210,0.005619047619047619,7
 1678,262,262,0.0040038167938931295,8

 MacBook Pro: Core2 Duo CPU @ 2.26GHz, 2G RAM, 1G heap

 Created keyspaces. Sleeping 1s for propagation.
 total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
 0,0,0,NaN,1
 7,7,7,0.21185714285714285,2
 47,40,40,0.026925,3
 171,124,124,0.007967741935483871,4
 258,87,87,0.01206896551724138,6
 294,36,36,0.022444,7
 303,9,9,0.14378,8
 307,4,4,0.2455,9
 313,6,6,0.128,10
 508,195,195,0.007938461538461538,11
 792,284,284,0.0035985915492957746,12
 882,90,90,0.01219,13






Re: Stress test inconsistencies

2011-01-25 Thread Oleg Proudnikov
Tyler Hobbs tyler at riptano.com writes:

 Try using something higher than -t 1, like -t 100.- Tyler



Thank you, Tyler!

When I run contrib/stress with a higher thread count, the server does scale to
200 inserts a second with a latency of 200ms. At the same time, the Windows
desktop scales to 900 inserts a second and a latency of 120ms. There is a huge
difference that I am trying to understand and eliminate.

In my real-life bulk load I have to stay with a single-threaded client for the
POC I am doing. The only option I have is to run several client processes... My
real-life load is heavier than what contrib/stress does. It takes several days
to bulk load 4 million batch mutations!!! It is really painful :-( Something is
just not right...

Oleg
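The several-client-processes workaround can be sketched with Python's multiprocessing. Here `do_inserts` is a hypothetical stand-in for the real single-threaded Thrift/Hector insert loop, shown only to illustrate partitioning the key range across processes:

```python
from multiprocessing import Pool

def do_inserts(args):
    """Hypothetical single-threaded worker: covers this worker's slice of keys."""
    worker_id, num_keys = args
    inserted = 0
    for key in range(worker_id * num_keys, (worker_id + 1) * num_keys):
        inserted += 1  # a real client would issue one batch_mutate per key here
    return inserted

if __name__ == "__main__":
    # Four single-threaded clients running as parallel processes,
    # each owning a disjoint key range.
    with Pool(4) as pool:
        counts = pool.map(do_inserts, [(i, 1000) for i in range(4)])
    print(sum(counts))  # 4000 keys covered with no overlap
```

Disjoint key ranges keep the workers from overwriting each other, which matters for a bulk load even though Cassandra would happily accept duplicate inserts.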






Re: Stress test inconsistencies

2011-01-25 Thread buddhasystem

Oleg,

I'm a novice at this, but for what it's worth I can't imagine you can have a
_sustained_ 1kHz insertion rate on a single machine which also does some
reads. If I'm wrong, I'll be glad to learn that I was. It just doesn't seem
to square with a typical seek time on a hard drive.

Maxim



Re: Stress test inconsistencies

2011-01-25 Thread Brandon Williams
On Tue, Jan 25, 2011 at 1:23 PM, Oleg Proudnikov ol...@cloudorange.comwrote:

 When I run contrib/stress with a higher thread count, the server does scale
 to
 200 inserts a second with latency of 200ms. At the same time Windows
 desktop
 scales to 900 inserts a second and latency of 120ms. There is a huge
 difference
 that I am trying to understand and eliminate.


Those are really low numbers, are you still testing with 10k rows?  That's
not enough, try 1M to give both JVMs enough time to warm up.

-Brandon


Re: Stress test inconsistencies

2011-01-25 Thread Oleg Proudnikov
Brandon Williams driftx at gmail.com writes:

 
 On Tue, Jan 25, 2011 at 1:23 PM, Oleg Proudnikov olegp at cloudorange.com
wrote:
 
 When I run contrib/stress with a higher thread count, the server does scale to
 200 inserts a second with latency of 200ms. At the same time Windows desktop
 scales to 900 inserts a second and latency of 120ms. There is a huge 
 difference
 that I am trying to understand and eliminate.
 
 
 Those are really low numbers, are you still testing with 10k rows?  That's not
enough, try 1M to give both JVMs enough time to warm up.
 
 
 -Brandon 
 

I agree, Brandon, the numbers are very low! The warm-up does not seem to make
any difference though... There is something that is holding the server back,
because CPU usage is very low. I am trying to understand where this bottleneck
is on the Linux server. I do not think it is Cassandra's config, as I use the
same config on Windows and get much higher numbers, as I described.

Oleg




Re: Stress test inconsistencies

2011-01-25 Thread Anthony John
Look at iostat -x 10 10 while the active part of your test is running. There
should be something called svc_t - that should be in the 10ms range, and
await should be low.

Will tell you if IO is slow, or if IO is not being issued.

Also, ensure that you ain't swapping with something like swapon -s
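A small helper along those lines: pull the await column out of `iostat -x` output so a slow device stands out. The sample line below is fabricated for illustration; in practice you would feed it real iostat output:

```python
# Fabricated `iostat -x` sample: a header row plus one device row.
sample = """Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 2.00 1.00 50.00 4.00 400.00 15.84 0.50 9.80 1.20 6.00"""

lines = sample.splitlines()
columns = lines[0].split()
await_idx = columns.index("await")   # locate the column by name, not position

for line in lines[1:]:
    fields = line.split()
    device, await_ms = fields[0], float(fields[await_idx])
    # Rule of thumb from the thread: await should stay low; sustained high
    # await with low CPU points at an IO bottleneck.
    print(device, "await(ms) =", await_ms)
```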

On Tue, Jan 25, 2011 at 3:04 PM, Oleg Proudnikov ol...@cloudorange.comwrote:

 buddhasystem potekhin at bnl.gov writes:

 
 
  Oleg,
 
  I'm a novice at this, but for what it's worth I can't imagine you can
 have a
  _sustained_ 1kHz insertion rate on a single machine which also does some
  reads. If I'm wrong, I'll be glad to learn that I was. It just doesn't
 seem
  to square with a typical seek time on a hard drive.
 
  Maxim
 

 Maxim,

 As I understand it, during inserts Cassandra should not be constrained by
 random seek time, as it uses sequential writes. I do get high numbers on
 Windows, but there is something that is holding back my Linux server. I am
 trying to understand what it is.

 Oleg






Cassandra 0.6.2 stress test failing due to setKeyspace issue

2010-07-01 Thread maneela a
Can someone direct me on how to resolve this issue in Cassandra 0.6.2?
./stress.py -o insert -n 1 -y regular -d 
ec2-174-129-65-118.compute-1.amazonaws.com --threads 5 --keep-going
Created keyspaces. Sleeping 1s for propagation.
Traceback (most recent call last):
  File "./stress.py", line 381, in <module>
    benchmark()
  File "./stress.py", line 363, in insert
    threads = self.create_threads('insert')
  File "./stress.py", line 325, in create_threads
    th = OperationFactory.create(type, i, self.opcounts, self.keycounts, self.latencies)
  File "./stress.py", line 310, in create
    return Inserter(i, opcounts, keycounts, latencies)
  File "./stress.py", line 178, in __init__
    self.cclient.set_keyspace('Keyspace1')
  File "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 333, in set_keyspace
    self.recv_set_keyspace()
  File "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 349, in recv_set_keyspace
    raise x
thrift.Thrift.TApplicationException: Invalid method name: 'set_keyspace'

Niru


  

Re: Cassandra 0.6.2 stress test failing due to setKeyspace issue

2010-07-01 Thread maneela a
Thanks Jonathan

--- On Thu, 7/1/10, Jonathan Ellis jbel...@gmail.com wrote:

From: Jonathan Ellis jbel...@gmail.com
Subject: Re: Cassandra 0.6.2 stress test failing due to setKeyspace issue
To: user@cassandra.apache.org
Date: Thursday, July 1, 2010, 3:32 PM

you're running a 0.7 stress.py against a 0.6 cassandra, that's not going to work

On Thu, Jul 1, 2010 at 12:16 PM, maneela a manee...@yahoo.com wrote:




Can someone direct me how to resolve this issue in cassandra 0.6.2 version?
./stress.py -o insert -n 1 -y regular -d 
ec2-174-129-65-118.compute-1.amazonaws.com --threads 5 --keep-going


Created keyspaces. Sleeping 1s for propagation.
Traceback (most recent call last):
  File "./stress.py", line 381, in <module>
    benchmark()
  File "./stress.py", line 363, in insert
    threads = self.create_threads('insert')
  File "./stress.py", line 325, in create_threads
    th = OperationFactory.create(type, i, self.opcounts, self.keycounts, self.latencies)
  File "./stress.py", line 310, in create
    return Inserter(i, opcounts, keycounts, latencies)
  File "./stress.py", line 178, in __init__
    self.cclient.set_keyspace('Keyspace1')
  File "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 333, in set_keyspace
    self.recv_set_keyspace()
  File "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 349, in recv_set_keyspace
    raise x
thrift.Thrift.TApplicationException: Invalid method name: 'set_keyspace'



Niru








  


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com