Could ring cache really improve performance in Cassandra?

2014-12-07 Thread kong
Hi, 

I'm running a stress test on Cassandra. I've learned that using a ring cache can
improve performance because client requests can go directly to the node that
owns the data, so the coordinator node is also the desired target node. That
way the coordinator does not have to route client requests to another node,
and perhaps we can get a linear performance increase.
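
To make the idea concrete, the lookup a ring cache does on the client is roughly the following (a minimal sketch under my own assumptions: a sorted token-to-host map and one replica per range; real Cassandra hashes the partition key with the Murmur3 partitioner, and std::hash is only a stand-in so the sketch is self-contained):

#include <cstdint>
#include <functional>
#include <iostream>
#include <map>
#include <string>

// Minimal client-side ring cache: each token maps to the host owning the
// range that ends at that token; a key is routed to the first host whose
// token is >= hash(key), wrapping around the ring.
class RingCache {
public:
    void add_host(int64_t token, const std::string& host) {
        ring_[token] = host;
    }

    std::string host_for_key(const std::string& partition_key) const {
        // NOTE: real Cassandra uses Murmur3; std::hash is a stand-in here.
        int64_t token = static_cast<int64_t>(std::hash<std::string>{}(partition_key));
        auto it = ring_.lower_bound(token);        // first token >= hash(key)
        if (it == ring_.end()) it = ring_.begin(); // wrap around the ring
        return it->second;
    }

private:
    std::map<int64_t, std::string> ring_;  // token -> owning host
};

int main() {
    RingCache cache;
    cache.add_host(-4611686018427387904LL, "10.0.0.1");  // hypothetical hosts
    cache.add_host(0LL, "10.0.0.2");
    cache.add_host(4611686018427387904LL, "10.0.0.3");
    std::cout << cache.host_for_key("user42") << "\n";   // node to contact
}

With RF=1 the node this returns already owns the row, so the extra coordinator hop disappears.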

 

However, in my stress test on an Amazon EC2 cluster the results are strange:
there seems to be no performance improvement after enabling the ring cache.
Could anyone help me explain these results? (I also find the results without
the ring cache strange, because QPS does not increase linearly when new nodes
are added; I'd appreciate help explaining that, too.) The results are as
follows:

 

INSERT(write):


Node count | Replication factor | QPS (no ring cache) | QPS (ring cache)
-----------+--------------------+---------------------+-----------------
     1     |          1         |        18687        |       20195
     2     |          1         |        20793        |       26403
     2     |          2         |        22498        |       21263
     4     |          1         |        28348        |       30010
     4     |          3         |        28631        |       24413

 

SELECT(read):


Node count | Replication factor | QPS (no ring cache) | QPS (ring cache)
-----------+--------------------+---------------------+-----------------
     1     |          1         |        24498        |       22802
     2     |          1         |        28219        |       27030
     2     |          2         |        35383        |       36674
     4     |          1         |        34648        |       28347
     4     |          3         |        52932        |       52590

 

 

Thank you very much,

Joy



Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-06 Thread kong
Hi,

I am running a stress test on DataStax Cassandra Community 2.1.2, not with the
provided stress-test tool but with my own C++ stress-test client code. My
Cassandra cluster is deployed on Amazon EC2 using the DataStax Community AMI
(HVM instances) from the DataStax documentation, with the default ephemeral
storage rather than EBS. The Cassandra servers are m3.xlarge instances, and
the stress-test client runs on a separate r3.8xlarge instance; both the
Cassandra server nodes and the client node are in us-east. I test clusters of
1, 2, and 4 nodes separately, running the INSERT test and the SELECT test
separately, but performance does not increase linearly when new nodes are
added, and I also get some strange results. My results are as follows. (Each
test performs 1 million operations, and I try to find the best QPS while
keeping the max latency under 200 ms; latencies are measured on the client
side, and QPS is calculated as total_operations / total_time.)
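
For reference, the timing and QPS calculation on the client side look roughly like this (a minimal sketch; do_one_operation is a placeholder for one thrift round trip):

#include <chrono>
#include <cstdio>

// Placeholder for a single INSERT or SELECT round trip (hypothetical).
void do_one_operation() { /* thrift call goes here */ }

int main() {
    const long total_operations = 1000000;  // 1 million operations per test
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < total_operations; ++i) {
        do_one_operation();  // per-call latency is also recorded here
    }
    auto end = std::chrono::steady_clock::now();
    double total_time_s = std::chrono::duration<double>(end - start).count();
    std::printf("QPS: %.0f\n", total_operations / total_time_s);  // ops/time
}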



INSERT(write):


Node count | Replication factor |  QPS  | Avg (ms) | Min (ms) | .95 (ms) | .99 (ms) | .999 (ms) | Max (ms)
-----------+--------------------+-------+----------+----------+----------+----------+-----------+---------
     1     |          1         | 18687 |   2.08   |   1.48   |   2.95   |   5.74   |    52.8   |  205.4
     2     |          1         | 20793 |   3.15   |   0.84   |   7.71   |  41.35   |    88.7   |  232.7
     2     |          2         | 22498 |   3.37   |   0.86   |   6.04   |  36.1    |   221.5   |  649.3
     4     |          1         | 28348 |   4.38   |   0.85   |   8.19   |  64.51   |   169.4   |  251.9
     4     |          3         | 28631 |   5.22   |   0.87   |  18.68   |  68.35   |   167.2   |  288

   

SELECT(read):


Node count | Replication factor |  QPS  | Avg (ms) | Min (ms) | .95 (ms) | .99 (ms) | .999 (ms) | Max (ms)
-----------+--------------------+-------+----------+----------+----------+----------+-----------+---------
     1     |          1         | 24498 |   4.01   |   1.51   |   7.6    |  12.51   |    31.5   |  129.6
     2     |          1         | 28219 |   3.38   |   0.85   |   9.5    |  17.71   |    39.2   |  152.2
     2     |          2         | 35383 |   4.06   |   0.87   |   9.71   |  21.25   |    70.3   |  215.9
     4     |          1         | 34648 |   2.78   |   0.86   |   6.07   |  14.94   |    30.8   |  134.6
     4     |          3         | 52932 |   3.45   |   0.86   |  10.81   |  21.05   |    37.4   |  189.1

 

The test data I use is generated randomly, and my schema is as follows (I use
cqlsh to create the column family/table):

CREATE TABLE table (
    id1  varchar,
    ts   varchar,
    id2  varchar,
    msg  varchar,
    PRIMARY KEY (id1, ts, id2)
);

So the fields are all strings, and I generate each character of each string
randomly using srand(time(0)) and rand() in C++, so I think my test data
should be distributed uniformly across the Cassandra cluster. In my client
stress-test code I use the thrift C++ interface, and the basic operations I
do look like:

thrift_client.execute_cql3_query("INSERT INTO table (id1, ts, id2, msg) VALUES ('xxx', 'xxx', 'xxx', 'xxx')");
thrift_client.execute_cql3_query("SELECT * FROM table WHERE id1='xxx'");
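
For context, each of my thrift clients is set up roughly like this (a sketch assuming the Thrift-generated C++ bindings from the Cassandra 2.x interface file, where execute_cql3_query takes the result by reference plus a compression and a consistency level; the host address and keyspace name are hypothetical):

#include <boost/shared_ptr.hpp>
#include <thrift/protocol/TBinaryProtocol.h>
#include <thrift/transport/TBufferTransports.h>
#include <thrift/transport/TSocket.h>
#include "Cassandra.h"  // generated from cassandra.thrift

using namespace apache::thrift::protocol;
using namespace apache::thrift::transport;
using namespace org::apache::cassandra;

int main() {
    // Cassandra's thrift interface expects a framed transport on port 9160.
    boost::shared_ptr<TSocket> socket(new TSocket("10.0.0.1", 9160));
    boost::shared_ptr<TTransport> transport(new TFramedTransport(socket));
    boost::shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport));
    CassandraClient client(protocol);

    transport->open();
    client.set_keyspace("mykeyspace");  // hypothetical keyspace name

    CqlResult result;
    // The chosen consistency level matters for throughput once RF > 1.
    client.execute_cql3_query(result,
        "INSERT INTO table (id1, ts, id2, msg) VALUES ('a', 'b', 'c', 'd')",
        Compression::NONE, ConsistencyLevel::ONE);
    transport->close();
}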

Each data entry I INSERT or SELECT is around 100 characters.
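
Concretely, the random strings come from something like this (a minimal sketch of my generator; splitting the ~100 characters into four 25-character fields is just an illustration):

#include <cstdlib>
#include <ctime>
#include <string>

// Build a random lowercase string of the given length with rand().
// Placement in the cluster is decided by the Murmur3 hash of the partition
// key (id1), which spreads even simple random keys evenly across nodes.
std::string random_string(size_t len) {
    std::string s(len, ' ');
    for (size_t i = 0; i < len; ++i) {
        s[i] = 'a' + rand() % 26;
    }
    return s;
}

int main() {
    srand(time(0));  // seed once, as in my test client
    std::string id1 = random_string(25);  // four fields, ~100 chars per entry
    std::string ts  = random_string(25);
    std::string id2 = random_string(25);
    std::string msg = random_string(25);
    return 0;
}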

On the stress-test client I create several threads to send the read and write
requests, each thread with its own thrift client, and at startup the thrift
clients are spread evenly across the Cassandra servers. For example, in a
4-node cluster I create 160 thrift clients, 40 of which connect to each server
node, as sketched below.
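
That even assignment is essentially the following (hypothetical host addresses; client i connects to host i % hosts.size()):

#include <string>
#include <vector>

// Spread num_clients thrift clients evenly across the server nodes:
// 160 clients over 4 hosts gives 40 clients per node.
std::vector<std::string> assign_hosts(const std::vector<std::string>& hosts,
                                      size_t num_clients) {
    std::vector<std::string> assignment;
    assignment.reserve(num_clients);
    for (size_t i = 0; i < num_clients; ++i) {
        assignment.push_back(hosts[i % hosts.size()]);
    }
    return assignment;
}

int main() {
    std::vector<std::string> hosts = {"10.0.0.1", "10.0.0.2",
                                      "10.0.0.3", "10.0.0.4"};
    std::vector<std::string> plan = assign_hosts(hosts, 160);  // 40 per node
    return 0;
}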

 

So:

1. Could anyone help me explain my test results? Why does performance (QPS)
increase only a little when new nodes are added?

2. I've read in the materials that Cassandra has better write performance
than read performance. Why is read performance better in my case?

3. I also use OpsCenter to monitor the real-time performance of my cluster.
But while I measure the average QPS above, the operations/s reported by
OpsCenter are around 1+ at the write peak and 5000+ at the read peak. Why
are my results inconsistent with OpsCenter's?

4. Is there anything unreasonable in my test method, such as the test data
or the QPS calculation?

 

Thank you very much,

Joy