I have been using the cassandra-stress tool to evaluate my Cassandra
cluster for quite some time now. My problem is that I am unable to
interpret the results it generates for my specific use case.

My schema looks something like this:

CREATE TABLE Table_test(
      ID uuid,
      Time timestamp,
      Value double,
      Date timestamp,
      PRIMARY KEY ((ID,Date), Time)
) WITH COMPACT STORAGE;

I have described this schema in a custom YAML profile and run the tool
with n=10000, threads=100, and the rest left at defaults (cl=ONE,
mode=native cql3, etc.). The Cassandra cluster is a 3-node CentOS VM setup.

A few specifics of the custom yaml file are as follows:

insert:
    partitions: fixed(100)
    select: fixed(1)/2
    batchtype: UNLOGGED

columnspec:
    - name: Time
      size: fixed(1000)
    - name: ID
      size: uniform(1..100)
    - name: Date
      size: uniform(1..10)
    - name: Value
      size: uniform(-100..100)

My observations so far are as follows (Please correct me if I am wrong):

   1. With n=10000 and Time: fixed(1000), the number of rows inserted is
   10 million (10000 * 1000 = 10,000,000).
   2. The number of row keys/partitions is 10000 (i.e. n), of which 100
   partitions are taken at a time (100 * 1000 = 100,000 rows), and 50,000
   of those rows are written per batch. (This is because of select:
   fixed(1)/2, i.e. ~50%.)

The output message confirms this:

Generating batches with [100..100] partitions and [50000..50000] rows
(of[100000..100000] total rows in the partitions)
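To make my reasoning explicit, here is the arithmetic I am assuming, as a
small Python sketch (the variable names are mine, not cassandra-stress
terminology; please correct the model if it is wrong):

```python
# Batch arithmetic implied by my stress profile (my own interpretation).
n_ops = 10000                # n passed to cassandra-stress
rows_per_partition = 1000    # Time: fixed(1000)
partitions_per_batch = 100   # partitions: fixed(100)
select_fraction = 1 / 2      # select: fixed(1)/2

total_rows = n_ops * rows_per_partition
rows_visited_per_batch = partitions_per_batch * rows_per_partition
rows_written_per_batch = int(rows_visited_per_batch * select_fraction)

print(total_rows)              # 10000000
print(rows_visited_per_batch)  # 100000, matches "total rows in the partitions"
print(rows_written_per_batch)  # 50000, matches the generated batch size
```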

The results that I get are the following for consecutive runs with the same
configuration as above:

Run Total_ops   Op_rate Partition_rate  Row_Rate   Time
1     56           19     1885           943246     3.0
2     46           46     4648          2325498     1.0
3     27           30     2982          1489870     0.9
4     59           19     1932           966034     3.1
5     100          17     1730           865182     5.8
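As a rough cross-check of my reading (my assumption: each op is one batch
touching 100 partitions and writing ~50,000 rows), the three rates in run 1
do look internally consistent:

```python
# Run 1 from the table above: Op_rate 19, Partition_rate 1885, Row_Rate 943246.
op_rate = 19
partitions_per_op = 100   # partitions: fixed(100)
rows_per_op = 50000       # 50% of 100 partitions * 1000 rows each

print(op_rate * partitions_per_op)  # 1900, close to the reported 1885
print(op_rate * rows_per_op)        # 950000, close to the reported 943246
```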

What I need to understand is the following:

   1. Which of these metrics is the throughput, i.e. the number of
   records inserted per second: Row_rate, Op_rate, or Partition_rate? If
   it is Row_rate, can I safely conclude that I am inserting close to 1
   million records per second? Any thoughts on what Op_rate and
   Partition_rate mean in this case?
   2. Why does Total_ops vary so drastically between runs? Does the
   number of threads have anything to do with this variation? What can I
   conclude about the stability of my Cassandra setup?
   3. How do I determine the batch size per thread here? In my example,
   is the batch size 50000?

Thanks in advance.



-- 
Nisha Menon
BTech (CS) Sahrdaya CET,
MTech (CS) IIIT Bangalore.
