Your insert settings look unrealistic: I doubt you would be writing 50k rows at a time. Try setting this to 1 row per partition and you should get much more consistent numbers across runs, I would think:

select: fixed(1)/100000
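For reference, a minimal sketch of what the insert section of the profile would look like with that change (assuming the rest of the yaml stays as in the original post; the divisor 100000 matches the total rows per batch that stress reported):

```yaml
# Sketch of the suggested insert settings: still 100 partitions per
# batch, but select only 1 of the ~100,000 rows visited per operation
# instead of half of them.
insert:
  partitions: fixed(100)
  select: fixed(1)/100000
  batchtype: UNLOGGED
```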
On Wed, Mar 4, 2015 at 7:53 AM, Nisha Menon <nisha.meno...@gmail.com> wrote:
> I have been using the cassandra-stress tool to evaluate my Cassandra cluster
> for quite some time now. My problem is that I am not able to comprehend the
> results generated for my specific use case.
>
> My schema looks something like this:
>
> CREATE TABLE Table_test(
>     ID uuid,
>     Time timestamp,
>     Value double,
>     Date timestamp,
>     PRIMARY KEY ((ID, Date), Time)
> ) WITH COMPACT STORAGE;
>
> I have described this schema in a custom yaml file and used the parameters
> n=10000, threads=100, with the rest left at their defaults (cl=one,
> mode=native cql3, etc.). The Cassandra cluster is a 3-node CentOS VM setup.
>
> A few specifics of the custom yaml file are as follows:
>
> insert:
>     partitions: fixed(100)
>     select: fixed(1)/2
>     batchtype: UNLOGGED
>
> columnspecs:
>     - name: Time
>       size: fixed(1000)
>     - name: ID
>       size: uniform(1..100)
>     - name: Date
>       size: uniform(1..10)
>     - name: Value
>       size: uniform(-100..100)
>
> My observations so far are as follows (please correct me if I am wrong):
>
> With n=10000 and Time: fixed(1000), the number of rows getting inserted is
> 10 million (10000 * 1000 = 10,000,000).
> The number of row keys/partitions is 10000 (i.e. n), of which 100 partitions
> are taken at a time (which means 100 * 1000 = 100,000 key-value pairs), out
> of which 50,000 key-value pairs are processed at a time (this is because of
> select: fixed(1)/2, i.e. ~50%).
>
> The output message also confirms the same:
>
> Generating batches with [100..100] partitions and [50000..50000] rows
> (of [100000..100000] total rows in the partitions)
>
> The results that I get are the following for consecutive runs with the same
> configuration as above:
>
> Run  Total_ops  Op_rate  Partition_rate  Row_rate  Time
> 1    56         19       1885            943246    3.0
> 2    46         46       4648            2325498   1.0
> 3    27         30       2982            1489870   0.9
> 4    59         19       1932            966034    3.1
> 5    100        17       1730            865182    5.8
>
> Now what I need to understand is the following:
>
> Which of these metrics is the throughput, i.e. the number of records
> inserted per second? Is it the Row_rate, Op_rate or Partition_rate? If it
> is the Row_rate, can I safely conclude that I am able to insert close to
> 1 million records per second? Any thoughts on what the Op_rate and
> Partition_rate mean in this case?
> Why does Total_ops vary so drastically in every run? Does the number of
> threads have anything to do with this variation? What can I conclude here
> about the stability of my Cassandra setup?
> How do I determine the batch size per thread here? In my example, is the
> batch size 50000?
>
> Thanks in advance.

--
http://twitter.com/tjake
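For what it's worth, the batch arithmetic the quoted post works through can be double-checked with a quick standalone script (a sketch, not part of cassandra-stress; the constants are taken straight from the yaml profile above):

```python
# Verify the row arithmetic from the stress profile in the quoted post.
n_partitions_total = 10_000    # n=10000 seed partitions
rows_per_partition = 1_000     # Time: size: fixed(1000)
partitions_per_batch = 100     # insert: partitions: fixed(100)
select_ratio = 1 / 2           # insert: select: fixed(1)/2

# Total rows generated over the whole run.
total_rows = n_partitions_total * rows_per_partition

# Rows covered by the 100 partitions visited in one batch.
rows_visited_per_batch = partitions_per_batch * rows_per_partition

# Rows actually written per batch after applying the select ratio.
rows_written_per_batch = int(rows_visited_per_batch * select_ratio)

print(total_rows)              # 10000000
print(rows_visited_per_batch)  # 100000, the "total rows in the partitions"
print(rows_written_per_batch)  # 50000, matching the stress output
```

This reproduces the three numbers in the post: 10 million rows overall, 100,000 rows visited per batch, and 50,000 rows selected per batch.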