[ https://issues.apache.org/jira/browse/CASSANDRA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250242#comment-14250242 ]

Ariel Weisberg edited comment on CASSANDRA-8503 at 12/17/14 6:17 PM:
---------------------------------------------------------------------

I think there are two general classes of benchmarks you would run in CI: 
representative user workloads, and targeted microbenchmark workloads. Targeted 
workloads are a huge help during ongoing development because they magnify the 
impact of regressions from code changes that are harder to notice in 
representative workloads. They also point to the specific subsystem being 
benchmarked.

I will just cover the microbenchmarks. The full matrix is large, so there is an 
element of wanting ponies, but the reality is that all of them are interesting 
from the perspective of preventing performance regressions and understanding 
the impact of ongoing changes.

Benchmark the stress client itself: excess server capacity and a single client 
driving lots of small messages, lots of large messages, stuff the servers can 
answer as fast as possible. The flip side of this workload is the same thing 
but for the server, where you measure how many trivially answerable tiny 
queries you can shove through a cluster given excess client capacity. Testing 
the server might also be the place to test the matrix of replication and 
consistency levels.
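
A sketch of the two message-size extremes with the in-tree stress tool (flag 
spellings vary between versions, so treat the exact syntax as an assumption to 
verify against your build's help output):

    # tiny, trivially answerable writes with lots of threads: server-bound
    cassandra-stress write n=10000000 cl=ONE -col 'n=FIXED(1)' 'size=FIXED(10)' -rate threads=300

    # large messages: a single 10KB column per write
    cassandra-stress write n=1000000 cl=ONE -col 'n=FIXED(1)' 'size=FIXED(10000)' -rate threads=50

The client-bound variant is the same invocation pointed at an over-provisioned 
cluster, so the client itself becomes the thing being measured.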

Benchmark performance of non-prepared statements.

Benchmark performance of preparing statements?
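
Whether that needs a separate harness depends on the stress build: if the mode 
settings expose an unprepared option (an assumption to check against the 
tool's help output), the same workload can be driven down both paths:

    # 'unprepared' here is assumed; stress uses prepared statements by default
    cassandra-stress write n=1000000 -mode native cql3 unprepared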
 
A full test matrix for data intensive workloads would test read, write, and 
50/50, and for a bonus 90/10. Single cell partitions with a small value and a 
large value, and a range of wide rows (small, medium, large). All 3 compaction 
strategies with compression on/off. Data intensive workloads also need to run 
against both spinning rust and SSDs.
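
The read/write mixes and the compaction axis map fairly directly onto stress 
invocations; the sub-option names below are assumptions to check, and the 
compression on/off toggle may need to go through a user profile's table 
definition instead:

    # 50/50 read/write at QUORUM on LeveledCompactionStrategy
    cassandra-stress mixed 'ratio(write=1,read=1)' n=10000000 cl=QUORUM \
        -schema 'replication(factor=3)' 'compaction(strategy=LeveledCompactionStrategy)'

    # 90/10 bonus round
    cassandra-stress mixed 'ratio(read=9,write=1)' n=10000000 cl=QUORUM

Wide-row shapes are easier to express in a user profile, where the clustering 
column's cluster distribution (e.g. fixed(1000)) sets rows per partition.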

CQL-specific microbenchmarks against specific CQL datatypes. If there are 
interactions that are important we should capture those; a couple of profile 
sketches follow the two items below.

Counters

Lightweight transactions
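
Counters at least are already first-class commands in stress; lightweight 
transactions and datatype coverage would probably have to go through user 
profiles. A minimal profile sketch, with keyspace, table, and distribution 
choices invented for illustration (and assuming the generator can populate 
collection columns; if not, swap in scalar columns):

    # datatypes.yaml -- hypothetical profile exercising a map column
    keyspace: stress_types
    keyspace_definition: |
      CREATE KEYSPACE stress_types WITH replication =
        {'class': 'SimpleStrategy', 'replication_factor': 3};
    table: typed
    table_definition: |
      CREATE TABLE typed (
        key text PRIMARY KEY,
        tags map<text, text>,
        payload blob
      );
    columnspec:
      - name: key
        size: uniform(5..30)
      - name: payload
        size: fixed(100)
    insert:
      partitions: fixed(1)
    queries:
      read_one:
        cql: SELECT * FROM typed WHERE key = ?

    # run the profile, mixing inserts with the named read
    cassandra-stress user profile=./datatypes.yaml 'ops(insert=1,read_one=1)' n=1000000

    # counters are built in
    cassandra-stress counter_write n=1000000 cl=QUORUM
    cassandra-stress counter_read n=1000000 cl=QUORUM

A lightweight transaction variant would add a conditional query to the profile 
(an UPDATE ... IF clause), assuming the stress version in use accepts one.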

The matrix also needs to include different permutations of replication 
strategies and consistency levels. Maybe we can constrain those variations to 
parts of the matrix that would best reflect the impact of replication 
strategies and CL. Probably a subset of the data intensive workloads.
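
Constraining that corner could be as simple as re-running one representative 
data intensive workload across the interesting consistency levels; a sketch, 
with SimpleStrategy and a fixed replication factor standing in for the full 
replication-strategy axis:

    for cl in ONE QUORUM ALL; do
        cassandra-stress mixed 'ratio(write=1,read=1)' n=1000000 cl=$cl \
            -schema 'replication(factor=3)'
    done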

We also want a workload targeting the row cache and key cache, both when 
everything is cached and when there is a realistic long tail of data not in 
the cache.
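
The access distribution is what separates those two cases, and the population 
option takes a distribution directly; the shapes and key-space sizes below are 
made-up placeholders:

    # working set small enough that the key/row cache holds everything
    cassandra-stress read n=10000000 -pop 'dist=UNIFORM(1..100000)'

    # long tail: hot keys dominate but a realistic tail misses the cache
    cassandra-stress read n=10000000 -pop 'dist=EXTREME(1..100000000,2)'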

For every workload, to really get the value, you would like a graph for 
throughput and a graph for latency at some percentile, with a data point per 
revision tested going back to the beginning, as well as a 90 day graph. A 
trend line also helps. Then someone has to own monitoring the graphs and 
poking people when there is an issue.

The workflow usually goes something like this: the monitor tags the author of 
the suspected bad revision, who triages it and either fixes it or hands it off 
to the correct person. Timeliness is really important, because once regressions 
start stacking it's a pain to know whether you have done what you should to 
fix it.



> Collect important stress profiles for regression analysis done by jenkins
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8503
>             Project: Cassandra
>          Issue Type: Task
>            Reporter: Ryan McGuire
>            Assignee: Ryan McGuire
>
> We have a weekly job setup on CassCI to run a performance benchmark against 
> the dev branches as well as the last stable releases.
> Here's an example:
> http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f
> This test is currently pretty basic: it's running on three nodes with the 
> default stress profile. We should crowdsource a collection of stress profiles 
> to run, and then once we have many of these tests running we can collect them 
> all into a weekly email.
> Ideas:
>  * Timeseries (Can this be done with stress? not sure)
>  * compact storage
>  * compression off
>  * ...


