Hello Carl,

What you are trying to do sounds like a good match for one of the tools we open-sourced and actively maintain: https://github.com/thelastpickle/tlp-stress.
tlp-stress allows you to use predefined profiles (see https://github.com/thelastpickle/tlp-stress/tree/master/src/main/kotlin/com/thelastpickle/tlpstress/profiles) or create your own profiles and/or schemas. Contributions are welcome. You can tune workloads, the read/write ratio, the number of distinct partitions, the number of operations to run, and more. Depending on the instances in use and your own testing goals, you might need multiple clients to maximize throughput.

In case it might be of some use as well, we like to use it combined with another of our tools: tlp-cluster (https://github.com/thelastpickle/tlp-cluster). With it we can easily create and destroy Cassandra environments (on AWS), including Cassandra servers, clients, and monitoring (Prometheus).

Have a look anyway; I think both projects might be of interest for reaching your goal.

C*heers,
-----------------------
Alain Rodriguez - [email protected]
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Thu, 23 May 2019 at 21:25, Carl Mueller <[email protected]> wrote:
> Does anyone have any schema / schema generation that can be used for
> general testing that has lots of complicated aspects and data?
>
> For example, it has a bunch of different rk/ck variations, column data
> types, altered/added columns and data (which can impact sstables and
> compaction),
>
> Mischievous data to prepopulate (such as
> https://github.com/minimaxir/big-list-of-naughty-strings for strings,
> ugly keys in maps, semi-evil column names) of sufficient size to get on
> most nodes of a 3-5 node cluster
>
> superwide rows
> large key values
>
> version specific stuff to 2.1, 2.2, 3.x, 4.x
>
> I'd be happy to centralize this in a github if this doesn't exist anywhere
> yet
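P.S. As a rough illustration of the workload knobs mentioned above (read/write ratio, distinct partitions, operation count), a tlp-stress invocation could look something like the sketch below. The profile name and flag spellings here are assumptions from memory of the project's README, not verified against your version; check `tlp-stress run --help` for the authoritative options.

```shell
# Hypothetical sketch: run the built-in KeyValue profile against a local node.
# Flag names (-r, -p, -n, --host) are assumptions; verify with:
#   tlp-stress run --help
tlp-stress run KeyValue \
  --host 127.0.0.1 \
  -r 0.9 \        # read rate: 90% reads, 10% writes
  -p 10M \        # number of distinct partitions
  -n 100M         # total operations to run
```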
