Hi Spark experts, I am curious what people are using to benchmark their Spark clusters. We are about to start a build (bare metal) vs. buy (AWS/Google Cloud/Qubole) project to determine our Hadoop and Spark deployment selection. On the Hadoop side, we will test live workloads as well as simulated ones with frameworks like TestDFSIO, TeraSort, MRBench, and GridMix.
Do any equivalent benchmarking frameworks exist for Spark? A quick Google search yielded https://github.com/databricks/spark-perf, which looks pretty interesting. It would be great to hear what others are doing here. Thanks for the help!

Jonathan
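For comparison, the kind of timing harness these frameworks wrap around a job can be sketched in a few lines. This is an illustrative, hand-rolled example (not taken from spark-perf or any framework above): it times a local in-memory sort as a stand-in for a cluster workload like TeraSort, reporting per-run wall-clock times. The function names `benchmark` and `sort_workload` are invented for this sketch.

```python
import random
import time

def benchmark(workload, runs=3):
    """Run `workload` several times and return per-run wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings

def sort_workload(n=100_000):
    # Stand-in workload: generate random data and sort it, loosely analogous
    # to what TeraSort exercises at cluster scale.
    data = [random.random() for _ in range(n)]
    data.sort()

timings = benchmark(sort_workload)
print(f"best: {min(timings):.4f}s over {len(timings)} runs")
```

A real framework adds the pieces that matter for cluster comparisons: warm-up runs, data generation at scale, and capturing cluster metrics alongside wall-clock time.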