Hi Spark Experts,

I am curious what people are using to benchmark their Spark clusters.  We
are about to start a build (bare metal) vs. buy (AWS/Google Cloud/Qubole)
evaluation to decide how we will deploy Hadoop and Spark.  On the Hadoop
side we will test live workloads as well as simulated ones with frameworks
like TestDFSIO, TeraSort, MRBench, GridMix, etc.

Do any equivalent benchmarking frameworks exist for Spark?  A quick Google
search turned up https://github.com/databricks/spark-perf, which looks
pretty interesting.  It would be great to hear what others are doing here.
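In case it helps frame answers: even without a full framework, the core pattern most of these harnesses follow is warmup-then-timed-runs with a robust summary statistic. Below is a minimal sketch of that pattern in plain Python; the `job` callable is a hypothetical stand-in for whatever Spark action you would actually measure (e.g. a lambda wrapping a count or a sort), and the names `benchmark`, `runs`, and `warmup` are my own, not from any particular tool.

```python
import time
import statistics

def benchmark(job, runs=5, warmup=1):
    """Time a callable several times and return the median wall-clock seconds.

    `job` is a stand-in for a real workload, e.g. a Spark action
    wrapped in a lambda (hypothetical -- plug in your own).
    """
    for _ in range(warmup):
        job()  # warm caches/JIT before measuring
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        job()
        timings.append(time.perf_counter() - start)
    # Median is less sensitive to stragglers than the mean.
    return statistics.median(timings)

# Trivial stand-in workload to show usage:
median_s = benchmark(lambda: sum(range(1_000_000)), runs=3)
print(f"median: {median_s:.4f}s")
```

The warmup run matters on the JVM especially, since the first execution pays class-loading and JIT costs that later runs do not.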

Thanks for the help!

Jonathan