Re: [SparkScore] Performance portal for Apache Spark

Sandy Ryza Wed, 17 Jun 2015 17:51:53 -0700

This looks really awesome.

On Tue, Jun 16, 2015 at 10:27 AM, Huang, Jie <jie.hu...@intel.com> wrote:


> Hi all,
>
> We are happy to announce the Performance Portal for Apache Spark:
> http://01org.github.io/sparkscore/
>
> The Performance Portal for Apache Spark provides performance data on
> upstream Spark to the community to help identify issues, better understand
> performance differences between versions, and help Spark customers get
> across the finish line faster. The Performance Portal generates two
> reports: a regular (weekly) report and a release-based regression test
> report. We currently use two benchmark suites, HiBench
> (http://github.com/intel-bigdata/HiBench) and Spark-perf
> (https://github.com/databricks/spark-perf). We welcome and look forward
> to your suggestions and feedback. More information and details are
> provided below.
> About Performance Portal for Apache Spark
>
> Our goal is to work with the Apache Spark community to further enhance the
> scalability and reliability of Apache Spark. The data available on this
> site allows community members and potential Spark customers to closely
> track Apache Spark performance trends. Ultimately, we hope that this
> project will help the community fix performance issues quickly, thus
> providing better Apache Spark code to end customers. The current workloads
> used in the benchmarking include HiBench (a benchmark suite from Intel for
> evaluating big data frameworks such as Hadoop MR and Spark) and Spark-perf
> (a performance testing framework for Apache Spark from Databricks).
> Additional benchmarks will be added as they become available.
> Description
> ------------------------------
>
> Each data point represents a workload's runtime as a percentage compared
> with the previous week. Different lines represent different workloads
> running on Spark in yarn-client mode.
> Hardware
> ------------------------------
>
> CPU type: Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz
> Memory: 128GB
> NIC: 10GbE
> Disk(s): 8 x 1TB SATA HDD
> Software
> ------------------------------
>
> Java version: 1.8.0_25
> Hadoop version: 2.5.0-CDH5.3.2
> HiBench version: 4.0
> Spark on yarn-client mode
> Cluster
> ------------------------------
>
> 1 node for Master
> 10 nodes for Slave
> Summary
> ------------------------------
>
> The lower the percentage, the better the performance.
>
> Group               ww19      ww20      ww22      ww23      ww24      ww25
> HiBench             9.1%      6.6%      6.0%      7.9%     -6.5%     -3.1%
> spark-perf          4.1%      4.4%     -1.8%      4.1%     -4.7%     -4.6%
>
>
> *Y-Axis: normalized completion time; X-Axis: Work Week.*
>
> *The commit number can be found in the result table. The performance
> score for each workload is normalized to the elapsed time for the 1.2
> release. Lower is better.*
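A minimal sketch of the normalization described in the caption above, assuming the score is the percent change in elapsed time relative to the 1.2 baseline run. The announcement does not spell out the exact formula, so the function names and the timings below are hypothetical:

```python
from typing import Optional


def normalized_score(elapsed_s: Optional[float], baseline_s: float) -> Optional[float]:
    """Percent change in elapsed time relative to a baseline run.

    Negative means faster than the baseline, positive means slower.
    Returns None (shown as "null") when the workload did not run or failed.
    This formula is an assumption, not taken from the portal's code.
    """
    if elapsed_s is None:
        return None
    if baseline_s <= 0:
        raise ValueError("baseline elapsed time must be positive")
    return (elapsed_s - baseline_s) / baseline_s * 100.0


def format_score(score: Optional[float]) -> str:
    """Render a score the way the result tables do, e.g. '-10.0%' or 'null'."""
    return "null" if score is None else f"{score:.1f}%"


# Hypothetical timings in seconds, not taken from the portal.
baseline = 200.0
print(format_score(normalized_score(180.0, baseline)))  # -10.0%
print(format_score(normalized_score(None, baseline)))   # null
```

Under this reading, a negative score means the workload finished faster than the baseline, which is why lower is better.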
> HiBench
> ------------------------------
>
> JOB                 ww19      ww20      ww22      ww23      ww24      ww25
> commit          489700c8  8e3822a0  530efe3e  90c60692  db81b9d8  4eb48ed1
> sleep               null      null     -2.1%     -2.9%     -4.1%     12.8%
> wordcount          17.6%     11.4%      8.0%      8.3%    -18.6%    -10.9%
> kmeans             92.1%     61.5%     72.1%     92.9%     86.9%     95.8%
> scan               -4.9%     -7.2%      null     -1.1%    -25.5%    -21.0%
> bayes             -24.3%    -20.1%    -18.3%    -11.1%    -29.7%    -31.3%
> aggregation         5.6%     10.5%      null      9.2%    -15.3%    -15.0%
> join                4.5%      1.2%      null      1.0%    -12.7%    -13.9%
> sort               -3.3%     -0.5%    -11.9%    -12.5%    -17.5%    -17.3%
> pagerank            2.2%      3.2%      4.0%      2.9%    -11.4%    -13.0%
> terasort           -7.1%     -0.2%     -9.5%     -7.3%    -16.7%    -17.0%
>
> Comments: null means the workload was not run or failed during this
> period.
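As a cross-check on the weekly summary, the HiBench group figures for ww20 (6.6%) and ww22 (6.0%) match the plain mean of the non-null per-job values in the table above. Whether the portal actually aggregates this way is an assumption; the helper name below is hypothetical:

```python
from typing import Optional, Sequence


def group_mean(scores: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the non-null scores, or None if every score is null."""
    present = [s for s in scores if s is not None]
    if not present:
        return None
    return sum(present) / len(present)


# HiBench per-job values copied from the weekly table, in row order:
# sleep, wordcount, kmeans, scan, bayes, aggregation, join, sort, pagerank, terasort
ww20 = [None, 11.4, 61.5, -7.2, -20.1, 10.5, 1.2, -0.5, 3.2, -0.2]
ww22 = [-2.1, 8.0, 72.1, None, -18.3, None, None, -11.9, 4.0, -9.5]

print(round(group_mean(ww20), 1))  # 6.6
print(round(group_mean(ww22), 1))  # 6.0
```

The same calculation on the ww19 column gives about 9.2% against the reported 9.1%, so the portal may be averaging unrounded values.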
>
>
> *Y-Axis: normalized completion time; X-Axis: Work Week.*
>
> *The commit number can be found in the result table. The performance
> score for each workload is normalized to the elapsed time for the 1.2
> release. Lower is better.*
> spark-perf
> ------------------------------
>
> JOB                 ww19      ww20      ww22      ww23      ww24      ww25
> commit          489700c8  8e3822a0  530efe3e  90c60692  db81b9d8  4eb48ed1
> agg                13.2%      7.0%      null     18.3%      5.2%      2.5%
> agg-int            16.4%     21.2%      null      9.6%      4.0%      8.2%
> agg-naive           4.3%     -2.4%      null     -0.8%     -6.7%     -6.8%
> scheduling         -6.1%     -8.9%    -14.5%     -2.1%     -6.4%     -6.5%
> count-filter        4.1%      1.0%      6.6%      6.8%    -10.2%    -10.4%
> count               4.8%      4.6%      6.7%      8.0%     -7.3%     -7.0%
> sort               -8.1%     -2.5%     -6.2%     -7.0%    -14.6%    -14.4%
> sort-int            4.5%     15.3%     -1.6%     -0.1%     -1.5%     -2.2%
>
> Comments: null means the workload was not run or failed during this
> period.
>
>
> *Y-Axis: normalized completion time; X-Axis: Work Week.*
>
> *The commit number can be found in the result table. The performance
> score for each workload is normalized to the elapsed time for the 1.2
> release. Lower is better.*
> Release
> Summary
> ------------------------------
>
> The lower the percentage, the better the performance.
>
> Group              1.2.1     1.3.0     1.3.1     1.4.0
> HiBench            -1.0%     10.5%      8.4%      8.6%
> spark-perf          3.2%      0.9%      1.9%      1.3%
>
>
> *Y-Axis: normalized completion time; X-Axis: Release.*
>
> *The performance score for each workload is normalized to the elapsed
> time for the 1.2 release. Lower is better.*
> HiBench
> ------------------------------
>
> JOB                1.2.1     1.3.0     1.3.1     1.4.0
> sleep               null      null      null     -0.5%
> wordcount           3.5%      5.4%      5.1%      8.7%
> kmeans              6.0%     72.6%     82.7%    100.7%
> scan               -0.7%     -3.2%     -1.9%     -4.4%
> bayes             -19.7%      7.7%    -24.5%    -14.4%
> aggregation         4.6%      7.1%      9.9%      9.3%
> join                0.7%      4.0%      8.6%      1.3%
> sort               -1.0%      2.1%     -1.8%    -10.4%
> pagerank            1.5%      2.2%      1.3%      5.4%
> terasort           -3.7%     -3.3%     -3.7%     -9.5%
>
> Comments: null means the workload was not run or failed during this
> period.
>
>
> *Y-Axis: normalized completion time; X-Axis: Release.*
>
> *The commit number can be found in the result table. The performance
> score for each workload is normalized to the elapsed time for the 1.2
> release. Lower is better.*
> spark-perf
> ------------------------------
>
> JOB                1.2.1     1.3.0     1.3.1     1.4.0
> agg                 1.9%      3.1%      6.2%      5.0%
> agg-int             6.4%     17.1%     18.0%     24.2%
> agg-naive          -2.6%     -3.2%     -1.8%     -5.2%
> scheduling          8.2%    -16.8%    -14.4%    -19.1%
> count-filter       -0.4%      0.3%     -0.5%      0.4%
> count               0.6%     -0.3%      0.4%      0.9%
> sort                1.2%     -3.3%     -5.3%     -1.9%
> sort-int           10.1%     10.0%     12.3%      6.0%
>
> Comments: null means the workload was not run or failed during this
> period.
>
>
> *Y-Axis: normalized completion time; X-Axis: Release.*
>
> *The commit number can be found in the result table. The performance
> score for each workload is normalized to the elapsed time for the 1.2
> release. Lower is better.*
>  ------------------------------
>
> Copyright © 2015 Intel Corporation. All rights reserved.
> *Other names and brands may be claimed as the property of others.
>
> Project Email: sparksc...@lists.01.org
> Please subscribe to the list at:
> https://lists.01.org/mailman/listinfo/sparkscore
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
