GitHub user a-roberts commented on the issue:

https://github.com/apache/spark/pull/11956

@robbinspg and I are evaluating this from a functional and performance perspective. Full disclosure: we both work for IBM with @kiszk.

All unit tests pass, including the new ones Ishizaki has added. We've tested this with IBM Java 8 on three different architectures, covering both big- and little-endian platforms.

The benchmark can be run with

```
bin/spark-submit --class org.apache.spark.sql.DataFrameCacheBenchmark sql/core/target/spark-sql_2.11-2.0.0-tests.jar
```

or, against branch-2.0 (Spark 2.0.1 snapshot), with

```
bin/spark-submit --class org.apache.spark.sql.DataFrameCacheBenchmark sql/core/target/spark-sql_2.11-2.0.1-SNAPSHOT-tests.jar
```

Performance results on a few low-powered test systems are promising.

Linux on Intel: 5.3x increase

```
Stopped after 15 iterations, 2127 ms
IBM J9 VM pxa6480sr3-20160428_01 (SR3) on Linux 3.13.0-65-generic
Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
Float Sum with PassThrough cache:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
InternalRow codegen                             669 /  829         47.1          21.3       1.0X
ColumnVector codegen                            127 /  142        248.2           4.0       5.3X
```

Linux on Z: 2.7x increase

```
Stopped after 5 iterations, 2068 ms
IBM J9 VM pxz6480sr3-20160428_01 (SR3) on Linux 3.12.43-52.6-default
16/07/07 09:48:15 ERROR Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
Unknown processor
Float Sum with PassThrough cache:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
InternalRow codegen                             997 / 1134         31.5          31.7       1.0X
ColumnVector codegen                            371 /  414         84.7          11.8       2.7X
```

Linux on Power: 6.4x increase

```
Stopped after 7 iterations, 2099 ms
IBM J9 VM pxl6480sr3-20160428_01 (SR3) on Linux 3.13.0-61-generic
16/07/07 14:33:40 ERROR Utils: Process List(/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
Unknown processor
Float Sum with PassThrough cache:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
InternalRow codegen                            1199 / 1212         26.2          38.1       1.0X
ColumnVector codegen                            186 /  300        168.8           5.9       6.4X
```

So the performance increase and the functionality are solid across platforms. Ishizaki has also tested this with OpenJDK 8.

One improvement would be to add a scale factor parameter so we can use more data than:

```
doubleSumBenchmark(1024 * 1024 * 15)
floatSumBenchmark(1024 * 1024 * 30)
```

With no parameter, we'd use the above as the standard baseline. It would also be useful to have the master URL as a parameter, so we can easily run this on many machines or with more cores and see the performance and functional impact when we scale (exercising various JIT levels, for example).
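
To make the scale factor and master URL suggestion concrete, here is a minimal sketch of how the arguments might be parsed. Everything in it is hypothetical: `BenchmarkArgs`, `Config`, and the argument order are not part of the PR, and only the two baseline row counts come from the benchmark as it stands. Row counts are held as `Long` on the assumption that larger scale factors could overflow an `Int`.

```scala
// Hypothetical helper, not part of the PR: parses an optional scale factor
// and an optional master URL from the benchmark's command-line arguments.
object BenchmarkArgs {
  // Baseline row counts currently hard-coded in the benchmark.
  val BaseDoubleRows: Long = 1024L * 1024 * 15
  val BaseFloatRows: Long = 1024L * 1024 * 30

  // master = None means "leave the benchmark's existing master setting alone".
  final case class Config(scaleFactor: Int, master: Option[String]) {
    def doubleRows: Long = BaseDoubleRows * scaleFactor
    def floatRows: Long = BaseFloatRows * scaleFactor
  }

  // Assumed argument order: [scaleFactor] [masterUrl]; both are optional.
  def parse(args: Array[String]): Config = {
    val scale = args.headOption.map(_.toInt).getOrElse(1)
    require(scale >= 1, s"scale factor must be >= 1, got $scale")
    Config(scale, args.drop(1).headOption)
  }
}
```

With no arguments this reproduces the current baseline. The resulting row counts would then replace the literals passed to `doubleSumBenchmark` and `floatSumBenchmark`, narrowed to `Int` if their signatures require it.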