[GitHub] spark issue #11956: [SPARK-14098][SQL] Generate Java code that gets a float/...

a-roberts Thu, 07 Jul 2016 06:51:06 -0700

Github user a-roberts commented on the issue:

    https://github.com/apache/spark/pull/11956
  
    @robbinspg and I are evaluating this from a functional and performance perspective. Full disclosure: we both work for IBM with @kiszk.
    
    All unit tests pass, including the new ones Ishizaki has added. We've tested this on a variety of platforms, both big- and little-endian, using IBM Java 8 on three different architectures.
    
    We can run the benchmark with
    ```
    bin/spark-submit --class org.apache.spark.sql.DataFrameCacheBenchmark sql/core/target/spark-sql_2.11-2.0.0-tests.jar
    ```
    
    or, against branch-2.0 (the Spark 2.0.1 snapshot), with
    ```
    bin/spark-submit --class org.apache.spark.sql.DataFrameCacheBenchmark sql/core/target/spark-sql_2.11-2.0.1-SNAPSHOT-tests.jar
    ```
    
    Performance results on a few low-powered test systems are promising.
    
    Linux on Intel: 5.3x increase
    ```
    Stopped after 15 iterations, 2127 ms

    IBM J9 VM pxa6480sr3-20160428_01 (SR3) on Linux 3.13.0-65-generic
    Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
    Float Sum with PassThrough cache:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    InternalRow codegen                            669 /  829         47.1          21.3       1.0X
    ColumnVector codegen                           127 /  142        248.2           4.0       5.3X
    ```
    
    Linux on Z: 2.7x increase
    ```
    Stopped after 5 iterations, 2068 ms

    IBM J9 VM pxz6480sr3-20160428_01 (SR3) on Linux 3.12.43-52.6-default
    16/07/07 09:48:15 ERROR Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
    Unknown processor
    Float Sum with PassThrough cache:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    InternalRow codegen                            997 / 1134         31.5          31.7       1.0X
    ColumnVector codegen                           371 /  414         84.7          11.8       2.7X
    ```
    
    Linux on Power: 6.4x increase
    ```
    Stopped after 7 iterations, 2099 ms

    IBM J9 VM pxl6480sr3-20160428_01 (SR3) on Linux 3.13.0-61-generic
    16/07/07 14:33:40 ERROR Utils: Process List(/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
    Unknown processor
    Float Sum with PassThrough cache:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    InternalRow codegen                           1199 / 1212         26.2          38.1       1.0X
    ColumnVector codegen                           186 /  300        168.8           5.9       6.4X
    ```
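    As a cross-check on how the derived columns relate, here is a small sketch that recomputes Rate(M/s), Per Row(ns) and Relative from the best time. It rests on two assumptions. First, the float benchmark processes 1024 * 1024 * 30 rows, the `floatSumBenchmark` size quoted below. Second, Rate and Per Row are both based on the best time. `BenchmarkMath` and its methods are hypothetical and not part of Spark's benchmark utilities. The tables round times to whole milliseconds, so recomputed values can differ from the printed ones in the last digit.

    ```java
    // Hypothetical helper, not part of Spark: recomputes the derived columns of the
    // benchmark tables from the best time (ms) and an assumed row count.
    class BenchmarkMath {
        // Assumption: floatSumBenchmark(1024 * 1024 * 30) processes this many rows.
        static final long FLOAT_ROWS = 1024L * 1024 * 30;

        // Rate(M/s): millions of rows processed per second at the best time.
        static double rateMPerSec(long rows, double bestMs) {
            return rows / (bestMs / 1000.0) / 1_000_000.0;
        }

        // Per Row(ns): nanoseconds spent per row at the best time.
        static double nsPerRow(long rows, double bestMs) {
            return bestMs * 1_000_000.0 / rows;
        }

        // Relative: speedup of a row versus the first (baseline) row of the table.
        static double relative(double baselineBestMs, double bestMs) {
            return baselineBestMs / bestMs;
        }

        public static void main(String[] args) {
            // Best times (ms) taken from the Linux on Intel table.
            double internalRow = 669, columnVector = 127;
            System.out.printf("InternalRow:  %.1f M/s, %.1f ns/row%n",
                    rateMPerSec(FLOAT_ROWS, internalRow), nsPerRow(FLOAT_ROWS, internalRow));
            System.out.printf("ColumnVector: %.1f M/s, %.1f ns/row, %.1fX%n",
                    rateMPerSec(FLOAT_ROWS, columnVector), nsPerRow(FLOAT_ROWS, columnVector),
                    relative(internalRow, columnVector));
        }
    }
    ```

    For the Intel run this gives roughly 47 M/s and 21.3 ns/row for InternalRow codegen, and a relative speedup of about 5.3X for ColumnVector codegen.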
    
    So the functionality is solid and the performance gains hold across platforms; Ishizaki has also tested this with OpenJDK 8.
    
    One improvement would be to add a scale-factor parameter so that we can use more data than:
    ```
        doubleSumBenchmark(1024 * 1024 * 15)
        floatSumBenchmark(1024 * 1024 * 30)
    ```
    and, with no parameter, we'd use the sizes above as the standard baseline.
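    Here is a minimal sketch of what that could look like. It assumes the scale factor arrives as an optional first program argument. `BenchmarkSizes` and its methods are hypothetical names; the real benchmark would pass the resulting row counts to `doubleSumBenchmark` and `floatSumBenchmark`. The sketch keeps row counts as `long` so that large scale factors cannot silently overflow a 32-bit `int`.

    ```java
    // Hypothetical sketch, not DataFrameCacheBenchmark's actual code: derive the
    // benchmark sizes from an optional scale-factor argument.
    class BenchmarkSizes {
        // Baseline sizes quoted above, used when no argument is given.
        static final long DOUBLE_BASE_ROWS = 1024L * 1024 * 15;
        static final long FLOAT_BASE_ROWS = 1024L * 1024 * 30;

        // Parses args[0] as a positive integer scale factor, defaulting to 1.
        static int scaleFactor(String[] args) {
            if (args.length == 0) {
                return 1;
            }
            int scale = Integer.parseInt(args[0]);
            if (scale < 1) {
                throw new IllegalArgumentException("scale factor must be >= 1: " + scale);
            }
            return scale;
        }

        static long doubleRows(int scale) { return DOUBLE_BASE_ROWS * scale; }

        static long floatRows(int scale) { return FLOAT_BASE_ROWS * scale; }

        public static void main(String[] args) {
            int scale = scaleFactor(args);
            // In the real benchmark these counts would feed doubleSumBenchmark / floatSumBenchmark.
            System.out.println("doubleSumBenchmark rows: " + doubleRows(scale));
            System.out.println("floatSumBenchmark rows:  " + floatRows(scale));
        }
    }
    ```

    With no argument this yields the current baseline sizes (15,728,640 and 31,457,280 rows); an argument of `4` would quadruple both.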
    
    It would also be useful to have the master URL as a parameter so that we can easily run this on many machines, or with more cores, to see the performance and functional impact as we scale (exercising various JIT levels, for example).
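    Along the same lines, here is a hypothetical sketch for the master URL. It reads an optional second argument, leaving the first for the proposed scale factor, and otherwise falls back to a default. The `local[1]` default and the `BenchmarkMaster` name are assumptions for illustration only; the benchmark's actual default may differ. In Spark 2.x, the chosen URL could then be passed to `SparkSession.builder().master(...)`.

    ```java
    // Hypothetical sketch, not the benchmark's actual code: pick the master URL from
    // an optional second argument, falling back to a local default.
    class BenchmarkMaster {
        // Assumed default for illustration; the benchmark's real default may differ.
        static final String DEFAULT_MASTER = "local[1]";

        static String masterUrl(String[] args) {
            // args[0] is left for the proposed scale factor.
            if (args.length >= 2 && !args[1].trim().isEmpty()) {
                return args[1].trim();
            }
            return DEFAULT_MASTER;
        }

        public static void main(String[] args) {
            // e.g. "local[8]" for more cores, or "spark://host:7077" for a standalone cluster.
            System.out.println("master = " + masterUrl(args));
        }
    }
    ```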



