[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

dongjoon-hyun Wed, 13 Jun 2018 23:52:06 -0700

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    @gatorsmile and @maropu, I really appreciate this effort. Thanks.
    
    Since this is a cloud benchmark, I have one recommendation. Can we use `r3.xlarge` for all benchmarks **consistently**? As we know, it's difficult to compare results from different machines.
    
    There are three reasons.
    
    1. `r3.xlarge` is cheaper than `m4.2xlarge`.
    2. The previous benchmark result came from a MacBook (SSD). `r3.xlarge` also provides SSD.
    3. `r3.xlarge` is also used in the [Databricks TPCDS benchmark](https://databricks.com/blog/2017/07/12/benchmarking-big-data-sql-platforms-in-the-cloud.html).
    
    The following is the result on `r3.xlarge`. I launched the machine, built this PR on the latest master, and ran `bin/spark-submit --master local[1] --driver-memory 10G --conf spark.ui.enabled=false --class org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark sql/core/target/scala-2.11/spark-sql_2.11-2.0-SNAPSHOT-tests.jar`. (There is no Hadoop installation on the machine. I guess @maropu's machine doesn't have one either.)
    
    ```
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 3.10.0-693.5.2.el7.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9133 / 9275          1.7         580.6       1.0X
    Parquet Vectorized (Pushdown)                   85 /  100        185.2           5.4     107.6X
    Native ORC Vectorized                         8760 / 8843          1.8         556.9       1.0X
    Native ORC Vectorized (Pushdown)               115 /  130        136.4           7.3      79.2X
    
    
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 3.10.0-693.5.2.el7.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9254 / 9276          1.7         588.4       1.0X
    Parquet Vectorized (Pushdown)                  912 /  922         17.2          58.0      10.1X
    Native ORC Vectorized                         8966 / 9013          1.8         570.1       1.0X
    Native ORC Vectorized (Pushdown)               254 /  276         61.8          16.2      36.4X
    
    
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 3.10.0-693.5.2.el7.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 1 string row (value = '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9106 / 9136          1.7         578.9       1.0X
    Parquet Vectorized (Pushdown)                  897 /  910         17.5          57.0      10.2X
    Native ORC Vectorized                         8846 / 8889          1.8         562.4       1.0X
    Native ORC Vectorized (Pushdown)               254 /  267         61.9          16.2      35.8X
    
    
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 3.10.0-693.5.2.el7.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 1 string row (value <=> '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9095 / 9124          1.7         578.3       1.0X
    Parquet Vectorized (Pushdown)                  891 /  899         17.7          56.6      10.2X
    Native ORC Vectorized                         8853 / 8941          1.8         562.8       1.0X
    Native ORC Vectorized (Pushdown)               246 /  254         64.0          15.6      37.0X
    
    
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 3.10.0-693.5.2.el7.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 1 string row ('7864320' <= value <= '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9236 / 9273          1.7         587.2       1.0X
    Parquet Vectorized (Pushdown)                  902 /  910         17.4          57.4      10.2X
    Native ORC Vectorized                         8944 / 8965          1.8         568.6       1.0X
    Native ORC Vectorized (Pushdown)               248 /  262         63.4          15.8      37.2X
    
    
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 3.10.0-693.5.2.el7.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select all string rows (value IS NOT NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                          20309 / 20381          0.8        1291.2       1.0X
    Parquet Vectorized (Pushdown)               20437 / 20477          0.8        1299.3       1.0X
    Native ORC Vectorized                       24929 / 24999          0.6        1585.0       0.8X
    Native ORC Vectorized (Pushdown)            24918 / 25040          0.6        1584.3       0.8X
    ```
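
As a reading aid for the tables above, the derived columns can be approximately recomputed from the best time alone. The sketch below is illustrative and is not the benchmark's own code. The row count of 15 * 1024 * 1024 is an assumption inferred from the `Per Row(ns)` column, `derive_columns` is a hypothetical helper, and `Relative` is assumed to be measured against the first row of each table. Because the printed best times are rounded to whole milliseconds, the recomputed values match the tables only to within rounding.

```python
# Assumption: the benchmark scans 15 * 1024 * 1024 rows. This count is inferred
# from the "Per Row(ns)" column and is not stated in the comment itself.
NUM_ROWS = 15 * 1024 * 1024


def derive_columns(best_ms, baseline_best_ms, num_rows=NUM_ROWS):
    """Recompute Rate(M/s), Per Row(ns) and Relative from rounded best times (ms)."""
    rate_m_per_s = num_rows / (best_ms / 1000.0) / 1e6  # million rows per second
    per_row_ns = best_ms * 1e6 / num_rows                # nanoseconds per row
    relative = baseline_best_ms / best_ms                # speedup vs. the first row
    return rate_m_per_s, per_row_ns, relative


# Best times (ms) copied from the first table, "Select 0 string row (value IS NULL)".
first_table = {
    "Parquet Vectorized": 9133,
    "Parquet Vectorized (Pushdown)": 85,
    "Native ORC Vectorized": 8760,
    "Native ORC Vectorized (Pushdown)": 115,
}
# Assumption: the first row is the baseline for the Relative column.
baseline = first_table["Parquet Vectorized"]

for name, best_ms in first_table.items():
    rate, per_row, rel = derive_columns(best_ms, baseline)
    print(f"{name:<34} {rate:8.1f} M/s {per_row:8.1f} ns {rel:7.1f}X")
```

The same helper can be pointed at best times from another machine to check whether a difference in `Relative` comes from the pushdown path or from the baseline scan.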
    
    As you can see, the result is more consistent with the previous one and differs from the one in this PR. Actually, I was reluctant to say this, but we had better have a standard way to generate benchmark results on the cloud. If possible, I'd like to use `r3.xlarge`.


