Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    I found out why the large performance changes happened in 
https://github.com/apache/spark/pull/21288#discussion_r191280132: 
[the commit](https://github.com/apache/spark/pull/21288/commits/39e5a507fe22cade6bed0613eefbccab15cf45ff)
wrongly set `spark.master` to `local[*]` instead of `local[1]`:
    
    ```
    // Performance results on r3.xlarge 
    
    // --master local[1] --driver-memory 10G --conf spark.ui.enabled=false
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9292 / 9315          1.7         590.8       1.0X
    Parquet Vectorized (Pushdown)                  921 /  933         17.1          58.6      10.1X
    Native ORC Vectorized                         9001 / 9021          1.7         572.3       1.0X
    Native ORC Vectorized (Pushdown)               257 /  265         61.2          16.3      36.2X
    
    Select 1 string row (value = '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9151 / 9162          1.7         581.8       1.0X
    Parquet Vectorized (Pushdown)                  902 /  917         17.4          57.3      10.1X
    Native ORC Vectorized                         8870 / 8882          1.8         564.0       1.0X
    Native ORC Vectorized (Pushdown)               254 /  268         61.9          16.1      36.0X
    ...
    
    
    // --master local[*] --driver-memory 10G --conf spark.ui.enabled=false
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            3959 / 4067          4.0         251.7       1.0X
    Parquet Vectorized (Pushdown)                  202 /  245         77.7          12.9      19.6X
    Native ORC Vectorized                         3973 / 4055          4.0         252.6       1.0X
    Native ORC Vectorized (Pushdown)               286 /  345         55.0          18.2      13.8X
    
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            3985 / 4022          3.9         253.4       1.0X
    Parquet Vectorized (Pushdown)                  249 /  274         63.3          15.8      16.0X
    Native ORC Vectorized                         4066 / 4122          3.9         258.5       1.0X
    Native ORC Vectorized (Pushdown)               257 /  310         61.3          16.3      15.5X
    ```
    
    I'll fix the bug and update the results in follow-up PRs. Sorry, all.
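
As a rough cross-check of the numbers above, the sketch below recomputes the Relative column and the `local[1]` to `local[*]` speedup from the Best times of the two `'7864320' < value < '7864320'` tables. It assumes that Relative is the first row's best time divided by each row's best time, which is consistent with the printed values; the dictionary and function names are illustrative only and are not part of the benchmark code.

```python
# Best times in ms, copied from the two "'7864320' < value < '7864320'" tables above.
best_ms = {
    "local[1]": {
        "Parquet Vectorized": 9292,
        "Parquet Vectorized (Pushdown)": 921,
        "Native ORC Vectorized": 9001,
        "Native ORC Vectorized (Pushdown)": 257,
    },
    "local[*]": {
        "Parquet Vectorized": 3985,
        "Parquet Vectorized (Pushdown)": 249,
        "Native ORC Vectorized": 4066,
        "Native ORC Vectorized (Pushdown)": 257,
    },
}


def relative(times):
    # Assumption: Relative = best time of the first row / best time of each row.
    baseline = next(iter(times.values()))
    return {name: round(baseline / t, 1) for name, t in times.items()}


def speedup(single, multi):
    # How many times faster each case ran under local[*] than under local[1].
    return {name: round(single[name] / multi[name], 2) for name in single}


for master, times in best_ms.items():
    print(master, relative(times))
print("local[1] / local[*]:", speedup(best_ms["local[1]"], best_ms["local[*]"]))
```

With these inputs, the pushdown ORC case takes 257 ms in both runs, while the non-pushdown cases run a little over twice as fast under `local[*]`, which is why the Relative figures shift so much between the two settings.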

