[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.

rdblue Thu, 03 May 2018 09:17:12 -0700

Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21070
  
    @maropu, are you sure about the INT and FLOAT columns? I think you might have that assessment backwards. Here are the INT results from the PR gist:
    
    ```
    SQL Single INT Column Scan:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    SQL Parquet Vectorized                         149 /  162        105.5           9.5       1.0X
    SQL Parquet MR                                1825 / 1836          8.6         116.1       0.1X
    ```
    
    And here are the INT results from the master gist:
    
    ```
    SQL Single INT Column Scan:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    SQL Parquet Vectorized                         250 /  292         63.0          15.9       1.0X
    SQL Parquet MR                                3175 / 3202          5.0         201.8       0.1X
    ```
    
    I think that shows that the PR result was significantly faster, not slower. (The other INT test was about the same.)
    
    Here's the FLOAT column from the PR gist:
    
    ```
    SQL Single FLOAT Column Scan:            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    SQL Parquet Vectorized                         145 /  158        108.8           9.2       1.0X
    SQL Parquet MR                                1840 / 1843          8.5         117.0       0.1X
    ```
    
    And FLOAT from the master gist:
    
    ```
    SQL Single FLOAT Column Scan:            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    SQL Parquet Vectorized                         261 /  316         60.2          16.6       1.0X
    SQL Parquet MR                                3267 / 3284          4.8         207.7       0.1X
    ```
    
    Am I reading this incorrectly? I'm considering lower time values and higher rate values to be better.
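
As a rough check on that reading, the ratio of best times can be computed directly. This is only a sketch using the "Best" times quoted in the tables above; the dictionary layout and the `speedup` helper are illustrative names, not part of the Spark benchmark code.

```python
# Best times in ms, copied from the benchmark tables quoted above.
# Keys are (column type, reader); values are (PR best, master best).
best_ms = {
    ("INT", "Parquet Vectorized"): (149, 250),
    ("INT", "Parquet MR"): (1825, 3175),
    ("FLOAT", "Parquet Vectorized"): (145, 261),
    ("FLOAT", "Parquet MR"): (1840, 3267),
}


def speedup(pr_ms, master_ms):
    """Hypothetical helper: ratio of master time to PR time (>1 means the PR run is faster)."""
    return master_ms / pr_ms


speedups = {key: speedup(*times) for key, times in best_ms.items()}

for (column, reader), ratio in speedups.items():
    print(f"{column:5s} {reader:20s} master/PR best-time ratio: {ratio:.2f}")
```

Under the "lower time is better" reading, a ratio above 1 for every row would mean the PR run was faster than master in each of these four cases.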


