[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV

MaxGekk Tue, 31 Jul 2018 09:28:29 -0700

Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/21909
  
    > does this benchmark result vary if we select col2 or col10?
    
    @felixcheung Not so much. Here is the benchmark for CSV.
    ```
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_172-b11 on Mac OS X 10.13.6
    Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
    
    Count a dataset with 10 columns:         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    col0 + count()                                9097 / 9167          1.1         909.7       1.0X
    col2 + count()                                9294 / 9302          1.1         929.4       1.0X
    col5 + count()                                9346 / 9394          1.1         934.6       1.0X
    col7 + count()                                9227 / 9231          1.1         922.7       1.0X
    col9 + count()                                9141 / 9233          1.1         914.1       1.0X
    ```
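
As a rough cross-check of how the derived columns relate to the best times, here is a minimal Python sketch. It assumes the benchmark ran over 10 million rows, which is not stated in the comment but is inferred from 9097 ms corresponding to 909.7 ns per row. It also assumes that "Relative" is the first row's best time divided by each row's best time. The helper names are hypothetical and not part of Spark.

```python
# Best times in milliseconds, copied from the benchmark table above.
best_ms = {"col0": 9097, "col2": 9294, "col5": 9346, "col7": 9227, "col9": 9141}

# Assumption: 10 million rows, inferred from 9097 ms <-> 909.7 ns per row.
ROWS = 10_000_000


def per_row_ns(ms, rows=ROWS):
    """Nanoseconds spent per row for a run that took `ms` milliseconds."""
    return ms * 1_000_000 / rows


def rate_m_per_s(ms, rows=ROWS):
    """Throughput in millions of rows per second."""
    return (rows / 1_000_000) / (ms / 1000)


def relative(ms, baseline_ms):
    """Speed relative to the baseline (assumed: baseline time / this time)."""
    return baseline_ms / ms


def spread(times):
    """Gap between slowest and fastest run, as a fraction of the fastest."""
    return (max(times) - min(times)) / min(times)


if __name__ == "__main__":
    baseline = best_ms["col0"]
    for name, ms in best_ms.items():
        print(
            f"{name}: {per_row_ns(ms):.1f} ns/row, "
            f"{rate_m_per_s(ms):.1f} M rows/s, {relative(ms, baseline):.1f}X"
        )
    print(f"spread: {spread(best_ms.values()):.1%}")
```

Under these assumptions, the slowest selection (col5) is less than 3% slower than the fastest (col0), which is why every row rounds to 1.1 M/s and 1.0X.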


