[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

zaycev Fri, 02 Mar 2018 16:33:00 -0800

Github user zaycev commented on the issue:

    https://github.com/apache/spark/pull/16578
  
    I observed about 5x better performance when reading a small subset of fields of a highly nested Parquet table:
    
    master:
    <img width="1121" alt="screen shot 2018-03-02 at 1 59 39 pm" src="https://user-images.githubusercontent.com/283938/36928047-e07e5b52-1e36-11e8-98e4-a614ad7589b6.png">
    <img width="403" alt="screen shot 2018-03-02 at 1 59 19 pm" src="https://user-images.githubusercontent.com/283938/36928033-c9a21022-1e36-11e8-81bf-7008e1f40d6f.png">
    
    master with @mallman's patch:
    <img width="1033" alt="screen shot 2018-03-02 at 2 58 42 pm" src="https://user-images.githubusercontent.com/283938/36928037-cdc9ec10-1e36-11e8-8830-5e77c074e4ab.png">
    <img width="388" alt="screen shot 2018-03-02 at 2 59 09 pm" src="https://user-images.githubusercontent.com/283938/36928048-e3e15a88-1e36-11e8-8dda-9b384c4a04c8.png">



