GitHub user zaycev commented on the issue:

https://github.com/apache/spark/pull/16578

I observed about 5x better performance when reading a small subset of fields from a highly nested Parquet table.

master:

<img width="1121" alt="screen shot 2018-03-02 at 1 59 39 pm" src="https://user-images.githubusercontent.com/283938/36928047-e07e5b52-1e36-11e8-98e4-a614ad7589b6.png">
<img width="403" alt="screen shot 2018-03-02 at 1 59 19 pm" src="https://user-images.githubusercontent.com/283938/36928033-c9a21022-1e36-11e8-81bf-7008e1f40d6f.png">

master with @mallman's patch:

<img width="1033" alt="screen shot 2018-03-02 at 2 58 42 pm" src="https://user-images.githubusercontent.com/283938/36928037-cdc9ec10-1e36-11e8-8830-5e77c074e4ab.png">
<img width="388" alt="screen shot 2018-03-02 at 2 59 09 pm" src="https://user-images.githubusercontent.com/283938/36928048-e3e15a88-1e36-11e8-8dda-9b384c4a04c8.png">