[GitHub] [spark] EnricoMi commented on pull request #26969: [SPARK-30319][SQL] Add a stricter version of `as[T]`

GitBox Tue, 18 Aug 2020 02:21:42 -0700


EnricoMi commented on pull request #26969:
URL: https://github.com/apache/spark/pull/26969#issuecomment-675366125



   Another reason to add this stricter version of `as[T]` is that it enables 
schema pushdown:
   
       spark.read.parquet(…).as[T]
   
   will read all columns even if `T` covers only a subset of the Parquet 
file's schema, because the schema of `T`'s encoder is hidden from the query 
optimizer. With `toDS[T]`, the schema becomes visible and schema pushdown 
becomes possible:
   
       spark.read.parquet(…).toDS[T]
   
   There is even a test that shows schema pushdown occurs with `toDS[T]`: 
`test("SPARK-30319: toDS enables schema pushdown")`
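
For readers who want to see the effect without the proposed `toDS[T]`, here is a minimal sketch that makes the projection explicit using only existing Dataset API. It is not the PR's implementation. `Person`, `PrunedRead`, and `asPruned` are hypothetical names introduced for illustration, the Parquet path is taken from the command line, and only top-level fields are handled.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoder, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record type; assumed to cover only some of the file's columns.
case class Person(id: Long, name: String)

object PrunedRead {
  // Hypothetical helper, not part of Spark and not the PR's toDS[T]:
  // select the encoder's top-level fields before calling as[T], so the
  // projection is an ordinary plan node the optimizer can see.
  def asPruned[T: Encoder](df: DataFrame): Dataset[T] = {
    val fields = implicitly[Encoder[T]].schema.fieldNames
    df.select(fields.map(col): _*).as[T]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pruned-read")
      .getOrCreate()
    import spark.implicits._

    // Path to a Parquet dataset, supplied by the caller (assumption).
    val path = args(0)

    val ds = asPruned[Person](spark.read.parquet(path))

    // Print the physical plan so the columns requested from the scan
    // can be compared with a plain .as[Person] on the same input.
    ds.explain()

    spark.stop()
  }
}
```

Comparing the `explain()` output of `asPruned[Person](df)` with that of `df.as[Person]` on the same file is one way to check which columns the scan actually requests.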


