EnricoMi commented on pull request #26969:
URL: https://github.com/apache/spark/pull/26969#issuecomment-675366125


   Another reason to add this stricter version of `as[T]` is that it enables 
schema pushdown:
   
       spark.read.parquet(…).as[T]
   
   will read all columns even if `T` covers only a subset of the Parquet 
file's schema. The schema of `T`'s encoder is hidden from the query optimizer. 
With `toDS[T]`, that schema becomes visible to the optimizer, so schema 
pushdown becomes possible:
   
       spark.read.parquet(…).toDS[T]
   
   There is even a test that shows schema pushdown occurs with `toDS[T]`: 
`test("SPARK-30319: toDS enables schema pushdown")`
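
   For comparison, here is a minimal sketch of how a similar projection can be 
written by hand with existing `Dataset` API calls only, by selecting the 
encoder's columns before calling `as[T]`. The names `IdAndName`, 
`ProjectBeforeAs`, and `projectAs`, and the sample columns, are hypothetical 
and chosen for illustration. This is not the implementation of `toDS[T]` from 
this PR, and the plans printed by `explain()` may differ between Spark versions.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoder, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical case class covering only a subset of the file's columns.
case class IdAndName(id: Long, name: String)

object ProjectBeforeAs {
  // Hypothetical helper (not part of Spark): select exactly the columns named
  // in the encoder's schema, then convert to a typed Dataset.
  def projectAs[T](df: DataFrame)(implicit enc: Encoder[T]): Dataset[T] =
    df.select(enc.schema.fieldNames.map(col): _*).as[T]

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("project-before-as")
      .getOrCreate()
    import spark.implicits._

    // Write a small Parquet file with one column more than IdAndName needs.
    val path = java.nio.file.Files.createTempDirectory("pushdown")
      .resolve("data.parquet").toString
    Seq((1L, "a", 1.0), (2L, "b", 2.0))
      .toDF("id", "name", "score")
      .write.parquet(path)

    // Plain as[T]: the logical plan is unchanged by the conversion.
    val wide = spark.read.parquet(path).as[IdAndName]
    // Explicit projection first: the scan is asked only for id and name.
    val narrow = projectAs[IdAndName](spark.read.parquet(path))

    // Compare the ReadSchema entries of the two physical plans.
    wide.explain()
    narrow.explain()

    spark.stop()
  }
}
```

   The explicit `select` is what makes the required columns part of the logical 
plan. The helper assumes a flat schema whose field names match the column 
names; nested fields would need additional handling.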

