EnricoMi commented on pull request #26969:
URL: https://github.com/apache/spark/pull/26969#issuecomment-675366125
Another reason to add this stricter version of `as[T]` is that it enables
schema pushdown:
    spark.read.parquet(…).as[T]
reads all columns, even if `T` covers only a subset of the Parquet file's schema.
The schema of `T`'s encoder is hidden from the query optimizer, so it cannot be
used to prune the columns that are read.
With `toDS[T]`, the encoder's schema becomes visible to the optimizer, and schema
pushdown becomes possible:
    spark.read.parquet(…).toDS[T]
There is even a test showing that schema pushdown occurs with `toDS[T]`:
`test("SPARK-30319: toDS enables schema pushdown")`
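For comparison, here is a minimal sketch of the same effect using only released Spark APIs. It is not the `toDS[T]` implementation from this PR. It explicitly selects the top-level fields of `T`'s encoder schema before calling `as[T]`, which makes the required columns part of the logical plan so column pruning can shrink the Parquet read schema. The case class `Person`, the object `StrictAsSketch`, and the helpers `selectAs` and `demo` are hypothetical names chosen for illustration. Only top-level columns are projected here.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SparkSession}
import org.apache.spark.sql.functions.col
import scala.reflect.runtime.universe.TypeTag

// Hypothetical case class covering only a subset of the stored columns.
case class Person(id: Long, name: String)

object StrictAsSketch {
  // Hypothetical helper: project `df` onto the top-level fields of T's encoder
  // schema, then convert with as[T]. The explicit select exposes the needed
  // columns to the optimizer, so the Parquet ReadSchema can be pruned.
  def selectAs[T <: Product : TypeTag](df: DataFrame): Dataset[T] = {
    val encoder = Encoders.product[T]
    df.select(encoder.schema.fieldNames.map(col): _*).as[T](encoder)
  }

  // Hypothetical demo: write a three-column Parquet file, read it back as Person.
  def demo(spark: SparkSession): Dataset[Person] = {
    import spark.implicits._
    val path = java.nio.file.Files.createTempDirectory("strict-as").resolve("people").toString
    Seq((1L, "Alice", 30), (2L, "Bob", 40)).toDF("id", "name", "age").write.parquet(path)
    selectAs[Person](spark.read.parquet(path))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("strict-as").getOrCreate()
    try {
      val people = demo(spark)
      people.explain() // the FileScan node's ReadSchema should list only id and name
      people.show()
    } finally spark.stop()
  }
}
```

Running `explain()` on `spark.read.parquet(path).as[Person]` instead of `selectAs[Person](...)` shows whether the plain `as[T]` plan keeps `age` in its scan, which is the behavior this comment describes.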