Github user CodingCat commented on the issue:
https://github.com/apache/spark/pull/19810
Hi @cloud-fan, this PR is not only for the case where the data size is
larger than the memory size. Even when all data is in memory, I observed a
10-40% speedup, because the implementation here
(1) reads less data
(2) starts fewer tasks
You can think of this PR as implementing the functionality of Parquet's
footer for the in-memory table.
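
To illustrate the footer analogy, here is a minimal sketch in Python. It is not the code in this PR: `CachedBatch`, `build_batches`, and `scan_equal` are hypothetical names, and the assumption is that each cached chunk carries min/max statistics that are checked before its rows are touched.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CachedBatch:
    """Hypothetical cached chunk: rows plus footer-like min/max statistics."""
    rows: List[int]
    min_value: int
    max_value: int


def build_batches(values: List[int], batch_size: int) -> List[CachedBatch]:
    """Split values into fixed-size batches and record min/max for each one."""
    batches = []
    for i in range(0, len(values), batch_size):
        chunk = values[i:i + batch_size]
        batches.append(CachedBatch(chunk, min(chunk), max(chunk)))
    return batches


def scan_equal(batches: List[CachedBatch], target: int) -> Tuple[List[int], int]:
    """Return the rows equal to target and the number of batches actually read."""
    matches: List[int] = []
    batches_read = 0
    for batch in batches:
        # Consult the statistics first; skip batches that cannot contain target.
        if target < batch.min_value or target > batch.max_value:
            continue
        batches_read += 1
        matches.extend(v for v in batch.rows if v == target)
    return matches, batches_read


batches = build_batches(list(range(100)), batch_size=10)
matches, batches_read = scan_equal(batches, 42)
print(matches, batches_read)
```

In this toy model, a skipped batch stands in for both effects listed above: its rows are never read, and, if each batch were scanned by its own task, that task would never need to start.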