Github user CodingCat commented on the issue:
https://github.com/apache/spark/pull/19810
Reading less data is an observation from the input metrics in the Spark UI, which
include both local and remote reads in the BlockManagers, as well as the overhead
in the BlockManager layer itself (especially when the user chooses to cache in
serialized format).
But I didn't measure how much it contributes to the speedup (and a small
portion of the data is on disk in my perf test).