GitHub user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21859
I don't think this optimization should be done at the SQL layer. The
`ShuffleWriter` should treat `RangePartitioner` specially and consume the
data already sampled by `RangePartitioner` instead of the input iterator.
That way, the SQL layer (as well as all other components) can benefit
from it.