GitHub user sameeragarwal commented on the issue:
https://github.com/apache/spark/pull/20393
@mridulm one approach that Xingbo is looking into (independently of
https://github.com/apache/spark/pull/20414) is to have
`ShuffleBlockFetcherIterator` remember the order of the blocks it fetches and
return them in that order. Since the blocks would still be fetched in parallel,
then, depending on the available buffer size, we'd have to spill some
out-of-order blocks to disk to avoid OOMs on the receiver (similar to
https://github.com/apache/spark/pull/16989). While this would still regress
performance, it might be better than the current local-sort-based fix. Note
that I'm not disputing that hash partitioning would be the "best" fix in terms
of performance, but it would defeat the purpose of repartition, since
hash-partitioned output can be skewed.
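A minimal sketch of the reorder-and-spill idea, written in Python for brevity rather than Spark's Scala. `OrderedBlockBuffer`, `accept`, `reorder`, and `max_in_memory_bytes` are hypothetical names for illustration, not part of `ShuffleBlockFetcherIterator` or any Spark API. It assumes the expected block order is known up front and that each block arrives exactly once; the byte budget stands in for "the available buffer size", and parallel fetching is modeled only as an arbitrary arrival order.

```python
import os
import shutil
import tempfile
from typing import Dict, List, Tuple


class OrderedBlockBuffer:
    """Hypothetical sketch: emit blocks in a fixed order, buffering out-of-order
    arrivals in memory up to a byte budget and spilling the rest to disk."""

    def __init__(self, expected_order: List[str], max_in_memory_bytes: int):
        self.expected_order = expected_order
        self.max_in_memory_bytes = max_in_memory_bytes
        self.next_index = 0                 # position of the next block to emit
        self.in_memory: Dict[str, bytes] = {}
        self.in_memory_bytes = 0
        self.spilled: Dict[str, str] = {}   # block id -> spill file path
        self.spill_count = 0
        self.spill_dir = tempfile.mkdtemp(prefix="block-spill-")

    def _store(self, block_id: str, data: bytes) -> None:
        # Keep the block in memory if it fits in the budget, otherwise spill it.
        if self.in_memory_bytes + len(data) <= self.max_in_memory_bytes:
            self.in_memory[block_id] = data
            self.in_memory_bytes += len(data)
        else:
            path = os.path.join(self.spill_dir, f"{self.spill_count}.blk")
            with open(path, "wb") as f:
                f.write(data)
            self.spilled[block_id] = path
            self.spill_count += 1

    def _take(self, block_id: str) -> bytes:
        # Remove a buffered block from memory or read it back from disk.
        if block_id in self.in_memory:
            data = self.in_memory.pop(block_id)
            self.in_memory_bytes -= len(data)
            return data
        path = self.spilled.pop(block_id)
        with open(path, "rb") as f:
            data = f.read()
        os.remove(path)
        return data

    def accept(self, block_id: str, data: bytes) -> List[Tuple[str, bytes]]:
        """Record one fetched block and return every block now ready, in order."""
        ready: List[Tuple[str, bytes]] = []
        if (self.next_index < len(self.expected_order)
                and block_id == self.expected_order[self.next_index]):
            # The block we are waiting for: emit it directly, no buffering.
            ready.append((block_id, data))
            self.next_index += 1
        else:
            self._store(block_id, data)
        # Drain buffered or spilled blocks that have become next in order.
        while self.next_index < len(self.expected_order):
            want = self.expected_order[self.next_index]
            if want not in self.in_memory and want not in self.spilled:
                break
            ready.append((want, self._take(want)))
            self.next_index += 1
        return ready

    def close(self) -> None:
        shutil.rmtree(self.spill_dir, ignore_errors=True)


def reorder(expected_order: List[str], arrivals: List[Tuple[str, bytes]],
            max_in_memory_bytes: int) -> Tuple[List[Tuple[str, bytes]], int]:
    """Feed blocks in arrival order; return (blocks in expected order, spill count)."""
    buf = OrderedBlockBuffer(expected_order, max_in_memory_bytes)
    out: List[Tuple[str, bytes]] = []
    try:
        for block_id, data in arrivals:
            out.extend(buf.accept(block_id, data))
        return out, buf.spill_count
    finally:
        buf.close()


if __name__ == "__main__":
    # Blocks arrive fully reversed; only one 3-byte block fits in the 4-byte budget.
    arrivals = [("d", b"ddd"), ("c", b"ccc"), ("b", b"bbb"), ("a", b"aaa")]
    emitted, spills = reorder(["a", "b", "c", "d"], arrivals, max_in_memory_bytes=4)
    print([bid for bid, _ in emitted])  # ['a', 'b', 'c', 'd']
    print(spills)                       # 2
```

Emitting the next expected block directly avoids a pointless disk round trip; every out-of-order block beyond the budget, however, costs one write and one read, which is where the performance regression mentioned above would come from in this sketch.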