Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20393
Actually, a similar approach cannot be applied to fix RDD.repartition(): in
RDD[T], the data type `T` may not be comparable, so we cannot perform a local
sort before actually repartitioning.
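
To illustrate the constraint, here is a minimal sketch in plain Scala, with no Spark dependency. `LocalSortSketch`, `sortPartition`, and `Opaque` are hypothetical names made up for this example and are not part of Spark or of this PR. It only shows that a generic local sort needs an `Ordering[T]`, which an arbitrary `T` does not provide.

```scala
object LocalSortSketch {
  // Hypothetical helper: sorting a partition's rows is only possible
  // when an Ordering[T] is available (expressed here as a context bound).
  def sortPartition[T: Ordering](rows: Iterator[T]): Iterator[T] =
    rows.toSeq.sorted.iterator

  // Hypothetical element type with no Ordering defined for it.
  final class Opaque(val payload: Array[Byte])

  def main(args: Array[String]): Unit = {
    // Int has a built-in Ordering, so the local sort works.
    println(sortPartition(Iterator(3, 1, 2)).toList) // prints List(1, 2, 3)

    // Uncommenting the next line fails to compile, because no
    // Ordering[Opaque] is in scope:
    // sortPartition(Iterator(new Opaque(Array[Byte](1))))
  }
}
```

Since RDD.repartition() is defined for any `T`, it cannot demand such an ordering from callers, so a sort-based fix would need some other deterministic ordering or a different approach entirely.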
I'm stepping back to investigate other approaches, which require some
refactoring of the Core module, but I don't think it is safe to ship such an
approach together with Spark 2.3.
So my proposal is: let's include this PR in Spark 2.3 and target the
follow-up work for 2.4, especially since the RDD.repartition() issue is not a
regression in the latest version.
WDYT? @shivaram @sameeragarwal @rxin @mridulm