Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21698#discussion_r200220378
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -452,6 +452,10 @@ abstract class RDD[T: ClassTag](
/** Distributes elements evenly across output partitions, starting from a random partition. */
val distributePartition = (index: Int, items: Iterator[T]) => {
var position = new Random(hashing.byteswap32(index)).nextInt(numPartitions)
+ // TODO Enable insert a local sort before shuffle to make input data sequence
--- End diff --
Shall we remove the TODO? I feel it's almost impossible to do.
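
For readers without the surrounding code at hand, here is a minimal sketch of what the quoted lines appear to do. It is an illustration only, not Spark's implementation: `RoundRobinSketch` and `distribute` are hypothetical names, `numPartitions` is passed in explicitly, and the modulo is applied inline. Only the seeded start position is taken from the diff above.

```scala
import java.util.Random
import scala.util.hashing

object RoundRobinSketch {
  // Hypothetical standalone helper (not part of Spark's API): tags each item
  // of one input partition with a target partition, walking round-robin from
  // a start position seeded by the input partition's index.
  def distribute[T](index: Int, items: Iterator[T], numPartitions: Int): Iterator[(Int, T)] = {
    // Same seeding expression as in the quoted diff.
    var position = new Random(hashing.byteswap32(index)).nextInt(numPartitions)
    items.map { item =>
      // Assumption: wrap around here; the real code may defer this to a partitioner.
      position = (position + 1) % numPartitions
      (position, item)
    }
  }

  def main(args: Array[String]): Unit = {
    // Same elements, different arrival order.
    val first = distribute(0, Iterator("a", "b", "c"), 4).toList
    val second = distribute(0, Iterator("c", "a", "b"), 4).toList
    println(first)
    println(second)
  }
}
```

In this sketch the start position is fixed by the partition index, but the target of each element depends on where it sits in the iterator. The two runs in `main` therefore send the same elements to different partitions, which is presumably why the TODO mentions sorting the input before the shuffle.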