ulysses-you commented on pull request #34820:
URL: https://github.com/apache/spark/pull/34820#issuecomment-987528689
@maryannxue `CoalescedPartitionSpec` only cares about the reduce partitions,
so the maximum length that needs to be sorted is the shuffle partition number.
I therefore think the added complexity does no harm.
To be conservative, I ran a benchmark of the added complexity with 100000
shuffle partitions:
```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

val shufflePartitions = 100000
val rand = new Random(0)
// Random byte sizes per shuffle partition.
val bytesByPartitionId =
  Seq.tabulate(shufflePartitions)(_ => rand.nextLong().abs).toArray

// Split the partition index space into random coalesced groups.
var previous = 0
var next = 0
val partitions = new ArrayBuffer[(Int, Int)]()
while (next < shufflePartitions) {
  next = next + rand.nextInt(100)
  partitions.append((previous, next.min(shufflePartitions - 1)))
  previous = next
}

// Time the per-group sort by descending partition size.
val startTime = System.nanoTime()
partitions.foreach { case (start, end) =>
  (start until end)
    .map(index => (index, bytesByPartitionId(index)))
    .sortBy(_._2)(implicitly[Ordering[Long]].reverse)
}
println((System.nanoTime() - startTime) / 1000000)
```
It takes 73ms.
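As a side note on why the cost stays bounded: the coalesced groups cover disjoint index ranges, so the element counts of all the per-group sorts sum to at most the shuffle partition number. A minimal sketch of that bound (the group-splitting logic mirrors the benchmark above; the `+ 1` step and names are illustrative, not from the patch):

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

val shufflePartitions = 100000
val rand = new Random(0)

// Split [0, shufflePartitions) into random, disjoint groups,
// the same way the benchmark does.
val partitions = new ArrayBuffer[(Int, Int)]()
var previous = 0
var next = 0
while (next < shufflePartitions) {
  next = next + rand.nextInt(100) + 1 // + 1 guarantees progress
  partitions.append((previous, next.min(shufflePartitions)))
  previous = next
}

// The ranges telescope: each group starts where the previous one
// ended, so the sorted lengths sum to the shuffle partition number.
val totalSorted = partitions.map { case (s, e) => e - s }.sum
println(totalSorted) // 100000
```

So even in the worst case, the per-group sorts together touch each shuffle partition index once.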
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]