GitHub user jiangxb1987 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22112#discussion_r212383406
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
---
@@ -305,17 +306,19 @@ object ShuffleExchangeExec {
rdd
}
+ // round-robin function is order sensitive if we don't sort the input.
+ val orderSensitiveFunc = isRoundRobin && !SQLConf.get.sortBeforeRepartition
if (needToCopyObjectsBeforeShuffle(part)) {
- newRdd.mapPartitionsInternal { iter =>
+ newRdd.mapPartitionsWithIndexInternal((_, iter) => {
--- End diff --
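A minimal sketch of why the diff calls round-robin "order sensitive". It assumes a simplified model in which each row in one input partition is assigned to a target partition purely by its position in the iterator, starting from a fixed offset. `RoundRobinSketch` and `assign` are hypothetical names, and this is not Spark's actual partitioner code. If a retried task sees the same rows in a different order, rows land in different partitions. Sorting first, as `sortBeforeRepartition` does, removes that dependence on order.

```scala
// Hypothetical, simplified model of round-robin assignment within one input
// partition; not Spark's actual implementation.
object RoundRobinSketch {
  // Assigns each row to a target partition by its position in the input,
  // starting from a fixed offset (an assumption made for illustration).
  def assign[T](rows: Seq[T], numPartitions: Int, start: Int = 0): Map[T, Int] =
    rows.zipWithIndex.map { case (row, i) => row -> ((start + i) % numPartitions) }.toMap

  def main(args: Array[String]): Unit = {
    val firstRun = Seq("a", "b", "c", "d")
    val retryRun = Seq("b", "a", "c", "d") // same rows, different order on retry

    // Without sorting, the same row can land in a different partition.
    println(assign(firstRun, 2)("a")) // 0
    println(assign(retryRun, 2)("a")) // 1

    // Sorting first makes the assignment independent of input order.
    println(assign(firstRun.sorted, 2) == assign(retryRun.sorted, 2)) // true
  }
}
```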
Shouldn't we mark `newRdd` as `IDEMPOTENT` if we insert a local sort (or
`INDETERMINATE` if we don't sort), so that we don't have to mark the function
as order sensitive?
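A rough sketch of the two options under discussion, using a hypothetical determinism model. The level names follow the comment (`IDEMPOTENT`, `INDETERMINATE`). The intermediate `Unordered` level, `DeterminismSketch`, and both helper functions are assumptions added for illustration, not a confirmed Spark API. Approach A derives the output level from the parent's level plus an order-sensitivity flag on the function, as the diff does. Approach B marks the RDD directly from the planner's decision to sort, as the comment suggests.

```scala
// Hypothetical model: level names mirror the review comment; Unordered and all
// helpers are illustrative assumptions, not Spark's API.
object DeterminismSketch {
  sealed trait Level
  case object Idempotent extends Level    // same rows, same order on every recompute
  case object Unordered extends Level     // same rows, but order may change
  case object Indeterminate extends Level // the rows themselves may change

  // Approach A (the diff): flag the function as order sensitive and derive the
  // output level. An order-sensitive function over unordered input can emit
  // different rows per partition on recompute.
  def levelFromFunction(parentLevel: Level, orderSensitiveFunc: Boolean): Level =
    parentLevel match {
      case Unordered if orderSensitiveFunc => Indeterminate
      case other => other
    }

  // Approach B (the suggestion): mark the RDD directly, based on whether a
  // local sort was inserted before the round-robin shuffle.
  def levelFromPlan(sortBeforeRepartition: Boolean): Level =
    if (sortBeforeRepartition) Idempotent else Indeterminate
}
```

In this toy model, Approach B needs no knowledge of the function at all, which is the simplification the comment is asking about. Approach A is more general, since it also covers other order-sensitive functions.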
---