Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/21698
> I guess on the RDD side it's not called `RoundRobinPartitioner`
Thanks for clarifying, @tgravescs! I was looking at `RangePartitioner` and
its variants and wondering what I was missing; I did not make the obvious
connection with SQL :-)
> If we can't come up with another solution, I would actually be OK with
> failing short term; it's better than corruption.
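To see why a retry can corrupt data instead of simply failing, here is a minimal pure-Python simulation. It is not Spark code. It assumes a simplified round-robin repartition that sends the i-th record of an upstream partition to reducer `i % num_reducers`. Suppose the upstream partition is recomputed and yields the same records in a different order. Reducers that re-fetch its output then see a different assignment than reducers that already consumed the first run. As a result, some records are duplicated and others are lost. `round_robin` and `simulate_partial_retry` are hypothetical names.

```python
from collections import Counter


def round_robin(records, num_reducers):
    """Assign the i-th record to reducer i % num_reducers (simplified round-robin)."""
    buckets = [[] for _ in range(num_reducers)]
    for i, rec in enumerate(records):
        buckets[i % num_reducers].append(rec)
    return buckets


def simulate_partial_retry(first_run, rerun, num_reducers, retried_reducers):
    """Reducers in `retried_reducers` read the re-run map output; the others keep
    what they already consumed from the first run."""
    first = round_robin(first_run, num_reducers)
    second = round_robin(rerun, num_reducers)
    final = []
    for r in range(num_reducers):
        final.extend(second[r] if r in retried_reducers else first[r])
    return final


if __name__ == "__main__":
    original = ["a", "b", "c", "d"]
    reordered = ["b", "a", "d", "c"]  # same records, different order on recomputation
    out = simulate_partial_retry(original, reordered, num_reducers=2, retried_reducers={1})
    print(sorted(out))                       # ['a', 'a', 'c', 'c']
    print(Counter(out) == Counter(original))  # False: "b" and "d" were lost
```

If the recomputed input keeps its original order, the partial retry is harmless. That is why only order-sensitive patterns are a concern.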
If I understand correctly, the proposal is:
* In `ShuffledRDD`, add a flag `orderSensitiveReducer` (?) to track the
specific patterns identified as order sensitive (`repartition` with
`shuffle = true`, `zip`, etc.).
* If a task is re-executed as part of a stage re-execution and the flag is
true, fail the job.
* Task re-execution within the same stage, speculative execution, etc.
should not be an issue, since only one attempt of each task completes.
* `ResultStage` should not be affected.
* I am unsure how caching interacts with this; it might need some
investigation.
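The proposed check can be sketched as a small decision function. This is a hypothetical model, not Spark's scheduler API. `TaskLaunch`, `StageKind` and `should_fail_job` are illustrative names. The `order_sensitive` field stands in for the proposed `orderSensitiveReducer` flag and means "this stage writes shuffle output for a `ShuffledRDD` carrying the flag". Caching is deliberately not modeled, since the proposal leaves that open.

```python
from dataclasses import dataclass
from enum import Enum


class StageKind(Enum):
    SHUFFLE_MAP = "shuffle_map"
    RESULT = "result"


@dataclass(frozen=True)
class TaskLaunch:
    """Hypothetical description of a task about to be (re)launched."""
    stage_kind: StageKind
    stage_attempt: int     # 0 for the stage's first attempt, >0 when the whole stage is re-executed
    task_attempt: int      # retries of this task within the same stage attempt
    speculative: bool      # speculative copy of a still-running task
    order_sensitive: bool  # stands in for the proposed `orderSensitiveReducer` flag


def should_fail_job(launch: TaskLaunch) -> bool:
    """Return True if the proposed stopgap would fail the job instead of running the task."""
    if not launch.order_sensitive:
        return False
    if launch.stage_kind is StageKind.RESULT:
        # Result stages are not affected by the proposal.
        return False
    # task_attempt and speculative are intentionally ignored: within one stage
    # attempt only one attempt per task completes, so they cannot mix outputs.
    return launch.stage_attempt > 0


if __name__ == "__main__":
    retry_in_stage = TaskLaunch(StageKind.SHUFFLE_MAP, 0, 2, False, True)
    stage_rerun = TaskLaunch(StageKind.SHUFFLE_MAP, 1, 0, False, True)
    print(should_fail_job(retry_in_stage))  # False
    print(should_fail_job(stage_rerun))     # True
```

The check errs on the side of failing. Any stage re-execution of a flagged shuffle aborts the job, even if no reducer had consumed the old output yet. That trade-off matches "failing is better than corruption".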
This looks like a reasonable stopgap until we fix the underlying issue.
It also lets users make progress: they can unblock themselves by inserting
a checkpoint before the order-sensitive closure.
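Why a checkpoint helps can be shown with another pure-Python simulation, which does not model how Spark actually writes checkpoints. The idea is that data materialized once is replayed in the same stored order. Every recomputation downstream of it therefore makes the same round-robin assignment. `ReorderingSource` and `Checkpointed` are hypothetical names. In Spark itself, the workaround would be calling `RDD.checkpoint()`, after setting a directory with `SparkContext.setCheckpointDir`, on the input to the order-sensitive operation.

```python
def round_robin(records, num_reducers):
    """Assign the i-th record to reducer i % num_reducers (simplified round-robin)."""
    buckets = [[] for _ in range(num_reducers)]
    for i, rec in enumerate(records):
        buckets[i % num_reducers].append(rec)
    return buckets


class ReorderingSource:
    """Simulates an upstream computation that returns the same records in a
    different order on each run (e.g. output order depends on fetch timing)."""

    def __init__(self, records):
        self._records = list(records)
        self._runs = 0

    def compute(self):
        # Rotate by the number of previous runs, so each recomputation reorders the output.
        k = self._runs % len(self._records)
        self._runs += 1
        return self._records[k:] + self._records[:k]


class Checkpointed:
    """Hypothetical stand-in for checkpointed data: computed once, then replayed as stored."""

    def __init__(self, compute):
        self._data = list(compute())  # materialized exactly once

    def read(self):
        return list(self._data)  # every downstream recomputation sees the same order


if __name__ == "__main__":
    src = ReorderingSource(["a", "b", "c", "d"])
    print(round_robin(src.compute(), 2) == round_robin(src.compute(), 2))  # False

    ckpt = Checkpointed(ReorderingSource(["a", "b", "c", "d"]).compute)
    print(round_robin(ckpt.read(), 2) == round_robin(ckpt.read(), 2))  # True
```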