Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/21698
@cloud-fan The difference is between a user-defined record order
(a global sort or a local sort) and the expectation of a repeatable record
order on recomputation.
It might also be a good idea to explore how other frameworks handle this.
> However, the round robin partitioner (followed by a shuffle) violates it.
This is not limited to repartition: any closure that depends on input
order has the same effect; repartition/coalesce is just one instance of the
issue. I gave a few examples from Spark itself, and I am sure there are other
examples in Spark and in user code.
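To make the point concrete, here is a minimal sketch (not from the PR itself, and assuming a local SparkSession) of two closures whose output depends on the order in which they see their input; round-robin repartition is just one of them:

```scala
import org.apache.spark.sql.SparkSession

object OrderSensitiveClosures {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("order-sensitive-closures")
      .getOrCreate()
    val sc = spark.sparkContext

    // Upstream data with no defined ordering within a partition
    // (e.g. the output of a shuffle, or a scan whose split order can vary).
    val input = sc.parallelize(1 to 100, numSlices = 4)

    // Closure 1: round-robin style repartition. The target partition of a
    // record is a function of the position at which the record is seen, so
    // re-running a lost task over a differently ordered input can send
    // records to different partitions, duplicating or dropping rows downstream.
    val repartitioned = input.repartition(8)

    // Closure 2: a user-level closure with the same property. zipWithIndex
    // assigns ids purely by encounter order, so recomputation over a
    // differently ordered input assigns different ids.
    val indexed = input.zipWithIndex()

    repartitioned.count()
    indexed.count()
    spark.stop()
  }
}
```

Both closures are order-sensitive in exactly the same way, which is why fixing only the round-robin partitioner leaves the second case (and any similar user code) unaddressed.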
It is possible this issue was initially identified via repartition, but
modeling the solution around only one manifestation of the issue ignores all
the others and leaves them unfixed.