Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/21698
  
    @cloud-fan The difference would be between a (user-)defined record order 
(global sort or local sort) and an expectation of repeatable record order on 
recomputation.
    It might also be a good idea to explore how other frameworks handle this.
    
    > However, the round robin partitioner (followed by a shuffle) violates it.
    
    This is not limited to repartition: any closure which depends on input 
order has the same effect. repartition/coalesce is one instance of this issue. 
I gave a few examples from Spark itself, and I am sure there are other 
examples in Spark and in user code.
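    
    To make the point concrete, here is a minimal sketch in plain Python (it 
does not use Spark, and the helper names `round_robin`, `zip_with_index` and 
`hash_partition` are hypothetical, chosen only for illustration). It assumes a 
recomputed task returns the same records in a different order, and compares 
two order-dependent closures with one that depends only on the record value.

```python
import zlib


def round_robin(records, num_partitions):
    """Assign each record to a partition based on its position in the input."""
    partitions = [[] for _ in range(num_partitions)]
    for position, record in enumerate(records):
        partitions[position % num_partitions].append(record)
    return partitions


def zip_with_index(records):
    """Pair each record with its position in the input."""
    return [(record, index) for index, record in enumerate(records)]


def hash_partition(records, num_partitions):
    """Assign each record to a partition based only on the record's value."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        partitions[zlib.crc32(record.encode()) % num_partitions].append(record)
    return partitions


# Assumption: the first attempt and the recomputation produce the same
# records, but in a different order.
first_attempt = ["a", "b", "c", "d", "e", "f"]
recomputed = ["b", "a", "c", "d", "f", "e"]

# Order-dependent closures: the same record ends up somewhere else.
rr_first = round_robin(first_attempt, 2)
rr_again = round_robin(recomputed, 2)
idx_first = dict(zip_with_index(first_attempt))
idx_again = dict(zip_with_index(recomputed))

# Order-independent closure: partition membership is unchanged.
hp_first = [sorted(p) for p in hash_partition(first_attempt, 2)]
hp_again = [sorted(p) for p in hash_partition(recomputed, 2)]

print("round robin differs:", rr_first != rr_again)
print("index of 'a' differs:", idx_first["a"] != idx_again["a"])
print("hash partitioning stable:", hp_first == hp_again)
```

    The round-robin and index-assigning helpers behave differently on 
recomputation for the same reason: their output is a function of the input 
position, not only of the record.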
    
    It is possible this issue was initially identified via repartition - but 
modeling the solution only for one manifestation of the issue ignores all 
others and leaves them unfixed.

