GitHub user mridulm commented on the issue:

    https://github.com/apache/spark/pull/21698
  
    @squito @tgravescs I am probably missing something about why the hash
partitioner helps; can you please clarify?
    IIRC, the partitioner for CoalescedRDD when shuffle is enabled is
HashPartitioner... the issue is the `distributePartition` step before the
shuffle, which is order sensitive but not deterministic, since its input is
not deterministic if it is derived from one or more shuffle outputs.
    
    Btw, when shuffle = false, it does not suffer from this problem. I had
mentally assumed it had the same issue, but on rechecking now I find,
interestingly, that it does not (I had never used that path, so I had never
checked it in detail!)
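A rough sketch of why the no-shuffle path is insensitive to record order, under a simplifying assumption: whole parent partitions are assigned to child partitions, and no per-record position is involved. The `coalesce_no_shuffle` function and its round-robin grouping are hypothetical; Spark's actual grouping (via its partition coalescer) also takes locality into account, which this sketch ignores.

```python
from typing import List, Sequence


def coalesce_no_shuffle(parents: Sequence[Sequence[str]],
                        num_partitions: int) -> List[List[str]]:
    """Hypothetical model: parent partition i moves wholesale to child i % n.

    Which child a record ends up in depends only on its parent partition,
    never on its position within that partition.
    """
    children: List[List[str]] = [[] for _ in range(num_partitions)]
    for i, part in enumerate(parents):
        children[i % num_partitions].extend(part)
    return children


a = coalesce_no_shuffle([["a", "b"], ["c"], ["d", "e"]], 2)
b = coalesce_no_shuffle([["b", "a"], ["c"], ["e", "d"]], 2)
# Record order differs inside each parent, but child membership is identical.
print([set(x) for x in a] == [set(x) for x in b])
```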

