Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/21698
@squito @tgravescs I am probably missing something about why the hash
partitioner helps; can you please clarify?
IIRC the partitioner for CoalescedRDD when shuffle is enabled is already
HashPartitioner ... the issue is the `distributePartition` step before the
shuffle, which is order sensitive and therefore not deterministic, since its
input is not deterministic if it is derived from one or more shuffle outputs.
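
A minimal sketch of the concern, in plain Python rather than Spark code. It assumes a simplified model in which each record gets a key from its position in the input iterator and the key is then hash-partitioned (modulo the partition count). The function name, the fixed start offset, and the sample records are all hypothetical and only illustrate why the record order matters.

```python
def distribute_partition(items, num_partitions, start=0):
    """Toy model: key each record by its position, then partition by key."""
    buckets = [[] for _ in range(num_partitions)]
    position = start  # assumption: fixed start offset, for simplicity
    for item in items:
        position += 1
        # Stand-in for hash partitioning on the integer position key.
        buckets[position % num_partitions].append(item)
    return buckets


# Same records, but a retried task may read its shuffle input in another order.
first_attempt = ["a", "b", "c", "d"]
retry_attempt = ["b", "a", "d", "c"]

first = distribute_partition(first_attempt, 2)
retry = distribute_partition(retry_attempt, 2)

print(first)  # [['b', 'd'], ['a', 'c']]
print(retry)  # [['a', 'c'], ['b', 'd']]
```

In this model the partitioner itself is deterministic, yet the two attempts place the same records in different output partitions, because the key depends only on arrival order.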
Btw, when shuffle = false, it does not suffer from the problem. Mentally I
had assumed that it had an issue too; on a recheck now, I find it interesting
that it does not (I never used that, so had never checked in detail!)