Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21698
I took a quick look at the shuffle writer, and I think it will be hard to
insert a sort there.
I have a simpler proposal for the fix. To trigger this bug, there must be a
shuffle before the `repartition`; queries like `sc.textFile(...).repartition`
have no problem.
We can add a flag (named `fromCoalesce`) to `ShuffleRDD` to indicate whether
it was produced by `RDD#coalesce`. In `DAGScheduler`, if we hit a `FetchFailure`,
fail the job if the shuffle is from `RDD#coalesce` and the previous stage is
also a shuffle map stage. We can provide a config to turn off this check, or
add an `RDD#repartitionBy` which uses a hash partitioner instead of round-robin.
The error message should mention these 2 workarounds.
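For context on why the hash-partitioner workaround is safe: round-robin assigns a record to a partition based on its *position* in the input, so if a retry after a `FetchFailure` delivers the upstream shuffle output in a different order, records move to different partitions. Hash partitioning depends only on the record itself, so placement is stable across retries. A minimal sketch of that distinction (plain Python, the function names are illustrative and not Spark APIs):

```python
def round_robin(records, num_partitions):
    # Partition assignment depends on each record's position in the input,
    # like Spark's round-robin repartition.
    return {rec: i % num_partitions for i, rec in enumerate(records)}

def hash_partition(records, num_partitions):
    # Partition assignment depends only on the record's hash,
    # so it is independent of input order.
    return {rec: hash(rec) % num_partitions for rec in records}

records = ["a", "b", "c", "d"]
reordered = ["d", "c", "b", "a"]  # same data, different arrival order after a retry

# Round-robin placement changes when the input order changes...
assert round_robin(records, 2) != round_robin(reordered, 2)
# ...but hash placement is the same regardless of order.
assert hash_partition(records, 2) == hash_partition(reordered, 2)
```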
In the next release, we can implement the sort or the retry approach as a
better fix.