GitHub user squito commented on the issue:
https://github.com/apache/spark/pull/21698
@tgravescs it's not guaranteed to reproduce with that. IIUC, you need to do
a repartition in the same stage that also does a shuffle-read, then have a
fetch failure, and on recompute that stage needs to fetch shuffle data in a
different order. I think you probably need to make sure the fetches are remote
to get a different order on a retry (local shuffle reads are deterministic, I
think).
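
To make the order dependence concrete, here is a toy sketch in plain Python. It is not Spark code: `round_robin` is a hypothetical stand-in for a position-based repartition, and the two input lists are made-up fetch orders for the same shuffle data. It assumes that after the fetch failure only one of the downstream partitions gets recomputed.

```python
def round_robin(records, num_partitions):
    """Assign each record to a partition by its arrival position.

    Simplified model of a round-robin repartition; illustrative only.
    """
    parts = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        parts[i % num_partitions].append(rec)
    return parts


# The same shuffle data, read in two different orders (hypothetical).
first_attempt = ["a", "b", "c", "d"]
retry_attempt = ["b", "a", "d", "c"]  # e.g. remote blocks arrived in another order

p_first = round_robin(first_attempt, 2)
p_retry = round_robin(retry_attempt, 2)

# Assume partition 0 of the first attempt was already consumed downstream,
# and only partition 1 is recomputed after the fetch failure.
combined = p_first[0] + p_retry[1]

print(sorted(first_attempt))  # the records that should be present
print(sorted(combined))       # the records actually seen downstream
```

In this model the mixed result contains "a" and "c" twice and drops "b" and "d", which is the kind of mismatch you would only see if the retry reads its input in a different order.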
@jiangxb1987 has work on this stopped? I think there are still ideas for
how to go forward on this, and it's a really important fix.