Github user mridulm commented on the issue: https://github.com/apache/spark/pull/22112

@tgravescs Please see https://github.com/apache/spark/pull/22112#discussion_r210788359 for further elaboration. We actually cannot support random order (except for a small subset of cases, such as map-only jobs).

Ideally, I would like to see order-sensitive closures fixed - and fixing repartition + shuffle would address the general case for all order-sensitive closures. This PR does not fix the problem; rather, it fails and retries the job as a workaround - which, as you mention, can be terribly expensive for large jobs. Of course, data correctness trumps performance, so I am fine with this as a stop-gap. I would expect most non-trivial applications will simply work around this by checkpointing to HDFS, like what we did in YST.
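For illustration only (not code from this PR), here is a minimal Scala sketch of the checkpoint-to-HDFS workaround the comment refers to; the app name, checkpoint path, and sizes are hypothetical. Round-robin `repartition()` is order-sensitive, so if an upstream task is recomputed after a fetch failure and emits rows in a different order, downstream partitions can silently lose or duplicate rows; materializing the input first makes retries deterministic.

```scala
import org.apache.spark.sql.SparkSession

object RepartitionCheckpointSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("repartition-checkpoint-sketch") // hypothetical app name
      .master("local[4]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Stand-in for a real upstream computation; in practice the hazard
    // arises when upstream output order is nondeterministic on recompute.
    val data = sc.parallelize(1 to 1000000, numSlices = 8)

    // Workaround: materialize the input to stable storage before the
    // order-sensitive stage, so any retry replays identical input
    // instead of recomputing upstream lineage.
    sc.setCheckpointDir("hdfs:///tmp/rdd-checkpoints") // hypothetical path
    data.checkpoint()
    data.count() // run an action so the checkpoint is actually written

    // repartition() distributes records round-robin; its output now
    // depends only on the checkpointed data, not on recomputed lineage.
    val repartitioned = data.repartition(16)
    println(repartitioned.count())

    spark.stop()
  }
}
```

The trade-off is an extra pass over the data and HDFS storage for the checkpoint, which is typically far cheaper than failing and retrying a large job.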