GitHub user mridulm commented on the issue:

    https://github.com/apache/spark/pull/22112
  
    @tgravescs Please see https://github.com/apache/spark/pull/22112#discussion_r210788359 for further elaboration. We actually cannot support random order (except for a small subset of cases, such as map-only jobs).
    Ideally, I would like to see order-sensitive closures fixed - fixing repartition + shuffle would address the general case for all order-sensitive closures.
    This PR does not fix the problem, but rather fails and retries the job as a workaround - which, as you mention, can be terribly expensive for large jobs. Of course, data correctness trumps performance, so I am fine with this as a stop-gap. I would expect most non-trivial applications will simply work around this by checkpointing to HDFS, like what we did in YST.
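    For reference, a minimal sketch of that checkpointing workaround (the HDFS path is a made-up example, and `sc` is a SparkContext as in the sketch above): materializing the shuffled RDD to reliable storage truncates its lineage, so a later failure re-reads the checkpointed copy instead of recomputing the nondeterministic shuffle.

```scala
// Assumes an existing SparkContext `sc`; the checkpoint directory is hypothetical.
sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")

val shuffled = sc.parallelize(1 to 1000000).repartition(100)
shuffled.cache()        // avoids recomputing the shuffle when the checkpoint job runs
shuffled.checkpoint()   // marks the RDD for reliable checkpointing
shuffled.count()        // first action triggers the write to HDFS

// Downstream order-sensitive work now reads the checkpointed copy,
// so retries see the same record order.
val indexed = shuffled.zipWithIndex()
```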

