[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

tgravescs Thu, 16 Aug 2018 15:08:56 -0700

Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/22112
  
    Thanks for the clarification, but my point concerns your last statement:
    
    >  - with assumption that we will expand solution to cover all later.
    
    If we document this and say we support unordered operations, with the caveat that failures could produce different results, my assumption is that we don't necessarily have to do anything else, ever (this is what I am proposing). We could decide, for instance, to add an option to sort, or, if it's not a result stage, to fail more tasks to try to handle the situation, but strictly speaking we wouldn't have to.
    
    If you think we have to fix those operations that can produce unordered output, then I think it comes back to this: we just don't support unordered operations at all. We should say so, and probably force the sort on all these operations, and possibly on all operations where the user could cause the order to differ on rerun.
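    
As a rough illustration of the concern, here is a toy simulation in plain Python. It is not Spark code: `round_robin`, the sample records, and the partition count are all hypothetical, and the helper only stands in for position-based (round-robin) repartitioning. It sketches how a retry that sees the same records in a different order can yield a different partition assignment, and how sorting first would make the assignment repeatable.

```python
def round_robin(records, num_partitions):
    """Assign each record to a partition by its position in the input.

    Hypothetical stand-in for round-robin repartitioning; not Spark's code.
    """
    partitions = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        partitions[i % num_partitions].append(rec)
    return partitions


# Same records, but the retried task happens to see them in another order.
first_attempt = ["a", "b", "c", "d"]
retry_attempt = ["b", "a", "d", "c"]

# Without a sort, the assignment depends on arrival order.
unsorted_first = round_robin(first_attempt, 2)
unsorted_retry = round_robin(retry_attempt, 2)

# If only partition 1 is recomputed after a failure, its contents come from
# the retry while partition 0 still comes from the first attempt.
mixed = unsorted_first[0] + unsorted_retry[1]

# Sorting before distributing makes both attempts agree.
sorted_first = round_robin(sorted(first_attempt), 2)
sorted_retry = round_robin(sorted(retry_attempt), 2)

print("unsorted attempts agree:", unsorted_first == unsorted_retry)
print("records after partial recompute:", sorted(mixed))
print("sorted attempts agree:", sorted_first == sorted_retry)
```

In this toy case the partial recompute duplicates some records and drops others, which is the kind of outcome the documentation caveat would have to cover if no sort is forced.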




