Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/22112

@tgravescs I was specifically in agreement with

> Personally I don't want to talk about implementation until we decide what we want our semantics to be around the unordered operations because that affects any implementation.

and

> I would propose we fix the things that are using the round robin type partitioning (repartition) but then unordered things like zip/MapPartitions (via user code) we document or perhaps give the user the option to sort.

IMO a fix in Spark core for `repartition` should work for most (if not all) order-dependent closures; we might choose not to implement it for the others due to time constraints, but the basic idea should be fairly similar. Given this, I am fine with documenting the potential issue for the others and fixing the core subset, with the assumption that we will expand the solution to cover all cases later.
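To make the round-robin concern concrete, here is a minimal, hypothetical sketch (plain Python, not Spark's actual implementation): round-robin partitioning assigns records by arrival position, so if a retried upstream task yields the same records in a different order, the output partitions differ; sorting the input first (the "option to sort" idea quoted above) restores determinism.

```python
def round_robin(records, num_partitions):
    # Assign each record to a partition by its position in the
    # stream, analogous in spirit to repartition(n)'s round-robin.
    parts = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        parts[i % num_partitions].append(rec)
    return parts

original = [3, 1, 4, 1, 5, 9]
recomputed = [1, 3, 4, 1, 9, 5]  # same records, different order after a retry

# Without an ordering guarantee, the partition contents differ:
assert round_robin(original, 2) != round_robin(recomputed, 2)

# Sorting first makes the assignment independent of arrival order:
assert round_robin(sorted(original), 2) == round_robin(sorted(recomputed), 2)
```

This is only an illustration of why order-dependent partitioning breaks under task recomputation; the real fix in Spark core would operate on shuffle/task semantics, not user-level lists.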