[GitHub] spark issue #20393: [SPARK-23207][SQL] Shuffle+Repartition on an RDD/DataFra...

jiangxb1987 Fri, 26 Jan 2018 13:44:21 -0800

Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20393
  
    Actually, a similar approach cannot be applied to fix RDD.repartition(): 
in RDD[T], the data type `T` may not be comparable, so we cannot perform a 
local sort before the actual repartition.
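
    To illustrate the constraint, here is a minimal sketch in plain Python 
(not Spark code). It assumes the fix under discussion relies on sorting each 
partition locally so that a retried task hands out records in the same 
round-robin order. The `round_robin` helper and the `Opaque` class are 
hypothetical names made up for this example.

```python
def round_robin(records, num_partitions, local_sort):
    """Assign records to partitions round-robin, optionally sorting first."""
    if local_sort:
        # Sorting requires the elements to define an ordering.
        records = sorted(records)
    partitions = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        partitions[i % num_partitions].append(rec)
    return partitions


class Opaque:
    """Hypothetical record type with no ordering defined."""

    def __init__(self, payload):
        self.payload = payload


# The same records, arriving in a different order on a retried task.
first_attempt = [3, 1, 2, 4]
retry_attempt = [4, 2, 3, 1]

# With a local sort, both attempts produce the same assignment.
stable = round_robin(first_attempt, 2, True) == round_robin(retry_attempt, 2, True)

# Without it, the assignment depends on arrival order.
unstable = round_robin(first_attempt, 2, False) != round_robin(retry_attempt, 2, False)

# For a type without an ordering, the sort itself fails.
try:
    round_robin([Opaque("a"), Opaque("b")], 2, True)
    sort_failed = False
except TypeError:
    sort_failed = True
```

    The last case mirrors the RDD[T] situation: without an ordering on `T`, 
there is nothing to sort by, so the determinism has to come from somewhere 
else.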
    
    I'm stepping back to investigate other approaches, which require some 
refactoring of the Core module, but I don't think it is safe to ship such an 
approach with Spark 2.3.
    
    So my proposal is: let's include this PR in Spark 2.3 and target the 
follow-up work for 2.4, especially since the RDD.repartition() issue is not a 
regression in the latest version.
    
    WDYT? @shivaram @sameeragarwal @rxin @mridulm


