[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

mridulm Mon, 13 Aug 2018 20:36:20 -0700

Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/21698
  
    @tgravescs I vaguely remember someone at Y! Labs telling me (more than a 
decade back) that MR always sorts as part of its shuffle, which avoids a 
variant of this problem by design.
    Essentially it boils down to Imran's suggestion, applied even to arbitrary 
BytesWritable values [1], [2] ... 
    
    [1] 
https://hadoop.apache.org/docs/r0.23.11/api/src-html/org/apache/hadoop/io/BytesWritable.html
    [2] 
https://hadoop.apache.org/docs/r0.23.11/api/src-html/org/apache/hadoop/io/WritableComparator.html#line.154
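
    To make the idea concrete, here is a minimal sketch of sorting opaque byte records before a round-robin assignment, so that the result no longer depends on the order in which records arrive. It is not the code from this PR and it does not use Hadoop: the class `ByteOrderSketch` and the methods `compareBytes`, `sortThenRoundRobin`, and `render` are hypothetical names, and the comparison is a hand-written stand-in for the kind of unsigned lexicographic byte comparison referenced in [2].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Spark or Hadoop code.
final class ByteOrderSketch {

    // Lexicographic comparison that treats each byte as unsigned.
    // On a shared prefix, the shorter array sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Sort a copy of the records by raw bytes, then deal them out round-robin.
    // Because the sort fixes the order, the assignment depends only on the
    // set of records, not on their arrival order.
    static List<List<byte[]>> sortThenRoundRobin(List<byte[]> records, int numPartitions) {
        List<byte[]> sorted = new ArrayList<>(records);
        sorted.sort(ByteOrderSketch::compareBytes);
        List<List<byte[]>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < sorted.size(); i++) {
            partitions.get(i % numPartitions).add(sorted.get(i));
        }
        return partitions;
    }

    // Human-readable form of a partitioning, used to compare two results.
    static String render(List<List<byte[]>> partitions) {
        StringBuilder sb = new StringBuilder();
        for (List<byte[]> partition : partitions) {
            sb.append('[');
            for (byte[] record : partition) {
                sb.append(Arrays.toString(record));
            }
            sb.append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The same four records in two different arrival orders.
        List<byte[]> firstRun = Arrays.asList(
            new byte[] {3}, new byte[] {1, 2}, new byte[] {(byte) 0x80}, new byte[] {1});
        List<byte[]> retry = Arrays.asList(
            new byte[] {1}, new byte[] {(byte) 0x80}, new byte[] {3}, new byte[] {1, 2});
        String x = render(sortThenRoundRobin(firstRun, 2));
        String y = render(sortThenRoundRobin(retry, 2));
        System.out.println(x.equals(y));
    }
}
```

    The comparison works on raw bytes only, so it needs no knowledge of the record type. The price is an extra sort on every shuffle input.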


