GitHub user mridulm commented on the issue:

    https://github.com/apache/spark/pull/21698
  
    @tgravescs I vaguely remember someone at Y! Labs telling me (more than a 
decade back) that MR always does a sort as part of its shuffle, which avoids a 
variant of this problem by design.
    Essentially, it boils down to Imran's suggestion, even for arbitrary byte 
writables [1], [2] ... 
    
    [1] 
https://hadoop.apache.org/docs/r0.23.11/api/src-html/org/apache/hadoop/io/BytesWritable.html
    [2] 
https://hadoop.apache.org/docs/r0.23.11/api/src-html/org/apache/hadoop/io/WritableComparator.html#line.154
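
    To illustrate the point above, here is a minimal sketch (plain Python, not Spark or Hadoop code) of ordering serialized records with an unsigned, lexicographic byte comparison, in the spirit of the byte comparison linked in [2]. The names `compare_bytes` and `deterministic_order` are hypothetical and only for illustration. The assumption is that every record is already serialized to bytes, so the sort needs no knowledge of the record type.

```python
from functools import cmp_to_key


def compare_bytes(a: bytes, b: bytes) -> int:
    """Compare two byte strings lexicographically, treating bytes as unsigned.

    Returns a negative number, zero, or a positive number, like a
    Java-style comparator. (Hypothetical helper, not a Hadoop API.)
    """
    for x, y in zip(a, b):
        if x != y:
            # Iterating over Python bytes yields ints in 0..255,
            # so this is already an unsigned comparison.
            return x - y
    # All shared positions are equal: the shorter input sorts first.
    return len(a) - len(b)


def deterministic_order(serialized_records):
    """Return the records sorted by raw bytes.

    The result depends only on the set of records, not on the order
    in which they arrived (for example, from a shuffle fetch).
    """
    return sorted(serialized_records, key=cmp_to_key(compare_bytes))


if __name__ == "__main__":
    # The same records arriving in two different orders.
    first_arrival = [b"\x80z", b"ab", b"abc", b"\x7f"]
    second_arrival = [b"abc", b"\x7f", b"\x80z", b"ab"]
    print(deterministic_order(first_arrival) == deterministic_order(second_arrival))
```

    Because the sorted order is a function of the bytes alone, a retried task that fetches the same records in a different order would still process them in the same sequence.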


---

