GitHub user mridulm commented on the issue:
https://github.com/apache/spark/pull/21698
@tgravescs I vaguely remember someone at Yahoo! Labs telling me (more than a
decade back) that MR always does a sort as part of its shuffle, which avoids a
variant of this problem by design.
Essentially it boils down to Imran's suggestion, even for arbitrary byte
writables [1], [2] ...
[1]
https://hadoop.apache.org/docs/r0.23.11/api/src-html/org/apache/hadoop/io/BytesWritable.html
[2]
https://hadoop.apache.org/docs/r0.23.11/api/src-html/org/apache/hadoop/io/WritableComparator.html#line.154
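To illustrate the idea, here is a rough sketch (not Spark or Hadoop code) of why sorting on raw serialized bytes gives a stable order no matter how records arrive. The names `compare_bytes` and `deterministic_order` are hypothetical, and the comparison is only assumed to resemble the lexicographic unsigned-byte compare referenced in [2]:

```python
import random
from functools import cmp_to_key


def compare_bytes(a: bytes, b: bytes) -> int:
    """Lexicographic comparison of two byte strings, bytes treated as unsigned.

    Assumption: this mirrors the general shape of a raw-bytes comparator;
    it is not a port of any particular library routine.
    """
    for x, y in zip(a, b):
        if x != y:
            return x - y  # first differing byte decides
    return len(a) - len(b)  # on a shared prefix, the shorter one sorts first


def deterministic_order(records):
    """Sort opaque serialized records so the output order ignores arrival order."""
    return sorted(records, key=cmp_to_key(compare_bytes))


# The same records arriving in two different orders (e.g. after a retry).
records = [b"\x02b", b"\x01a", b"\xff", b"\x01", b"\x01a\x00"]
shuffled = records[:]
random.Random(42).shuffle(shuffled)

first_attempt = deterministic_order(records)
second_attempt = deterministic_order(shuffled)
```

Both attempts end up in the same order, so anything downstream that depends on record position sees the same input on a rerun. The cost is a full sort of the shuffled data, even for keys that have no natural ordering.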