Nevermind, a look @ the ExternalSorter class tells me that the iterator for
each key that's only partially ordered ends up being merge sorted by
equality after the fact. Wanted to post my finding on here for others who
may have the same questions.
On Tue, Mar 1, 2016 at 3:05 PM, Corey Nolet wrot
The reason I'm asking, I see a comment in the ExternalSorter class that
says this:
"If we need to aggregate by key, we either use a total ordering from the
ordering parameter, or read the keys with the same hash code and compare
them with each other for equality to merge values".
How can this be
So if I'm using reduceByKey() with a HashPartitioner, I understand that the
hashCode() of my key is used to create the underlying shuffle files.
Is anything other than hashCode() used in the shuffle files when the data
is pulled into the reducers and run through the reduce function? The reason
I'm