Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20414
Hey, I looked into `ExternalAppendOnlyMap` and here are the findings:
The `ExternalAppendOnlyMap` claims it keeps its content sorted, but it
actually uses a `HashComparator` that compares the elements by their hashes.
Luckily, it sorts the elements using TimSort, which is stable. That means, even
if there are hash collisions, the output sequence should still be
deterministic, as long as the inputs are (which we can achieve by modifying
`ShuffleBlockFetcherIterator` per previous discussion).
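To illustrate the stability argument, here is a minimal standalone sketch. It is not Spark code: the class `StableHashSortSketch`, the comparator `BY_HASH`, and the helper `sortByHash` are hypothetical names, and it relies on the JDK's `List.sort` (a stable sort) rather than Spark's own sorter. The strings `"Aa"` and `"BB"` are used because they have the same `hashCode` in Java.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class StableHashSortSketch {
    // Hypothetical stand-in for a hash-based comparator: it orders keys
    // by hash code only, so colliding keys compare as equal.
    static final Comparator<String> BY_HASH =
        Comparator.comparingInt(String::hashCode);

    // Sorts a copy of the input by hash code. Because the sort is stable,
    // keys with equal hashes keep their relative input order.
    static List<String> sortByHash(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(BY_HASH);
        return copy;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" collide (both hash to 2112); "b" hashes to 98.
        System.out.println(sortByHash(List.of("BB", "b", "Aa"))); // [b, BB, Aa]
        System.out.println(sortByHash(List.of("Aa", "b", "BB"))); // [b, Aa, BB]
    }
}
```

The same input order always gives the same output order, but the colliding keys come out in whatever order they arrived in, which is why the input order itself has to be deterministic.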
We may need to check all the other places where we may spill/compare objects
to ensure we generate a deterministic output sequence everywhere, though.