GitHub user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20414

Hey, I looked into `ExternalAppendOnlyMap`, and here are my findings:

`ExternalAppendOnlyMap` claims to keep its contents sorted, but it actually uses a `HashComparator` that compares elements by their hashes. Luckily, it sorts the elements with TimSort, which is stable. That means even if hash collisions exist, the output sequence should still be deterministic as long as the input sequence is (which we can achieve by modifying `ShuffleBlockFetcherIterator`, per the previous discussion).

We may still need to check all the other places where we spill or compare objects, to make sure we produce a deterministic output sequence everywhere.
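To make the stability argument concrete, here is a minimal sketch in Python, whose `sorted` is also TimSort and is guaranteed stable. It is a simplified model, not Spark's code: `hash_sort` and `weak_hash` are hypothetical names. `weak_hash` is deliberately collision-prone so that ties are easy to see. Records are ordered only by the hash of their key, as a hash-based comparator would order them. Colliding keys keep their input order, so the same input order always yields the same output, while a different input order (for example, shuffle blocks fetched in another order) can yield a different output.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, int]

def weak_hash(key: str) -> int:
    # Hypothetical, deliberately weak hash so that collisions are common.
    return len(key) % 3

def hash_sort(records: List[Record], key_hash: Callable[[str], int]) -> List[Record]:
    # Order records by the hash of their key only, like a hash-based comparator.
    # sorted() is stable, so records whose key hashes collide keep their
    # original relative order instead of being reordered arbitrarily.
    return sorted(records, key=lambda kv: key_hash(kv[0]))

if __name__ == "__main__":
    fetched = [("bb", 1), ("a", 2), ("cc", 3), ("d", 4)]
    # "a"/"d" collide (hash 1) and "bb"/"cc" collide (hash 2); each pair
    # stays in the order it had in the input.
    print(hash_sort(fetched, weak_hash))
```

With the input above, "a" stays ahead of "d" and "bb" stays ahead of "cc", because that is how they arrived. Reversing the input reverses the order within each collision group, which is why the input order itself has to be made deterministic.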