WangGuangxin opened a new pull request #28780: URL: https://github.com/apache/spark/pull/28780
### What changes were proposed in this pull request?

This happens when hash aggregation falls back to sort-based aggregation. `UnsafeExternalSorter.createWithExistingInMemorySorter` calls `spill` on an `InMemorySorter` immediately, but the memory that the `InMemorySorter` points to was acquired by the outer `BytesToBytesMap`, not through the `allocatedPages` of `UnsafeExternalSorter`. As a result, the memory bytes spilled metric is always 0, while the disk bytes spilled metric is correct.

The related code is at https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L232.

It can be reproduced with the following steps.

```
bin/spark-shell --driver-memory 512m --executor-memory 512m --executor-cores 1 --conf "spark.default.parallelism=1"
scala> sql("select id, count(1) from range(10000000) group by id").write.csv("/tmp/result.json")
```

Before this patch, the metric is: (screenshot omitted)

After this patch, the metric is: (screenshot omitted)

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tested manually.
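
For reviewers who want a picture of the accounting problem described under "What changes were proposed", here is a minimal toy model. It is not Spark code: `SpillMetricSketch`, `ToySorter`, and both `spill...` methods are hypothetical names invented for illustration, and the second method shows only one possible way to account for the borrowed memory, which is not necessarily what this patch does.

```java
import java.util.ArrayList;
import java.util.List;

public class SpillMetricSketch {

    /** Hypothetical stand-in for a sorter that tracks only the pages it allocated itself. */
    public static final class ToySorter {
        // Sizes (in bytes) of pages this sorter allocated on its own.
        private final List<Long> ownPages = new ArrayList<>();
        // Bytes the sorter points at but that were acquired by an outside map.
        private final long borrowedBytes;

        public ToySorter(long borrowedBytes) {
            this.borrowedBytes = borrowedBytes;
        }

        public void allocatePage(long sizeInBytes) {
            ownPages.add(sizeInBytes);
        }

        /** Reports only memory freed from the sorter's own pages. */
        public long spillCountingOwnPagesOnly() {
            long freed = 0L;
            for (long size : ownPages) {
                freed += size;
            }
            ownPages.clear();
            return freed;
        }

        /** Assumed alternative: also report the memory borrowed from the outside map. */
        public long spillCountingBorrowedMemory() {
            return spillCountingOwnPagesOnly() + borrowedBytes;
        }
    }

    public static void main(String[] args) {
        long mapBytes = 64L * 1024 * 1024; // memory held by the outside map

        // The sorter is created over existing data and spills right away,
        // so it has not allocated any page of its own yet.
        ToySorter ownPagesOnly = new ToySorter(mapBytes);
        System.out.println("own pages only:   " + ownPagesOnly.spillCountingOwnPagesOnly());

        ToySorter withBorrowed = new ToySorter(mapBytes);
        System.out.println("borrowed counted: " + withBorrowed.spillCountingBorrowedMemory());
    }
}
```

In this sketch the first call reports 0 bytes even though 64 MiB of map memory backs the sorted records, which mirrors the symptom in the reproduction: the disk spill is recorded while the memory spill is not.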
