c21 commented on a change in pull request #32242:
URL: https://github.com/apache/spark/pull/32242#discussion_r618655659
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
##########
@@ -663,7 +663,7 @@ case class HashAggregateExec(
private def enableTwoLevelHashMap(ctx: CodegenContext): Unit = {
if (!checkIfFastHashMapSupported(ctx)) {
- if (modes.forall(mode => mode == Partial || mode == PartialMerge) &&
!Utils.isTesting) {
Review comment:
@cloud-fan - I wondered about this as well before making this PR. The
decision to support only partial aggregate was made when the first-level
hash map was introduced (https://github.com/apache/spark/pull/12345 and
https://github.com/apache/spark/pull/14176), and it was never revisited
afterwards. I checked with @sameeragarwal before making this PR, and he
told me there is no fundamental reason not to support final aggregate.
Just for documentation: I also asked him why we don't support nested
types (array/map/struct) as key types for the fast hash map. He said the
reason is that keys of those types can be very large, so the fast hash map
may no longer fit in cache and would lose its benefit.
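For context, the guard on the changed line can be sketched roughly as follows. This is a simplified, standalone illustration: the `AggregateMode` case objects mirror the names in Spark's `org.apache.spark.sql.catalyst.expressions.aggregate`, and `fastMapEligible` is a hypothetical helper name, not Spark's actual method.

```scala
// Simplified stand-ins for Spark's aggregate modes (assumed names,
// modeled on org.apache.spark.sql.catalyst.expressions.aggregate).
sealed trait AggregateMode
case object Partial extends AggregateMode
case object PartialMerge extends AggregateMode
case object Final extends AggregateMode
case object Complete extends AggregateMode

// The fast (two-level) hash map is enabled only when every aggregate
// expression runs in a map-side mode (Partial or PartialMerge); any
// Final or Complete mode in the list disqualifies the plan.
def fastMapEligible(modes: Seq[AggregateMode]): Boolean =
  modes.forall(m => m == Partial || m == PartialMerge)
```

The point of this PR's discussion is that this restriction is historical rather than fundamental, so the check could in principle be relaxed to admit final aggregation as well.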