c21 commented on a change in pull request #32242:
URL: https://github.com/apache/spark/pull/32242#discussion_r618655659
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
##########
@@ -663,7 +663,7 @@ case class HashAggregateExec(
private def enableTwoLevelHashMap(ctx: CodegenContext): Unit = {
if (!checkIfFastHashMapSupported(ctx)) {
- if (modes.forall(mode => mode == Partial || mode == PartialMerge) &&
!Utils.isTesting) {
Review comment:
@cloud-fan - I wondered about this as well before making this PR. The
decision to support only partial aggregate was made when the first-level
hash map was introduced (https://github.com/apache/spark/pull/12345 and
https://github.com/apache/spark/pull/14176), and it was never revisited
afterwards. I checked with @sameeragarwal before making this PR, and he
told me there is no fundamental reason not to support final aggregate.
Just for documentation: I also asked him why we don't support nested
types (array/map/struct) as key types for the fast hash map. He said the
reason is that keys of those types can be very large, so the fast hash map
may no longer fit in cache and would lose its benefit.
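For context, the guard on the changed line can be sketched roughly as follows. This is a simplified, standalone illustration: the `AggregateMode` case objects mirror the names in Spark's `org.apache.spark.sql.catalyst.expressions.aggregate`, and `fastMapEligible` is a hypothetical helper name, not Spark's actual method.

```scala
// Simplified stand-ins for Spark's aggregate modes (assumed names,
// modeled on org.apache.spark.sql.catalyst.expressions.aggregate).
sealed trait AggregateMode
case object Partial extends AggregateMode
case object PartialMerge extends AggregateMode
case object Final extends AggregateMode
case object Complete extends AggregateMode

// The fast (two-level) hash map is enabled only when every aggregate
// expression runs in a map-side mode (Partial or PartialMerge); any
// Final or Complete mode in the list disqualifies the plan.
def fastMapEligible(modes: Seq[AggregateMode]): Boolean =
  modes.forall(m => m == Partial || m == PartialMerge)
```

The point of this PR's discussion is that this restriction is historical rather than fundamental, so the check could in principle be relaxed to admit final aggregation as well.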