[GitHub] [spark] cloud-fan commented on a diff in pull request #40915: [SPARK-43232][SQL] Improve ObjectHashAggregateExec performance for high cardinality

via GitHub Tue, 25 Apr 2023 01:26:00 -0700


cloud-fan commented on code in PR #40915:
URL: https://github.com/apache/spark/pull/40915#discussion_r1176179219



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala:
##########
@@ -292,19 +294,35 @@ class SortBasedAggregator(
         }
       }
 
+      /**
+       * This function has a side effect that updates `aggregateMode` to represent:
+       * 0: the grouping key belongs to input rows, and we should update it to aggregation buffer
+       * 1: the grouping key belongs to input aggregation buffer, and we should merge it to
+       *    aggregation buffer
+       * 2: the grouping key exists in both input rows and aggregation buffer, and we should first
+       *    update then merge it
+       */

Review Comment:
   Let's move this comment to where the state variable is defined, and also explain how the state transition is triggered.
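
   For illustration only, here is a minimal sketch of how the three modes from the doc comment could be given names and how a transition might be triggered by comparing the current head key of the two sorted inputs. Everything in it (`AggregateModeSketch`, `UpdateOnly`, `MergeOnly`, `UpdateThenMerge`, `nextMode`) is a hypothetical name chosen for this example and is not taken from the PR; it only assumes that both inputs are sorted by grouping key.

```scala
// Hypothetical sketch; these names do not come from the PR or from Spark.
object AggregateModeSketch {
  sealed trait AggregateMode
  // 0: the grouping key appears only in the input rows -> update the buffer
  case object UpdateOnly extends AggregateMode
  // 1: the grouping key appears only in the input aggregation buffers -> merge
  case object MergeOnly extends AggregateMode
  // 2: the grouping key appears in both -> update first, then merge
  case object UpdateThenMerge extends AggregateMode

  /**
   * Picks the next grouping key and its mode from the head key of each
   * sorted input. Returns None once both inputs are exhausted.
   */
  def nextMode[K](inputKey: Option[K], bufferKey: Option[K])(
      implicit ord: Ordering[K]): Option[(K, AggregateMode)] =
    (inputKey, bufferKey) match {
      case (None, None)    => None
      case (Some(k), None) => Some((k, UpdateOnly))
      case (None, Some(k)) => Some((k, MergeOnly))
      case (Some(a), Some(b)) =>
        val cmp = ord.compare(a, b)
        if (cmp < 0) Some((a, UpdateOnly))      // input rows are behind
        else if (cmp > 0) Some((b, MergeOnly))  // buffers are behind
        else Some((a, UpdateThenMerge))         // same key on both sides
    }
}
```

   With the state spelled out this way, the transition rule sits next to the state definition: the mode is recomputed each time the smaller of the two head keys is chosen as the next group.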



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


