c21 commented on a change in pull request #32242:
URL: https://github.com/apache/spark/pull/32242#discussion_r616473534
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
##########
@@ -128,6 +128,16 @@ case class HashAggregateExec(
// all the mode of aggregate expressions
private val modes = aggregateExpressions.map(_.mode).distinct
+ // This is for testing final aggregate with number-of-rows-based fall back as specified in
+ // `testFallbackStartsAt`. In this scenario, there might be same keys exist in both fast and
+ // regular hash map. So the aggregation buffers from both maps need to be merged together
+ // to avoid correctness issue.
+ //
+ // This scenario only happens in unit test with number-of-rows-based fall back.
+ // There should not be same keys in both maps with size-based fall back in production.
+ private val isTestFinalAggregateWithFallback: Boolean = testFallbackStartsAt.isDefined &&
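The diff comment distinguishes two fallback triggers. The model below is a minimal, hypothetical sketch of why only one of them can put the same key in both maps. It is not Spark code: `FallbackModel`, `rowCountFallback`, `sizeFallback` and `merge` are illustrative names, aggregation is a per-key sum of `Long`s, and it assumes that row-count fallback bypasses the fast map entirely once the threshold is reached, while size-based fallback always probes the fast map first.

```scala
import scala.collection.mutable

// Hypothetical model of the 1st level ("fast") and 2nd level ("regular") maps.
object FallbackModel {

  /** Row-count fallback (assumed behaviour of the test-only path): after
    * `fallbackAfterRows` rows, every row goes to the regular map, even when
    * its key is already held by the fast map. */
  def rowCountFallback(rows: Seq[(String, Long)], fallbackAfterRows: Int)
      : (Map[String, Long], Map[String, Long]) = {
    val fast = mutable.Map.empty[String, Long]
    val regular = mutable.Map.empty[String, Long]
    rows.zipWithIndex.foreach { case ((k, v), i) =>
      val target = if (i < fallbackAfterRows) fast else regular
      target(k) = target.getOrElse(k, 0L) + v
    }
    (fast.toMap, regular.toMap)
  }

  /** Size-based fallback: the fast map is probed first, so a key it already
    * holds is always updated there; only new keys that do not fit overflow. */
  def sizeFallback(rows: Seq[(String, Long)], fastCapacity: Int)
      : (Map[String, Long], Map[String, Long]) = {
    val fast = mutable.Map.empty[String, Long]
    val regular = mutable.Map.empty[String, Long]
    rows.foreach { case (k, v) =>
      if (fast.contains(k) || fast.size < fastCapacity) {
        fast(k) = fast.getOrElse(k, 0L) + v
      } else {
        regular(k) = regular.getOrElse(k, 0L) + v
      }
    }
    (fast.toMap, regular.toMap)
  }

  /** Merges the aggregation buffers of both maps, which is needed whenever a
    * key can appear in both (the row-count case). */
  def merge(fast: Map[String, Long], regular: Map[String, Long]): Map[String, Long] =
    (fast.keySet ++ regular.keySet).iterator
      .map(k => k -> (fast.getOrElse(k, 0L) + regular.getOrElse(k, 0L)))
      .toMap
}
```

With rows `a=1, b=2, a=3` and a row-count threshold of 2, key `a` ends up in both maps (`a=1` and `a=3`), so the buffers must be merged to get `a=4`. With a size-based fast map of capacity 1, `a` stays in the fast map and `b` goes to the regular map, so the key sets are disjoint.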
Review comment:
@cloud-fan - sorry, I overlooked your question. You were asking how
size-based fallback works.
Size-based fallback works as follows:
1. Try to insert into the 1st level hash map, and fall back to the 2nd level
hash map when there is no space left in the required memory page
(`RowBasedKeyValueBatch`) -
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala#L165-L166
2. Try to insert into the 2nd level hash map, and fall back to sort-based
aggregation when there is no space left in `UnsafeFixedWidthAggregationMap` -
https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeFixedWidthAggregationMap.java#L148-L150
3. In that case, the 2nd level hash map is sorted and spilled, and a new 2nd
level hash map is created. The 1st level hash map cannot be spilled.
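The three steps above can be sketched as a small, hypothetical Scala model. `TwoLevelAggregator`, `fastCapacity` and `regularCapacity` are illustrative names, not Spark APIs, and memory limits are modeled as key counts rather than the bytes tracked by `RowBasedKeyValueBatch` and `UnsafeFixedWidthAggregationMap`. Spilled runs are kept in memory as sorted vectors, and the final merge uses a plain `groupBy` instead of Spark's external sort-based merge.

```scala
import scala.collection.mutable

// Hypothetical sketch of size-based fallback with per-key Long sums.
final class TwoLevelAggregator(fastCapacity: Int, regularCapacity: Int) {
  private val fast = mutable.Map.empty[String, Long]    // 1st level: never spilled
  private var regular = mutable.Map.empty[String, Long] // 2nd level: spillable
  val spills = mutable.ArrayBuffer.empty[Vector[(String, Long)]] // sorted runs

  def insert(key: String, value: Long): Unit = {
    if (fast.contains(key) || fast.size < fastCapacity) {
      // Step 1: probe the 1st level map first; use it while there is room.
      fast(key) = fast.getOrElse(key, 0L) + value
    } else if (regular.contains(key) || regular.size < regularCapacity) {
      // Step 2: fall back to the 2nd level map.
      regular(key) = regular.getOrElse(key, 0L) + value
    } else {
      // Step 3: the 2nd level map is full, so sort it by key, spill it,
      // and start a new 2nd level map holding the current row.
      spills += regular.toVector.sortBy(_._1)
      regular = mutable.Map(key -> value)
    }
  }

  /** Final per-key sums. In this model a key can occur in several spilled runs
    * and in the current 2nd level map, but never in both the 1st level map and
    * the 2nd level data, because the 1st level map never shrinks. */
  def result(): Map[String, Long] = {
    val all: Seq[(String, Long)] = fast.toSeq ++ regular.toSeq ++ spills.flatten
    all.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }
  }
}
```

For example, with both capacities set to 1 and rows `a=1, b=2, c=3, a=4, b=5`, `a` stays in the 1st level map, the 2nd level map is spilled twice, and the final result is `a=5, b=7, c=3`.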