rednaxelafx commented on a change in pull request #23731: [SPARK-26572][SQL] fix aggregate codegen result evaluation
URL: https://github.com/apache/spark/pull/23731#discussion_r256145599
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
##########
@@ -2110,4 +2112,28 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
checkAnswer(res, Row("1-1", 6, 6))
}
}
+
+ test("SPARK-26572: fix aggregate codegen result evaluation") {
+ val baseTable = Seq((1), (1)).toDF("idx")
+
+ // BroadcastHashJoinExec with a HashAggregateExec child containing no aggregate expressions
+ val distinctWithId = baseTable.distinct().withColumn("id", monotonically_increasing_id())
Review comment:
I'm not sure how stable the results are going to be if you use
`monotonically_increasing_id` here with an unspecified number of shuffle
partitions. Since you're checking the exact value of the resulting id, if the
number of shuffle partitions changes (let's say if someone decides to change
the default shuffle partitions setting in all tests), this test can become
fragile and fail unnecessarily.
It might be worth explicitly setting the number of shuffle partitions to 1
inside this test case. Alternatively, go back to grouping by `id` instead of
checking its exact value, or just assert that the `id`s are equal.
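
To illustrate why the exact value is sensitive to partitioning, here is a minimal sketch in plain Scala (no Spark dependency) that models the layout documented for `monotonically_increasing_id`: the partition index in the upper 31 bits and the record number within the partition in the lower 33 bits. The names `MonotonicIdSketch`, `RecordBits`, and `modelId` are hypothetical, and the model only mirrors the documented layout rather than Spark's actual code.

```scala
object MonotonicIdSketch {
  // Number of low bits reserved for the record number within a partition,
  // following the documented layout (an assumption of this sketch).
  val RecordBits: Int = 33

  // Hypothetical model: partition index in the upper bits,
  // row position within that partition in the lower bits.
  def modelId(partitionIndex: Int, rowInPartition: Long): Long =
    (partitionIndex.toLong << RecordBits) + rowInPartition

  def main(args: Array[String]): Unit = {
    // The same "first row" receives a different id in each partition.
    (0 until 3).foreach { p =>
      println(s"partition $p, first row -> id ${modelId(p, 0L)}")
    }
  }
}
```

Under this model, the first row of partition 0 gets id 0, while the first row of partition 1 gets 8589934592, so the single distinct row can end up with a different id depending on which shuffle partition it lands in. Pinning the number of shuffle partitions to 1 removes that dependency.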