rednaxelafx commented on a change in pull request #23731: [SPARK-26572][SQL] fix aggregate codegen result evaluation
URL: https://github.com/apache/spark/pull/23731#discussion_r256145599
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
 ##########
 @@ -2110,4 +2112,28 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       checkAnswer(res, Row("1-1", 6, 6))
     }
   }
+
+  test("SPARK-26572: fix aggregate codegen result evaluation") {
+    val baseTable = Seq((1), (1)).toDF("idx")
+
+    // BroadcastHashJoinExec with a HashAggregateExec child containing no aggregate expressions
+    val distinctWithId = baseTable.distinct().withColumn("id", monotonically_increasing_id())
 
 Review comment:
   I'm not sure how stable the results will be if you use
`monotonically_increasing_id` here without pinning the number of shuffle
partitions. The generated id encodes the partition index in its upper bits,
so its value depends on which partition each row lands in after the shuffle.
Since you're checking the exact value of the resulting id, this test can
become fragile and fail unnecessarily if the number of shuffle partitions
changes (say, if someone changes the default shuffle partitions setting for
all tests).
   
   It might be worth explicitly setting the number of shuffle partitions to 1
inside this test case. Alternatively, go back to grouping by `id` instead of
checking its exact value, or just assert that the `id`s are equal.
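
   For illustration, a minimal pure-Scala sketch (no Spark needed) of why checking the exact id is fragile. It assumes the layout Spark documents for `monotonically_increasing_id`: partition index in the upper 31 bits, record number within the partition in the lower 33 bits. The helpers `monotonicId`, `partitionOf` and `idForDistinctRow` are hypothetical, and `partitionOf` is a simplified stand-in for Spark's Murmur3-based hash partitioning, so real partition indices will differ:

```scala
object MonotonicIdSketch {
  // Hypothetical helper mirroring the documented bit layout of
  // monotonically_increasing_id: partition index in the upper 31 bits,
  // record number within that partition in the lower 33 bits.
  def monotonicId(partitionIndex: Int, recordNumber: Long): Long =
    (partitionIndex.toLong << 33) + recordNumber

  // Hypothetical stand-in for hash partitioning. Spark actually uses a
  // Murmur3-based hash, so only the shape of the argument carries over.
  def partitionOf(key: Int, numPartitions: Int): Int =
    Math.floorMod(key, numPartitions)

  // Id the single distinct row with this key would receive: it is record 0
  // of whichever partition the shuffle sends it to.
  def idForDistinctRow(key: Int, numPartitions: Int): Long =
    monotonicId(partitionOf(key, numPartitions), 0L)

  def main(args: Array[String]): Unit = {
    // One shuffle partition: every key lands in partition 0.
    println(idForDistinctRow(1, 1))
    // More partitions: the same row may land elsewhere, shifting its id
    // by a multiple of 2^33.
    println(idForDistinctRow(1, 200))
  }
}
```

With one partition the distinct row always gets id 0, while with 200 partitions the same row lands in a different partition in this model and its id jumps to 2^33. In Spark's own test suites, pinning the setting is commonly done by wrapping the test body in `withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "1") { ... }`.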

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
