Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20174#discussion_r160040256

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -666,4 +666,16 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext {
       assert(exchangePlans.length == 1)
     }
   }
+
+  test("SPARK-22951: aggregation on empty data frame should only return initial values") {
+    // non code gen
+    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
+      assert(spark.emptyDataFrame.dropDuplicates.count == 0)
+    }
+
+    // code gen
+    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true") {
+      assert(spark.emptyDataFrame.dropDuplicates.count == 0)
+    }
+  }
--- End diff --

```
Seq("true", "false").foreach { codegen =>
  withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> codegen) {
    assert(spark.emptyDataFrame.dropDuplicates.count == 0)
  }
}
```

BTW, I think checking both the codegen and non-codegen paths is a common pattern, so it might be better to add a helper function to a test utility class, like:

```
checkExecution {
  assert(spark.emptyDataFrame.dropDuplicates.count == 0)
}
```
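For illustration, a helper of the shape suggested above could look roughly like the sketch below. It is an assumption about what `checkExecution` might do, not an existing Spark test utility. To stay runnable without Spark on the classpath, it uses a hypothetical in-memory `conf` map and a hypothetical `withConf` in place of the suite's real `withSQLConf`, and `CodegenKey` stands in for `SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key`.

```scala
import scala.collection.mutable

// Hypothetical sketch: every name here is illustrative, not Spark API.
object CheckExecutionSketch {
  // Stand-in for the session's SQL configuration.
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Stand-in for SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key.
  val CodegenKey = "spark.sql.codegen.wholeStage"

  // Simplified stand-in for withSQLConf: set the given pairs, run the body,
  // then restore the previous values even if the body throws.
  def withConf(pairs: (String, String)*)(body: => Unit): Unit = {
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(v)) => conf(k) = v
      case (k, None)    => conf.remove(k)
    }
  }

  // Run the same body once with whole-stage codegen on and once with it off.
  def checkExecution(body: => Unit): Unit =
    Seq("true", "false").foreach { codegen =>
      withConf(CodegenKey -> codegen)(body)
    }
}
```

In the actual suite, the body of `checkExecution` would presumably delegate to the existing `withSQLConf`, so the test above would shrink to a single `checkExecution { ... }` call, and a failure in either mode would surface through the same assertion.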