GitHub user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20174#discussion_r160770992
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
    @@ -666,4 +665,16 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext {
           assert(exchangePlans.length == 1)
         }
       }
    +
    +  Seq(true, false).foreach { codegen =>
    +    test("SPARK-22951: dropDuplicates on empty data frames should produce correct aggregate" +
    +      s" results when codegen enabled: $codegen") {
    +      withSQLConf((SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, codegen.toString)) {
    +        assert(Seq.empty[Int].toDF("a").count() == 0)
    +        assert(Seq.empty[Int].toDF("a").agg(count("*")).count() == 1)
    +        assert(spark.emptyDataFrame.dropDuplicates().count() == 0)
    +        assert(spark.emptyDataFrame.dropDuplicates().agg(count("*")).count() == 1)
    --- End diff --
    
    @liufengdb Maybe also add assertions confirming that explicit global
    aggregations (i.e., those with zero grouping keys) still return exactly one
    row? For example:
    
    ```scala
    val emptyAgg = Map.empty[String, String]
    
    checkAnswer(
      spark.emptyDataFrame.agg(emptyAgg),
      Seq(Row())
    )
    
    checkAnswer(
      spark.emptyDataFrame.groupBy().agg(emptyAgg),
      Seq(Row())
    )
    ```
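    To make the distinction the tests pin down concrete, here is a minimal plain-Scala
    model (no Spark involved; the object `EmptyInputSemantics` and its methods are
    hypothetical names, and this is not how Spark's planner implements either operator).
    A global aggregate always yields exactly one row, even for empty input, whereas
    deduplication only emits rows derived from its input, so an empty input must stay
    empty even when there are zero key columns.

    ```scala
    // Hypothetical model of row-count semantics; it does not call Spark.
    object EmptyInputSemantics {
      // A "row" is modelled as a map from column name to value.
      type Row = Map[String, Any]

      // Global aggregation (no grouping keys): SQL semantics produce exactly
      // one output row, even for empty input (e.g. COUNT(*) = 0).
      def globalCount(rows: Seq[Row]): Seq[Long] = Seq(rows.size.toLong)

      // Deduplication keeps one row per distinct combination of key values.
      // Output rows always come from input rows, so empty input yields no rows,
      // even when `keys` is empty (all rows then share the same empty key).
      def dropDuplicates(rows: Seq[Row], keys: Seq[String]): Seq[Row] =
        rows.groupBy(r => keys.map(r(_))).values.map(_.head).toSeq
    }
    ```

    Composing the two mirrors the new test: deduplicating an empty input gives zero
    rows, and a global count over that result still gives a single row holding 0.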


