[GitHub] [spark] attilapiros commented on a change in pull request #30788: [SPARK-33726][SQL] Fix for Duplicate field names during Aggregation

GitBox Mon, 18 Jan 2021 02:13:36 -0800


attilapiros commented on a change in pull request #30788:
URL: https://github.com/apache/spark/pull/30788#discussion_r559453797




##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
##########
@@ -1079,6 +1079,18 @@ class DataFrameAggregateSuite extends QueryTest
     assert(aggs.head.output.map(_.dataType.simpleString).head ===
       aggs.last.output.map(_.dataType.simpleString).head)
   }
+
+  test("SPARK-33726 Duplicate field name aggregation should not have null values in dataframe") {
+    val query =
+      """|with T as (select id as a, -id as x from range(3)), U as (select id as b,
+         |cast(id as string) as x from range(3)) select T.x, U.x, min(a) as ma, min(b) as mb
+         |from T join U on a=b group by U.x, T.x
+      """.stripMargin
+    val df = spark.sql(query)
+    val nullCount = df.filter($"ma".isNull ).count + df.filter($"mb".isNull ).count
+    + df.filter($"U.x".isNull ).count + df.filter($"T.x".isNull).count
+    assert(nullCount == 0)

Review comment:
Please update the PR description by removing the "check that there's no
null values in the Dataframe for the sample query" part. Also, please remove
the link to the Jira ticket and explain the problem fixed by this PR in the
description itself: the description becomes the commit message when the fix
is merged, and browsing the git commit messages on their own is easier than
following links.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



