attilapiros commented on a change in pull request #30788:
URL: https://github.com/apache/spark/pull/30788#discussion_r559453797
##########
File path:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
##########
@@ -1079,6 +1079,18 @@ class DataFrameAggregateSuite extends QueryTest
assert(aggs.head.output.map(_.dataType.simpleString).head ===
aggs.last.output.map(_.dataType.simpleString).head)
}
+
+ test("SPARK-33726 Duplicate field name aggregation should not have null values in dataframe") {
+ val query =
+ """|with T as (select id as a, -id as x from range(3)), U as (select id as b,
+ |cast(id as string) as x from range(3)) select T.x, U.x, min(a) as ma, min(b) as mb
+ |from T join U on a=b group by U.x, T.x
+ """.stripMargin
+ val df = spark.sql(query)
+ val nullCount = df.filter($"ma".isNull ).count + df.filter($"mb".isNull ).count
+ + df.filter($"U.x".isNull ).count + df.filter($"T.x".isNull).count
+ assert(nullCount == 0)
Review comment:
Please update the PR description by removing the "check that there's no
null values in the Dataframe for the sample query" part. Also, please remove
the link to the JIRA and explain the problem fixed by this PR in the
description itself: the description becomes the commit message when the fix is
merged, and browsing the git commit messages on their own is easier than
following links.