Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22944#discussion_r232556359
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
---
@@ -1556,6 +1556,20 @@ class DatasetSuite extends QueryTest with
SharedSQLContext {
df.where($"city".contains(new java.lang.Character('A'))),
Seq(Row("Amsterdam")))
}
+
+ test("SPARK-25942: typed aggregation on primitive type") {
+ val ds = Seq(1, 2, 3).toDS()
+
+ val agg = ds.groupByKey(_ >= 2)
+ .agg(sum("value").as[Long], sum($"value" + 1).as[Long])
--- End diff --
I think we should not make decisions for users. For untyped APIs, users can
refer to the grouping columns in the aggregate expressions, so I think the typed
APIs should be the same.
For this particular case, currently Spark allows grouping columns inside
aggregate functions, so the `value` here is indeed ambiguous. There is nothing
we can do but fail and ask users to add an alias.
BTW, we should check other databases and see if "grouping columns inside
aggregate functions" should be allowed.
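As an illustration of the alias workaround, here is a minimal sketch (the column name `v` and the surrounding setup are my assumptions, not part of the PR; it presumes a Spark test environment with `spark.implicits._` in scope):

```scala
import spark.implicits._
import org.apache.spark.sql.functions.sum

// Renaming the auto-generated "value" column before the typed aggregation
// removes the ambiguity between the grouping result and the input column.
val ds = Seq(1, 2, 3).toDF("v").as[Int]  // "v" instead of the default "value"

val agg = ds.groupByKey(_ >= 2)
  .agg(sum("v").as[Long], sum($"v" + 1).as[Long])
```

With the alias in place, `sum("v")` unambiguously refers to the input values rather than anything produced by `groupByKey`.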
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]