Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22944#discussion_r232556359
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
---
@@ -1556,6 +1556,20 @@ class DatasetSuite extends QueryTest with
SharedSQLContext {
df.where($"city".contains(new java.lang.Character('A'))),
Seq(Row("Amsterdam")))
}
+
+ test("SPARK-25942: typed aggregation on primitive type") {
+ val ds = Seq(1, 2, 3).toDS()
+
+ val agg = ds.groupByKey(_ >= 2)
+ .agg(sum("value").as[Long], sum($"value" + 1).as[Long])
--- End diff --
I think we should not make decisions for users. For untyped APIs, users can
refer to the grouping columns in the aggregate expressions, so I think the typed
APIs should be the same.
For this particular case, currently Spark allows grouping columns inside
aggregate functions, so the `value` here is indeed ambiguous. There is nothing
we can do but fail and ask users to add an alias.
BTW, we should check other databases and see if "grouping columns inside
aggregate functions" should be allowed.
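As an illustration of the alias workaround, here is a minimal sketch (the column name `v` and the surrounding setup are my assumptions, not part of the PR; it presumes a Spark test environment with `spark.implicits._` in scope):

```scala
import spark.implicits._
import org.apache.spark.sql.functions.sum

// Renaming the auto-generated "value" column before the typed aggregation
// removes the ambiguity between the grouping result and the input column.
val ds = Seq(1, 2, 3).toDF("v").as[Int]  // "v" instead of the default "value"

val agg = ds.groupByKey(_ >= 2)
  .agg(sum("v").as[Long], sum($"v" + 1).as[Long])
```

With the alias in place, `sum("v")` unambiguously refers to the input values rather than anything produced by `groupByKey`.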
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]