GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/21487

    [SPARK-24369][SQL] Correct handling for multiple distinct aggregations having the same argument set

    ## What changes were proposed in this pull request?
    
    Bring back https://github.com/apache/spark/pull/21443
    
    This PR takes a different approach: it only changes the check so that it counts distinct columns with `toSet`.
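    
    A toy sketch of why counting by ordered list versus by set matters for such a check. This is not the Spark code: each aggregate is modeled only by its argument column names, and `DistinctColumnCheck`, `countByList` and `countBySet` are hypothetical names introduced for illustration.
    
    ```scala
    object DistinctColumnCheck {
      // Number of different argument lists; argument order matters here.
      def countByList(aggArgs: Seq[Seq[String]]): Int =
        aggArgs.distinct.length
    
      // Number of different argument sets; `toSet` makes argument order irrelevant.
      def countBySet(aggArgs: Seq[Seq[String]]): Int =
        aggArgs.map(_.toSet).distinct.length
    
      def main(args: Array[String]): Unit = {
        // Arguments of corr(DISTINCT x, y) and corr(DISTINCT y, x).
        val sameColumns = Seq(Seq("x", "y"), Seq("y", "x"))
        println(countByList(sameColumns)) // 2: (x, y) and (y, x) differ as lists
        println(countBySet(sameColumns))  // 1: both reduce to the set {x, y}
      }
    }
    ```
    
    For the two `corr` calls in SPARK-24369, the list-based count is 2 and the `toSet`-based count is 1, so a check built on `toSet` treats them as a single group of distinct columns.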
    
    ## How was this patch tested?
    
    Added a new test to verify the planner behavior.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark back

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21487.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21487
    
----
commit 6c4b29571a9490b8667b7827776659d8e4c18866
Author: Takeshi Yamamuro <yamamuro@...>
Date:   2018-05-30T16:23:25Z

    [SPARK-24369][SQL] Correct handling for multiple distinct aggregations having the same argument set
    
    ## What changes were proposed in this pull request?
    This PR fixes an issue that occurs when a query has multiple distinct aggregations with the same argument set, e.g.,
    ```
    scala> :paste
    val df = sql(
      s"""SELECT corr(DISTINCT x, y), corr(DISTINCT y, x), count(*)
         | FROM (VALUES (1, 1), (2, 2), (2, 2)) t(x, y)
       """.stripMargin)
    
    java.lang.RuntimeException
    You hit a query analyzer bug. Please report your query to Spark user mailing list.
    ```
    The root cause is that `RewriteDistinctAggregates` can't detect multiple distinct aggregations if they have the same argument set. This PR modifies the code so that `RewriteDistinctAggregates` counts the number of aggregate expressions with `isDistinct=true`.
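    
    The mismatch described above can be illustrated with a small model. It assumes that distinct aggregates are grouped by their argument set; that grouping key is an assumption made for this sketch, and `Agg`, `distinctGroups` and `distinctCount` are hypothetical names, not Spark APIs.
    
    ```scala
    object DistinctAggCounting {
      // Hypothetical stand-in for an aggregate expression.
      final case class Agg(name: String, args: Seq[String], isDistinct: Boolean)
    
      // Group the distinct aggregates by their argument set (order ignored).
      def distinctGroups(aggs: Seq[Agg]): Map[Set[String], Seq[Agg]] =
        aggs.filter(_.isDistinct).groupBy(_.args.toSet)
    
      // Count the aggregate expressions flagged as distinct.
      def distinctCount(aggs: Seq[Agg]): Int =
        aggs.count(_.isDistinct)
    
      // Models: corr(DISTINCT x, y), corr(DISTINCT y, x), count(*)
      val example: Seq[Agg] = Seq(
        Agg("corr", Seq("x", "y"), isDistinct = true),
        Agg("corr", Seq("y", "x"), isDistinct = true),
        Agg("count", Seq.empty, isDistinct = false)
      )
    
      def main(args: Array[String]): Unit = {
        println(distinctGroups(example).size) // 1: both corr calls share the set {x, y}
        println(distinctCount(example))       // 2: two expressions are flagged distinct
      }
    }
    ```
    
    In this model, the number of argument-set groups is 1, which hides the fact that the query has two distinct aggregations, whereas counting the `isDistinct` expressions reports both.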
    
    ## How was this patch tested?
    Added tests in `DataFrameAggregateSuite`.
    
    Author: Takeshi Yamamuro <yamam...@apache.org>
    
    Closes #21443 from maropu/SPARK-24369.

commit 8386b4250d90eb369c85f02de7bbabe7a2ebbdaa
Author: Wenchen Fan <wenchen@...>
Date:   2018-06-03T01:41:11Z

    another fix

----


