[GitHub] spark pull request #21330: [SPARK-22234] Support distinct window functions

jinxing64 Tue, 15 May 2018 04:52:12 -0700

GitHub user jinxing64 opened a pull request:

    https://github.com/apache/spark/pull/21330


    [SPARK-22234] Support distinct window functions

    ## What changes were proposed in this pull request?
    This PR proposes to support distinct window functions. After this change, 
queries like the one below are supported:
    ```
    SELECT val, cate,
    sum(val) OVER (PARTITION BY cate) AS sum1,
    sum(DISTINCT val) OVER (PARTITION BY cate) AS sum2
    FROM testData
    ```
    In this PR:
    1. ORDER BY and a distinct window function cannot be used together, due to 
performance concerns and implementation requirements.
    2. The distinct fields are inserted into the order-by list, so that during 
counting we only need to compare the current row against the previous row, 
i.e. we ignore the current row if both have the same projected values.
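
The approach in item 2 can be sketched outside Spark as follows. This is only an illustration of the sort-then-compare-with-previous idea, not the code in this patch: the function name, the dict-based rows, and the sample data are hypothetical, and it assumes non-null, mutually comparable values.

```python
from itertools import groupby
from operator import itemgetter


def distinct_sum_over_partition(rows, partition_key, value_key):
    """Attach sum(DISTINCT value) over each partition to every row (as "sum2")."""
    # Sorting by (partition, value) makes equal values adjacent within a
    # partition, so one comparison with the previous row detects duplicates.
    ordered = sorted(rows, key=itemgetter(partition_key, value_key))
    result = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        group = list(group)
        total = 0
        previous = object()  # sentinel that is unequal to every real value
        for row in group:
            current = row[value_key]
            if current != previous:  # skip rows whose value repeats the previous one
                total += current
            previous = current
        # Every row in the partition receives the same aggregate.
        result.extend({**row, "sum2": total} for row in group)
    return result


# Hypothetical sample data standing in for testData.
sample = [
    {"cate": "a", "val": 1},
    {"cate": "a", "val": 1},
    {"cate": "a", "val": 2},
    {"cate": "b", "val": 3},
    {"cate": "b", "val": 4},
    {"cate": "b", "val": 3},
]
with_sum2 = distinct_sum_over_partition(sample, "cate", "val")
```

Because the values are already sorted, no per-partition hash set is needed to track which values have been seen; the trade-off is that the sort order is consumed by the distinct fields, which is consistent with the restriction in item 1.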
    
    ## How was this patch tested?
    A test has been added; more tests are to be added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinxing64/spark SPARK-22234

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21330.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21330
    
----
commit 65ad0740e21ffb16cfe6d9ee076f36d93a9aaaea
Author: jinxing <jinxing6042@...>
Date:   2018-05-15T10:11:40Z

    [SPARK-22234] Support distinct window functions

----


---

