GitHub user wzhfy reopened a pull request:

    https://github.com/apache/spark/pull/15544

    [SPARK-17997] [SQL] Add an aggregation function for counting distinct 
values for multiple intervals

    ## What changes were proposed in this pull request?
    
    This work is part of 
[SPARK-17074](https://issues.apache.org/jira/browse/SPARK-17074), which computes 
equi-height histograms. An equi-height histogram is an array of bins. Each bin 
consists of two endpoints, which form an interval of values, and the number of 
distinct values (ndv) in that interval.
    
    This PR adds a new aggregate function that, given an array of endpoints, 
counts the distinct values (ndv) in each interval between those endpoints.
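    
    As a rough illustration of the intended semantics only, here is a minimal 
plain-Python sketch. It is not the Spark implementation: it counts exactly with 
sets, whereas the PR estimates each count with HLLPP. The function name 
`count_distinct_per_interval` is hypothetical, and the boundary rule (the first 
interval is closed on both sides, later intervals are open on the left) is an 
assumption made for this example.

```python
from bisect import bisect_left


def count_distinct_per_interval(values, endpoints):
    """Return the exact number of distinct values in each interval.

    Hypothetical helper for illustration. Assumes `endpoints` is strictly
    increasing and that intervals are [e0, e1], (e1, e2], ..., (e_{n-1}, e_n].
    """
    if len(endpoints) < 2:
        raise ValueError("need at least two endpoints")
    if any(a >= b for a, b in zip(endpoints, endpoints[1:])):
        raise ValueError("endpoints must be strictly increasing")

    # One set of seen values per interval (a stand-in for one HLLPP sketch each).
    seen = [set() for _ in range(len(endpoints) - 1)]
    for v in values:
        # Skip nulls and values outside the overall range.
        if v is None or v < endpoints[0] or v > endpoints[-1]:
            continue
        # Index of the first endpoint >= v; that endpoint closes v's interval.
        idx = bisect_left(endpoints, v)
        # A value equal to the very first endpoint belongs to interval 0.
        seen[max(idx - 1, 0)].add(v)
    return [len(s) for s in seen]


counts = count_distinct_per_interval([0, 1, 1, 10, 11, 20, 25, None], [0, 10, 20])
print(counts)  # [3, 2]
```

    Each returned count corresponds to the ndv of one histogram bin; replacing 
each set with a fixed-size sketch trades exactness for bounded memory.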
    
    This PR also refactors `HyperLogLogPlusPlus` by extracting a helper class, 
`HyperLogLogPlusPlusHelper`, which contains the underlying HLLPP algorithm.
    
    ## How was this patch tested?
    
    Added new test cases.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wzhfy/spark countIntervals

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15544.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15544
    
----
commit 9960fab07d2075d2beba1fea7024fe6dd30d9eef
Author: wangzhenhua <[email protected]>
Date:   2016-10-14T06:23:39Z

    refactor hllpp

commit 5aa835ce2769a34f88bacb389c4af30f52459226
Author: wangzhenhua <[email protected]>
Date:   2016-10-17T13:18:36Z

    add IntervalDistinctApprox

commit 840171efa08c70da83af54bc726079a88fb7a1d2
Author: wangzhenhua <[email protected]>
Date:   2016-10-19T01:58:32Z

    add test cases

commit a6417e7df5cf44ba9f75a7d66d46258a56b0082f
Author: wangzhenhua <[email protected]>
Date:   2016-10-20T04:46:57Z

    convert HLLPP and IntervalDistinctApprox to ImperativeAggregate

commit 74d7ae7ac817d427a264b67f580fe39bbb49811b
Author: wangzhenhua <[email protected]>
Date:   2016-11-04T08:36:23Z

    add negative column type test and update doc

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
