GitHub user wzhfy reopened a pull request:
https://github.com/apache/spark/pull/15544
[SPARK-17997] [SQL] Add an aggregation function for counting distinct
values for multiple intervals
## What changes were proposed in this pull request?
This work is a part of
[SPARK-17074](https://issues.apache.org/jira/browse/SPARK-17074) to compute
equi-height histograms. An equi-height histogram is an array of bins. Each bin
consists of two endpoints, which form an interval of values, and the number of
distinct values (ndv) in that interval.
This PR adds a new aggregate function that, given an array of endpoints,
counts the distinct values (ndv) in each of the intervals defined by those endpoints.
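
To make the intended result concrete, here is a minimal Python sketch of per-interval distinct counting. It is not the code in this PR: it uses exact sets instead of HLLPP sketches, the function name `count_distinct_per_interval` is hypothetical, and the interval boundaries (first interval closed on both sides, later intervals open on the left, values outside the endpoints ignored) are assumptions made for illustration only.

```python
from bisect import bisect_left


def count_distinct_per_interval(values, endpoints):
    """Exact distinct counts per interval (illustrative stand-in for HLLPP).

    Assumes `endpoints` is sorted in strictly ascending order.
    Assumed intervals: [e0, e1], (e1, e2], ..., (e_{n-2}, e_{n-1}].
    """
    if len(endpoints) < 2:
        raise ValueError("need at least two endpoints")
    # One set of seen values per interval.
    seen = [set() for _ in range(len(endpoints) - 1)]
    for v in values:
        # Skip nulls and values outside the overall range (assumption).
        if v is None or v < endpoints[0] or v > endpoints[-1]:
            continue
        # bisect_left gives the first index i with endpoints[i] >= v,
        # so v falls into interval i - 1; v == endpoints[0] maps to 0.
        idx = max(bisect_left(endpoints, v) - 1, 0)
        seen[idx].add(v)
    return [len(s) for s in seen]


result = count_distinct_per_interval(
    [1, 1, 2, 3, 5, 5, 7, 10, 11], [1, 3, 5, 10]
)
print(result)
```

An approximate version would replace each set with one sketch per interval and return the sketch estimates instead of the set sizes.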
This PR also refactors `HyperLogLogPlusPlus` by extracting a helper class,
`HyperLogLogPlusPlusHelper`, which contains the underlying HLLPP algorithm.
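
The motivation for such a helper can be sketched as follows: once the register logic lives in a stateless class, one aggregate can keep a single register array while a per-interval aggregate keeps one array per interval, and both reuse the same update, merge, and estimate code. The Python below is a rough illustration under stated assumptions, not the PR's implementation: the class name `HllHelper` is hypothetical, it implements plain HyperLogLog with a linear-counting correction for small cardinalities rather than the full HLLPP algorithm, and the SHA-1 based hash is chosen only to keep the example deterministic.

```python
import hashlib
import math


class HllHelper:
    """Stateless HyperLogLog logic operating on caller-owned registers."""

    def __init__(self, p=12):
        self.p = p          # number of index bits
        self.m = 1 << p     # number of registers

    def new_registers(self):
        return [0] * self.m

    def update(self, registers, value):
        # 64-bit hash of the value (illustrative hash choice).
        digest = hashlib.sha1(repr(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Position of the first 1 bit in `rest` (leading zeros + 1).
        rank = (64 - self.p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    def merge(self, target, other):
        # Register-wise maximum combines two partial sketches.
        for i in range(self.m):
            target[i] = max(target[i], other[i])

    def estimate(self, registers):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in registers)
        zeros = registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


helper = HllHelper()
left, right, whole = (helper.new_registers() for _ in range(3))
for i in range(1000):
    helper.update(left if i < 500 else right, i)
    helper.update(whole, i)
helper.merge(left, right)   # `left` now covers all 1000 values
print(helper.estimate(whole))
```

Because the helper holds no registers itself, merging two partial sketches yields exactly the same registers as feeding every value into one sketch.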
## How was this patch tested?
Added new test cases.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wzhfy/spark countIntervals
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15544.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15544
----
commit 9960fab07d2075d2beba1fea7024fe6dd30d9eef
Author: wangzhenhua <[email protected]>
Date: 2016-10-14T06:23:39Z
refactor hllpp
commit 5aa835ce2769a34f88bacb389c4af30f52459226
Author: wangzhenhua <[email protected]>
Date: 2016-10-17T13:18:36Z
add IntervalDistinctApprox
commit 840171efa08c70da83af54bc726079a88fb7a1d2
Author: wangzhenhua <[email protected]>
Date: 2016-10-19T01:58:32Z
add test cases
commit a6417e7df5cf44ba9f75a7d66d46258a56b0082f
Author: wangzhenhua <[email protected]>
Date: 2016-10-20T04:46:57Z
convert HLLPP and IntervalDistinctApprox to ImperativeAggregate
commit 74d7ae7ac817d427a264b67f580fe39bbb49811b
Author: wangzhenhua <[email protected]>
Date: 2016-11-04T08:36:23Z
add negative column type test and update doc
----