GitHub user wzhfy opened a pull request:

    https://github.com/apache/spark/pull/15637

    [SPARK-18000] [SQL] Aggregation function for computing endpoints for histograms

    ## What changes were proposed in this pull request?
    
    For a given column, this function returns the bins of an equi-width
    histogram, as (distinct value, frequency) pairs, when the number of
    distinct values is less than or equal to the specified maximum number
    of bins. Otherwise, for a column of string type it returns an empty
    map; for a column of numeric type it returns the endpoints of an
    equi-height histogram, i.e. the approximate percentiles at percentages
    0.0, 1/numBins, 2/numBins, ..., (numBins-1)/numBins, 1.0.
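
The behaviour described above can be sketched in plain Python. This is only an illustration, not the code in the patch: the name `histogram_endpoints`, its signature, and the list return type for the numeric case are assumptions, and it computes exact nearest-rank percentiles where the patch uses approximate ones.

```python
from collections import Counter


def histogram_endpoints(values, num_bins):
    """Hypothetical helper that mirrors the behaviour described in the PR text.

    Returns a dict of (distinct value -> frequency) when there are at most
    num_bins distinct values, an empty dict for string columns with more
    distinct values, and a list of num_bins + 1 percentile endpoints for
    numeric columns with more distinct values.
    """
    counts = Counter(values)

    # Few enough distinct values: one bin per distinct value.
    if len(counts) <= num_bins:
        return dict(counts)

    # Too many distinct strings: no histogram is produced.
    if all(isinstance(v, str) for v in counts):
        return {}

    # Numeric column: endpoints at percentages 0, 1/num_bins, ..., 1.
    # Exact nearest-rank percentiles stand in for the approximate ones.
    ordered = sorted(values)
    n = len(ordered)
    endpoints = []
    for i in range(num_bins + 1):
        rank = -(-i * n // num_bins)  # integer ceil(i * n / num_bins)
        index = min(n - 1, max(0, rank - 1))
        endpoints.append(ordered[index])
    return endpoints
```

For example, under these assumptions `histogram_endpoints(list(range(1, 11)), 4)` yields the five endpoints `[1, 3, 5, 8, 10]`, while a string column with more than `num_bins` distinct values yields `{}`.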
    
    ## How was this patch tested?
    
    Added test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wzhfy/spark histogramEndpoints

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15637.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15637
    
----
commit 1580a7fb0a6a21b603e754c744c8c6cb02fd24c2
Author: wangzhenhua <[email protected]>
Date:   2016-10-12T01:02:37Z

    add agg function to generate string histogram

commit a3281606372f83eca960ea90734e8ee9cb1c3125
Author: wangzhenhua <[email protected]>
Date:   2016-10-21T08:10:01Z

    comments

commit 907cd99b8b26ae3caa224df67cc10bc784f10fb4
Author: Zhenhua Wang <[email protected]>
Date:   2016-10-22T12:15:22Z

    create HistogramEndpoints to generate endpoints for string and numeric types

commit 35e453cb1079398196ece4f13e8f294ee4e4e916
Author: Zhenhua Wang <[email protected]>
Date:   2016-10-23T15:05:09Z

    change suite names

commit f6fe25de3f5f1382727cecfdda7b74e40758896b
Author: wangzhenhua <[email protected]>
Date:   2016-10-26T03:20:05Z

    test cases and fix bugs

commit 15eb3721f56ac27bd90933ef7e66f3453eae4a75
Author: wangzhenhua <[email protected]>
Date:   2016-10-26T03:29:14Z

    fix doc

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
