[GitHub] spark pull request #20102: [SPARK-22917][SQL] Should not try to generate his...

wzhfy Thu, 28 Dec 2017 00:46:07 -0800

GitHub user wzhfy opened a pull request:

    https://github.com/apache/spark/pull/20102


    [SPARK-22917][SQL] Should not try to generate histogram for empty/null columns

    ## What changes were proposed in this pull request?
    
    For an empty or all-null column, the result of `ApproximatePercentile` is 
null. Then, in `ApproxCountDistinctForIntervals`, a `MatchError` (for 
`endpoints`) is thrown if we try to generate a histogram for that column. 
Besides, there is no need to generate a histogram for such a column. In this 
patch, we exclude such columns when generating histograms.
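
    The guard described above can be sketched in plain Python. This is only an 
illustration under stated assumptions, not the Spark code: `approx_percentiles` 
and `histogram_endpoints` are hypothetical stand-ins for the percentile step 
and the histogram step, and the percentile logic is a simple exact 
approximation chosen for readability.

```python
from typing import Dict, List, Optional, Sequence


def approx_percentiles(values: Sequence[Optional[float]],
                       num_bins: int) -> Optional[List[float]]:
    """Hypothetical stand-in for the percentile step.

    Returns None when the column has no non-null values (the case the
    pull request is about), otherwise num_bins + 1 endpoints.
    """
    non_null = sorted(v for v in values if v is not None)
    if not non_null:
        return None
    last = len(non_null) - 1
    return [non_null[round(i * last / num_bins)] for i in range(num_bins + 1)]


def histogram_endpoints(columns: Dict[str, Sequence[Optional[float]]],
                        num_bins: int = 2) -> Dict[str, List[float]]:
    """Collect endpoints per column, excluding empty/all-null columns.

    Skipping columns whose percentile result is None means the later
    interval-counting step never receives null endpoints.
    """
    result: Dict[str, List[float]] = {}
    for name, values in columns.items():
        endpoints = approx_percentiles(values, num_bins)
        if endpoints is None:
            continue  # empty or all-null column: no histogram is generated
        result[name] = endpoints
    return result
```

    In this sketch, a table with one populated column and two empty/all-null 
columns yields endpoints only for the populated column.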
    
    ## How was this patch tested?
    
    Enhanced test cases for empty/null columns.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wzhfy/spark no_record_hgm_bug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20102.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20102
    
----
commit 9617c2d982ed799580957a1467d47f42e8124636
Author: Zhenhua Wang <wangzhenhua@...>
Date:   2017-12-28T08:36:23Z

    fix

----

