GitHub user wzhfy opened a pull request:
https://github.com/apache/spark/pull/20102
[SPARK-22917][SQL] Should not try to generate histogram for empty/null
columns
## What changes were proposed in this pull request?
For an empty or all-null column, the result of `ApproximatePercentile` is
null. If we then try to generate a histogram for that column,
`ApproxCountDistinctForIntervals` throws a `MatchError` (for `endpoints`).
Besides, there is no need to generate a histogram for such a column. This
patch excludes such columns when generating histograms.
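
The failure mode and the guard can be illustrated with a minimal, Spark-free
sketch. This is not the code in the patch: `percentiles`, `endpointsOf`, and
`histogramColumns` are hypothetical stand-ins, and the only behaviour assumed
is what is described above (a null percentile result for columns without
non-null values, and a pattern match on the endpoints that has no case for
null).

```scala
object HistogramGuardSketch {
  // Hypothetical stand-in for the percentile aggregate: returns null when
  // the column has no non-null values, as described above.
  def percentiles(values: Seq[Option[Double]], ps: Seq[Double]): Any = {
    val sorted = values.flatten.sorted
    if (sorted.isEmpty) null
    else ps.map { p =>
      // Nearest-rank style lookup, clamped to the last element.
      sorted(math.min(sorted.length - 1, (p * sorted.length).toInt))
    }.toArray
  }

  // Hypothetical stand-in for the endpoints handling: a type pattern never
  // matches null, so a null input raises scala.MatchError.
  def endpointsOf(raw: Any): Array[Double] = raw match {
    case arr: Array[Double] => arr
  }

  // The guard: only columns with at least one non-null value are eligible
  // for histogram generation.
  def histogramColumns(cols: Map[String, Seq[Option[Double]]]): Set[String] =
    cols.collect { case (name, vs) if vs.exists(_.isDefined) => name }.toSet
}
```

Filtering the column set up front means the null percentile result never
reaches the pattern match, so no null case is needed downstream.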
## How was this patch tested?
Enhanced test cases for empty/null columns.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wzhfy/spark no_record_hgm_bug
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20102.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20102
----
commit 9617c2d982ed799580957a1467d47f42e8124636
Author: Zhenhua Wang <wangzhenhua@...>
Date: 2017-12-28T08:36:23Z
fix
----