GitHub user ron8hu opened a pull request: https://github.com/apache/spark/pull/19783
support histogram in filter cardinality estimation

## What changes were proposed in this pull request?

Histograms are effective at handling skewed data distributions. Once histogram information has been generated as part of column statistics, filter cardinality estimation needs to be adjusted to use the histogram data structure.

## How was this patch tested?

We revised all the unit test cases to include the histogram data structure.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ron8hu/spark supportHistogram

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19783.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19783

----
commit dd5b975dafdf9fc4edd94cf6e369f5e899db74e2
Author: Ron Hu <ron...@huawei.com>
Date:   2017-11-19T19:37:47Z

    support histogram in filter cardinality estimation

----
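The PR description states the idea without showing it. As a rough illustration only (this is not the code in the patch, and not Spark's API), here is a minimal Python sketch of how an equi-height histogram can refine the selectivity estimate for a range predicate such as `col <= v`, compared with an estimate that uses only the column's min/max and assumes a uniform distribution. The names `Bin`, `selectivity_le`, `uniform_selectivity_le` and the bin boundaries are hypothetical. The sketch assumes every bin holds the same number of rows and that values are spread uniformly inside each bin.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Bin:
    """Hypothetical histogram bin covering the closed value range [lo, hi]."""
    lo: float
    hi: float


def selectivity_le(bins: List[Bin], value: float) -> float:
    """Estimate the fraction of rows with col <= value from an equi-height histogram.

    Hypothetical helper: assumes every bin holds the same number of rows and
    that values are spread uniformly inside each bin.
    """
    if not bins:
        raise ValueError("histogram must have at least one bin")
    covered = 0.0  # number of bins (possibly fractional) satisfying the predicate
    for b in bins:
        if value >= b.hi:
            covered += 1.0  # the whole bin qualifies
        elif value < b.lo:
            continue  # no row in this bin qualifies
        else:
            # Here lo <= value < hi, so hi > lo and the division is safe:
            # count only the part of the bin that lies below value.
            covered += (value - b.lo) / (b.hi - b.lo)
    return covered / len(bins)


def uniform_selectivity_le(min_v: float, max_v: float, value: float) -> float:
    """Baseline estimate using only min/max, assuming a uniform distribution."""
    if value < min_v:
        return 0.0
    if value >= max_v:
        return 1.0
    return (value - min_v) / (max_v - min_v)


# Made-up skewed column on [0, 100]: four of the five equal-height bins
# (80% of the rows) lie in [0, 10].
SKEWED = [Bin(0, 2), Bin(2, 4), Bin(4, 6), Bin(6, 10), Bin(10, 100)]

if __name__ == "__main__":
    print(uniform_selectivity_le(0.0, 100.0, 10.0))  # 0.1
    print(selectivity_le(SKEWED, 10.0))              # 0.8
```

In this made-up example, the min/max estimate for `col <= 10` is 10% of the rows, while the histogram gives 80%. That gap is the kind of skew the PR description refers to.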