GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/20152
[SPARK-22957] ApproxQuantile breaks if the number of rows exceeds MaxInt

## What changes were proposed in this pull request?

A 32-bit Int was used for the row rank. It overflowed on DataFrames with more than 2B rows (Int.MaxValue is 2,147,483,647).

## How was this patch tested?

Added a test, but it is ignored because it takes 4 minutes to run.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/juliuszsompolski/apache-spark SPARK-22957

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20152.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20152

----
commit 324218b6065f1ad57479d5ee582694826c1309f9
Author: Juliusz Sompolski <julek@...>
Date:   2018-01-04T13:22:49Z

    SPARK-22957

----
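The rank overflow described in this pull request can be illustrated with a minimal sketch. This is not Spark's actual `QuantileSummaries` code: the class `RankOverflow`, the methods `rankInt`/`rankLong`, and the bucket counts are hypothetical, chosen only to show what happens when row counts are accumulated into a 32-bit rank. Java is used here because its `int` has the same 32-bit two's-complement wraparound as Scala's `Int`.

```java
// Hypothetical sketch (not Spark's actual QuantileSummaries code): shows how
// accumulating row counts into a 32-bit int rank overflows once the running
// total passes Integer.MAX_VALUE (2,147,483,647), while a 64-bit long does not.
public class RankOverflow {

    // Sums per-bucket row counts into a 32-bit rank; wraps silently on overflow.
    static int rankInt(long[] bucketCounts) {
        int rank = 0;
        for (long c : bucketCounts) {
            rank += (int) c; // 32-bit addition: wraps around past Integer.MAX_VALUE
        }
        return rank;
    }

    // Same accumulation with a 64-bit rank, which stays correct for > 2B rows.
    static long rankLong(long[] bucketCounts) {
        long rank = 0L;
        for (long c : bucketCounts) {
            rank += c;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Three hypothetical buckets of 1 billion rows each: 3B rows in total.
        long[] counts = {1_000_000_000L, 1_000_000_000L, 1_000_000_000L};
        System.out.println("int rank:  " + rankInt(counts));  // prints -1294967296
        System.out.println("long rank: " + rankLong(counts)); // prints 3000000000
    }
}
```

With 3B rows the 32-bit rank wraps to a negative number (3,000,000,000 − 2^32 = −1,294,967,296), so any quantile lookup that compares against it is meaningless, whereas the 64-bit rank holds the correct value.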