GitHub user ferdonline opened a pull request: https://github.com/apache/spark/pull/21087
[SPARK-23997][SQL] Configurable maximum number of buckets

## What changes were proposed in this pull request?

This PR allows the user to override the maximum number of buckets when saving to a table. Currently the limit is hard-coded at 100k, which may be insufficient for large workloads. A new configuration entry is proposed: `spark.sql.bucketing.maxBuckets`, which defaults to the previous 100k.

## How was this patch tested?

Added unit tests in the following spark.sql test suites:
- CreateTableAsSelectSuite
- BucketedWriteSuite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ferdonline/spark enh/configurable_bucket_limit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21087.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21087

----

commit 61a476fe1f90b2e4c8ddbf82024f8116d737d2ef
Author: Fernando Pereira <fernando.pereira@...>
Date: 2018-04-17T12:53:59Z

    Adding configurable max buckets

commit a8846568db9eb63095c9dc55e8b71906ff95e6b0
Author: Fernando Pereira <fernando.pereira@...>
Date: 2018-04-17T15:22:18Z

    fixing tests in spark.sql
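For context, a minimal sketch of how the proposed setting might be used from a Spark application, assuming this patch is applied. The application name, table name, bucket count, and column names are illustrative, not part of the PR; this requires a running Spark environment, so it is a usage sketch rather than a standalone program.

```scala
// Sketch only: assumes spark.sql.bucketing.maxBuckets from this PR exists.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("bucket-limit-demo") // illustrative name
  .getOrCreate()

// Raise the bucket ceiling above the former hard-coded 100k limit.
spark.conf.set("spark.sql.bucketing.maxBuckets", "200000")

// Before this change, bucketBy with more than 100k buckets was rejected
// at write time; with the raised limit this write is accepted.
spark.range(1000000L).toDF("id")
  .write
  .bucketBy(150000, "id") // illustrative bucket count above the old limit
  .sortBy("id")
  .saveAsTable("highly_bucketed") // illustrative table name
```

Leaving the default at 100k preserves the existing behavior for users who do not set the new entry.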