GitHub user ferdonline opened a pull request:
https://github.com/apache/spark/pull/21087
[SPARK-23997][SQL] Configurable maximum number of buckets
## What changes were proposed in this pull request?
This PR makes the maximum number of buckets configurable, allowing users to
override it when saving to a table.
Currently the limit is hard-coded at 100,000, which can be too low for
large workloads.
A new configuration entry is proposed, `spark.sql.bucketing.maxBuckets`,
which defaults to the previous limit of 100,000.
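For illustration, here is a minimal sketch of how the proposed setting could be used from a Spark session. The config key and the 100,000 default are taken from this PR; the application, table, and column names are made up for the example:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("configurable-bucket-limit")
  .getOrCreate()

// Raise the ceiling above the previously hard-coded 100,000
// (config key as proposed in this PR).
spark.conf.set("spark.sql.bucketing.maxBuckets", "150000")

// A write with more than 100,000 buckets, which the old
// hard-coded limit would have rejected.
spark.range(1000000L).toDF("id")
  .write
  .bucketBy(120000, "id")
  .sortBy("id")
  .saveAsTable("bucketed_ids")
```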
## How was this patch tested?
Added unit tests in the following spark.sql test suites (a rough sketch of the
kind of check involved follows the list):
- CreateTableAsSelectSuite
- BucketedWriteSuite
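As a sketch of the kind of assertion such tests might make, not the actual patch: the suite name below is hypothetical, `withSQLConf` and `testImplicits` are Spark's internal test helpers, and the error type is assumed to be `AnalysisException`, matching the old hard-coded check.

```scala
import org.apache.spark.sql.{AnalysisException, QueryTest}
import org.apache.spark.sql.test.SharedSQLContext

class ConfigurableBucketLimitSuite extends QueryTest with SharedSQLContext {
  import testImplicits._

  test("bucket count above the limit fails unless the config is raised") {
    val df = Seq(1, 2, 3).toDF("i")

    // Under the default limit (100,000), an oversized bucket
    // count is still rejected at analysis time.
    intercept[AnalysisException] {
      df.write.bucketBy(150000, "i").saveAsTable("too_many_buckets")
    }

    // Raising the proposed config makes the same write legal.
    withSQLConf("spark.sql.bucketing.maxBuckets" -> "200000") {
      df.write.bucketBy(150000, "i").saveAsTable("many_buckets_ok")
    }
  }
}
```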
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ferdonline/spark enh/configurable_bucket_limit
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21087.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21087
----
commit 61a476fe1f90b2e4c8ddbf82024f8116d737d2ef
Author: Fernando Pereira <fernando.pereira@...>
Date: 2018-04-17T12:53:59Z
Adding configurable max buckets
commit a8846568db9eb63095c9dc55e8b71906ff95e6b0
Author: Fernando Pereira <fernando.pereira@...>
Date: 2018-04-17T15:22:18Z
fixing tests in spark.sql
----