GitHub user ferdonline opened a pull request:

    https://github.com/apache/spark/pull/21087

    [SPARK-23997][SQL] Configurable maximum number of buckets

    ## What changes were proposed in this pull request?
    This PR allows the user to override the maximum number of buckets when 
saving to a table. 
    Currently the limit is hard-coded at 100,000, which might be insufficient 
for large workloads.
    A new configuration entry is proposed: `spark.sql.bucketing.maxBuckets`, 
which defaults to the previous limit of 100,000.
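    A sketch of how the proposed setting might be used (the config key and 
default come from this PR; the DataFrame, bucket count, and table names are 
purely illustrative):
    
    ```scala
    // Hypothetical usage: raise the bucket cap before writing a bucketed table.
    // Requires an active SparkSession named `spark` and a DataFrame `df`.
    spark.conf.set("spark.sql.bucketing.maxBuckets", 500000)
    
    df.write
      .bucketBy(200000, "user_id")   // would exceed the previous 100k limit
      .sortBy("user_id")
      .saveAsTable("events_bucketed")
    ```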
    
    ## How was this patch tested?
    Added unit tests in the following spark.sql test suites:
    
    - CreateTableAsSelectSuite
    - BucketedWriteSuite


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ferdonline/spark enh/configurable_bucket_limit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21087.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21087
    
----
commit 61a476fe1f90b2e4c8ddbf82024f8116d737d2ef
Author: Fernando Pereira <fernando.pereira@...>
Date:   2018-04-17T12:53:59Z

    Adding configurable max buckets

commit a8846568db9eb63095c9dc55e8b71906ff95e6b0
Author: Fernando Pereira <fernando.pereira@...>
Date:   2018-04-17T15:22:18Z

    fixing tests in spark.sql

----

