[GitHub] spark pull request #19633: [SPARK-22411][SQL] Disable the heuristic to calcu...

vgankidi Wed, 01 Nov 2017 13:13:42 -0700

GitHub user vgankidi opened a pull request:

    https://github.com/apache/spark/pull/19633


    [SPARK-22411][SQL] Disable the heuristic to calculate max partition size 
when dynamic allocation is enabled and use the value specified by the property 
spark.sql.files.maxPartitionBytes instead

    
    ## What changes were proposed in this pull request?
    
    The heuristic to calculate the maxSplitSize in DataSourceScanExec is as 
follows:
    
https://github.com/apache/spark/blob/d28d5732ae205771f1f443b15b10e64dcffb5ff0/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L431
    Default parallelism in this case is the total number of cores across all 
registered executors for this application. This works well with static 
allocation, but with dynamic allocation enabled this value is usually one (with 
the default config of min and initial executors set to zero) at the time of 
split calculation. This heuristic was introduced in SPARK-14582.
    With dynamic allocation enabled, it is confusing to tune the split size 
with this heuristic. It is better to ignore bytesPerCore and use the value of 
'spark.sql.files.maxPartitionBytes' as the max split size.
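
    For readers who do not want to follow the link, here is a minimal Python 
sketch of the heuristic as I understand it from the linked Scala code. It is 
not the actual Spark implementation. The function name, the parameter names, 
and the `ignore_bytes_per_core` flag are hypothetical and exist only for this 
illustration. The defaults assumed here are 128 MB for 
spark.sql.files.maxPartitionBytes and 4 MB for spark.sql.files.openCostInBytes.

```python
MB = 1024 * 1024


def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * MB,
                    open_cost_in_bytes=4 * MB,
                    ignore_bytes_per_core=False):
    """Illustrative model of the max split size calculation.

    file_sizes: sizes in bytes of the files selected for the scan.
    default_parallelism: total cores of the registered executors at the
        time the splits are computed.
    ignore_bytes_per_core: hypothetical flag modelling the behaviour this
        pull request proposes for dynamic allocation.
    """
    if ignore_bytes_per_core:
        # Proposed behaviour: use the configured maximum directly.
        return max_partition_bytes

    # Each file is padded with the estimated cost of opening it.
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism

    # Never exceed the configured maximum, never go below the open cost.
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


files = [64 * MB] * 10  # ten 64 MB files

# Static allocation with 200 cores already registered: splits shrink to 4 MB.
print(max_split_bytes(files, default_parallelism=200) // MB)

# Same input, but only one core visible when the splits are computed.
print(max_split_bytes(files, default_parallelism=1) // MB)

# Proposed behaviour: the result no longer depends on the core count.
print(max_split_bytes(files, default_parallelism=1,
                      ignore_bytes_per_core=True) // MB)
```

    The point of the sketch is that the result depends on how many cores 
happen to be registered when the splits are computed, which is what makes the 
split size hard to tune under dynamic allocation.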
    
    ## How was this patch tested?
    Tested manually.
    
    Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vgankidi/spark SPARK-22411

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19633
    
----
commit 4157771715a235fe5ffad970764b805fd74f45d5
Author: Vinitha Gankidi <vgank...@netflix.com>
Date:   2017-11-01T20:09:44Z

    [SPARK-22411][SQL] Disable the heuristic to calculate max partition size
    when dynamic allocation is enabled and use the value specified by the
    property spark.sql.files.maxPartitionBytes instead

----


