GitHub user hthuynh2 opened a pull request:

    https://github.com/apache/spark/pull/21527

    Spark branch 1

    **Problem**
    MapStatus uses a hardcoded value of 2000 partitions to determine whether it
should use a highly compressed map status. We should make this threshold
configurable.
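
    For context, the standalone sketch below illustrates the decision this PR makes configurable; it does not use Spark's internal classes, and the class and value names here are illustrative only:

```scala
// Standalone illustration (not Spark source) of the threshold decision.
object MapStatusThresholdSketch {
  // Hardcoded today inside MapStatus; this PR turns it into a config entry.
  val minPartitionsToHighlyCompress = 2000

  // True when the highly compressed representation would be chosen.
  def useHighlyCompressed(numPartitions: Int): Boolean =
    numPartitions > minPartitionsToHighlyCompress

  def main(args: Array[String]): Unit = {
    println(useHighlyCompressed(1500)) // false -> regular compressed map status
    println(useHighlyCompressed(5000)) // true  -> highly compressed map status
  }
}
```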
    
    **What changes were proposed in this pull request?**
    I made the hardcoded value mentioned above configurable under the config
entry _SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS_, which defaults to 2000. Users
can override it by setting the property
_spark.shuffle.minNumPartitionsToHighlyCompress_.
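
    A minimal usage sketch, assuming the property name above (the key is introduced by this PR and is not in a released Spark version, so treat it as an assumption until the patch is merged):

```scala
import org.apache.spark.SparkConf

object ThresholdOverrideExample {
  def main(args: Array[String]): Unit = {
    // Override the default threshold of 2000 with a higher value.
    val conf = new SparkConf()
      .setAppName("map-status-threshold-example")
      .set("spark.shuffle.minNumPartitionsToHighlyCompress", "5000")
    println(conf.get("spark.shuffle.minNumPartitionsToHighlyCompress"))
  }
}
```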
    
    **How was this patch tested?**
    I wrote a unit test to verify that the default value is 2000 and that an
_IllegalArgumentException_ is thrown if the user sets it to a non-positive
value. The unit test also checks that a highly compressed map status is used
when the number of partitions is greater than
_SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS_.
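
    A hedged sketch of the kind of checks described above, using plain assertions rather than Spark's actual test suite; the property key is the one named in this PR, and the helper here is hypothetical:

```scala
import org.apache.spark.SparkConf

object ThresholdConfigCheck {
  val key = "spark.shuffle.minNumPartitionsToHighlyCompress"

  // Hypothetical helper: reads the threshold and rejects non-positive values,
  // as the PR description says the config entry does.
  def threshold(conf: SparkConf): Int = {
    val value = conf.getInt(key, 2000)
    require(value > 0, s"$key must be positive")
    value
  }

  def main(args: Array[String]): Unit = {
    assert(threshold(new SparkConf()) == 2000) // default stays 2000
    try {
      threshold(new SparkConf().set(key, "0"))
      assert(false, "expected IllegalArgumentException")
    } catch {
      case _: IllegalArgumentException => () // require throws IllegalArgumentException
    }
  }
}
```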


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hthuynh2/spark spark_branch_1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21527.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21527
    
----
commit 93582bd1ce114368654ff896749c517d979ed23a
Author: Hieu Huynh <hieu.huynh@...>
Date:   2018-06-11T13:47:02Z

    Change MapStatus hardcode value to configurable

commit d3f24b501c68f8ef22726d711a887268d02a9fc7
Author: Hieu Huynh <hieu.huynh@...>
Date:   2018-06-11T14:16:25Z

    Fixed incorrect name

----


---
