GitHub user hthuynh2 opened a pull request: https://github.com/apache/spark/pull/21527
Spark branch 1

**Problem**
MapStatus uses a hardcoded value of 2000 partitions to decide whether it should use a highly compressed map status. We should make it configurable.

**What changes were proposed in this pull request?**
I made the hardcoded value mentioned above configurable under the name _SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS_, with a default value of 2000. Users can set it to the value they want through the property _spark.shuffle.minNumPartitionsToHighlyCompress_.

**How was this patch tested?**
I wrote a unit test to verify that the default value is 2000 and that an _IllegalArgumentException_ is thrown if the user sets it to a non-positive value. The unit test also checks that the highly compressed map status is used when the number of partitions is greater than _SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS_.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hthuynh2/spark spark_branch_1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21527.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21527

----

commit 93582bd1ce114368654ff896749c517d979ed23a
Author: Hieu Huynh <hieu.huynh@...>
Date: 2018-06-11T13:47:02Z

    Change MapStatus hardcode value to configurable

commit d3f24b501c68f8ef22726d711a887268d02a9fc7
Author: Hieu Huynh <hieu.huynh@...>
Date: 2018-06-11T14:16:25Z

    Fixed incorrect name

----
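To make the described behavior concrete, here is a minimal Scala sketch (not Spark's actual MapStatus implementation) of the decision this PR makes configurable: the default of 2000, the positivity check, and the strictly-greater-than comparison are taken from the PR description; the object and method names are illustrative only.

```scala
// Sketch of a configurable threshold for switching to a highly
// compressed map status. Names are hypothetical; only the default
// value (2000) and the semantics come from the PR description.
object MapStatusSketch {
  // Default for spark.shuffle.minNumPartitionsToHighlyCompress per the PR.
  val DefaultMinPartitionsToHighlyCompress: Int = 2000

  // Returns true when the highly compressed format should be used,
  // i.e. when numPartitions exceeds the configured threshold.
  def useHighlyCompressed(
      numPartitions: Int,
      threshold: Int = DefaultMinPartitionsToHighlyCompress): Boolean = {
    // The PR rejects non-positive settings with IllegalArgumentException;
    // Scala's require throws exactly that exception type.
    require(threshold > 0,
      "spark.shuffle.minNumPartitionsToHighlyCompress must be positive")
    numPartitions > threshold
  }

  def main(args: Array[String]): Unit = {
    println(useHighlyCompressed(2000))                 // false: not greater than default
    println(useHighlyCompressed(2001))                 // true: exceeds default
    println(useHighlyCompressed(500, threshold = 100)) // true: exceeds custom threshold
  }
}
```

Note the strict `>` comparison: a job with exactly 2000 partitions keeps the regular map status under the default setting, matching the "greater than" wording in the test description above.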