GitHub user hthuynh2 opened a pull request:
https://github.com/apache/spark/pull/21527
Spark branch 1
**Problem**
MapStatus uses a hardcoded value of 2000 partitions to decide whether it
should use the highly compressed map status. We should make this threshold
configurable.
**What changes were proposed in this pull request?**
I made the hardcoded value mentioned above configurable under the name
_SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS_, with a default value of 2000.
Users can override it by setting the property
_spark.shuffle.minNumPartitionsToHighlyCompress_.
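For reviewers unfamiliar with Spark's internal configuration package, a new
entry of this kind is typically registered via `ConfigBuilder`. The sketch
below is illustrative only, not the exact patch: the doc string, the
`.internal()` marker, and the error message are assumptions, and the entry
would live alongside the other definitions in
`org.apache.spark.internal.config`:

```scala
// Sketch of a Spark internal config entry (assumed shape, not the exact patch).
// checkValue rejects non-positive values with an IllegalArgumentException,
// matching the validation described in the PR's unit test.
private[spark] val SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS =
  ConfigBuilder("spark.shuffle.minNumPartitionsToHighlyCompress")
    .internal()
    .doc("Minimum number of partitions above which MapStatus switches to " +
      "HighlyCompressedMapStatus.")
    .intConf
    .checkValue(v => v > 0, "The value must be a positive integer.")
    .createWithDefault(2000)
```

With such an entry in place, the hardcoded `2000` in MapStatus can be
replaced by a lookup of this config, and users can tune the threshold with
`--conf spark.shuffle.minNumPartitionsToHighlyCompress=<n>`.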
**How was this patch tested?**
I wrote a unit test to verify that the default value is 2000 and that an
_IllegalArgumentException_ is thrown if the user sets it to a non-positive
value. The unit test also checks that the highly compressed map status is
correctly used when the number of partitions is greater than
_SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS_.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hthuynh2/spark spark_branch_1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21527.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21527
----
commit 93582bd1ce114368654ff896749c517d979ed23a
Author: Hieu Huynh <hieu.huynh@...>
Date: 2018-06-11T13:47:02Z
Change MapStatus hardcode value to configurable
commit d3f24b501c68f8ef22726d711a887268d02a9fc7
Author: Hieu Huynh <hieu.huynh@...>
Date: 2018-06-11T14:16:25Z
Fixed incorrect name
----
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]