GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/21212
[SPARK-24143] filter empty blocks when convert mapstatus to (blockId,â¦
⦠size) pair.
## What changes were proposed in this pull request?
In current code(`MapOutputTracker.convertMapStatuses`), mapstatus are
converted to (blockId, size) pair for all blocks â no matter the block is
empty or not, which result in OOM when there are lots of consecutive empty
blocks, especially when adaptive execution is enabled.
(blockId, size) pair is only used in `ShuffleBlockFetcherIterator` to
control shuffle-read and only non-empty block request is sent. Can we just
filter out the empty blocks in MapOutputTracker.convertMapStatuses and save
memory?
## How was this patch tested?
not added yet.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jinxing64/spark SPARK-24143
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21212.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21212
----
commit 5211ebd5cc4d3de023752b8ab8168d7bda18aa83
Author: jinxing <jinxing6042@...>
Date: 2018-05-02T05:40:34Z
[SPARK-24143] filter empty blocks when convert mapstatus to (blockId, size)
pair.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]