GitHub user jinxing64 opened a pull request:

    https://github.com/apache/spark/pull/21212

    [SPARK-24143] filter empty blocks when convert mapstatus to (blockId,…

    … size) pair.
    
    ## What changes were proposed in this pull request?
    
    In current code(`MapOutputTracker.convertMapStatuses`), mapstatus are 
converted to (blockId, size) pair for all blocks – no matter the block is 
empty or not, which result in OOM when there are lots of consecutive empty 
blocks, especially when adaptive execution is enabled.
    
    (blockId, size) pair is only used in `ShuffleBlockFetcherIterator` to 
control shuffle-read and only non-empty block request is sent. Can we just 
filter out the empty blocks in MapOutputTracker.convertMapStatuses and save 
memory?
    
    
    ## How was this patch tested?
    
    not added yet.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinxing64/spark SPARK-24143

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21212.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21212
    
----
commit 5211ebd5cc4d3de023752b8ab8168d7bda18aa83
Author: jinxing <jinxing6042@...>
Date:   2018-05-02T05:40:34Z

    [SPARK-24143] filter empty blocks when convert mapstatus to (blockId, size) 
pair.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to