GitHub user superbobry opened a pull request:

    https://github.com/apache/spark/pull/19458

    [SPARK-22227][CORE] DiskBlockManager.getAllBlocks now tolerates temp files

    ## What changes were proposed in this pull request?
    
    Prior to this commit, getAllBlocks implicitly assumed that the directories
    managed by the DiskBlockManager contain only files corresponding to
    valid block IDs. In reality, this assumption was violated during shuffle,
    which produces temporary files in the same directory as the resulting
    blocks. As a result, calls to getAllBlocks during shuffle were unreliable.
    
    The fix could be made more efficient, but this is probably good enough.
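
    To illustrate the idea, here is a minimal, self-contained sketch of a
    tolerant listing. It is not the code in this patch: the `BlockId` types
    below are simplified stand-ins, `GetAllBlocksSketch` is a hypothetical
    name, and the temporary file name used in the usage note is made up.
    The assumption is that a strict parser throws on an unrecognized file
    name and that the listing skips such names instead of failing.

    ```scala
    import scala.util.Try

    // Simplified, hypothetical stand-ins for Spark's block ID types.
    sealed trait BlockId { def name: String }

    final case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int)
        extends BlockId {
      def name: String = s"shuffle_${shuffleId}_${mapId}_${reduceId}"
    }

    final case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId {
      def name: String = s"rdd_${rddId}_${splitIndex}"
    }

    object BlockId {
      private val Shuffle = "shuffle_([0-9]+)_([0-9]+)_([0-9]+)".r
      private val Rdd = "rdd_([0-9]+)_([0-9]+)".r

      // Strict parser: the whole name must match, otherwise it throws.
      def apply(name: String): BlockId = name match {
        case Shuffle(s, m, r) => ShuffleBlockId(s.toInt, m.toInt, r.toInt)
        case Rdd(r, s)        => RDDBlockId(r.toInt, s.toInt)
        case _ =>
          throw new IllegalArgumentException(s"Unrecognized BlockId: $name")
      }
    }

    object GetAllBlocksSketch {
      // Tolerant listing: file names that do not parse as block IDs
      // (for example, temporary files) are skipped rather than propagated
      // as exceptions.
      def getAllBlocks(fileNames: Seq[String]): Seq[BlockId] =
        fileNames.flatMap(name => Try(BlockId(name)).toOption)
    }
    ```

    With this sketch, a directory listing such as
    `Seq("rdd_1_2", "shuffle_0_1_0.data.tmp-example", "shuffle_0_1_2")`
    yields only the two valid block IDs. Wrapping every parse in `Try` is
    simple but not free, which is one sense in which such a fix could be
    made more efficient.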
    
    ## How was this patch tested?
    
    `DiskBlockManagerSuite`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/criteo-forks/spark block-id-option

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19458.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19458
    
----
commit 9b9b86fed0e5949fd9e7abaefe08c3d9d986feb6
Author: Sergei Lebedev <s.lebe...@criteo.com>
Date:   2017-10-09T16:52:00Z

    [SPARK-22227][CORE] DiskBlockManager.getAllBlocks now tolerates temp files
    
    Prior to this commit, getAllBlocks implicitly assumed that the directories
    managed by the DiskBlockManager contain only files corresponding to
    valid block IDs. In reality, this assumption was violated during shuffle,
    which produces temporary files in the same directory as the resulting
    blocks. As a result, calls to getAllBlocks during shuffle were unreliable.
    
    The fix could be made more efficient, but this is probably good enough.

----


