GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/11660
[SPARK-XXXXX] Guard against race condition when re-caching disk blocks in memory
When a block is read from the DiskStore and then cached back into the
memory store, several readers may try to re-cache the same block in memory
at the same time. We should guard against that race.
This patch does so by synchronizing on the block's `BlockInfo` object
while attempting to re-cache the block.
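The locking idea described above can be illustrated with a minimal Java sketch. This is not the code from the patch (Spark's BlockManager is written in Scala); the names `BlockInfoSketch`, `RecacheSketch`, `maybeCacheDiskBytesInMemory`, and `memoryPuts` are hypothetical and exist only for illustration. It assumes each block has one shared metadata object that every reader can lock on.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for per-block metadata; used here only as a lock. */
final class BlockInfoSketch {
    final String blockId;

    BlockInfoSketch(String blockId) {
        this.blockId = blockId;
    }
}

/** Hypothetical in-memory store that re-caches bytes read from disk. */
final class RecacheSketch {
    private final ConcurrentHashMap<String, byte[]> memoryStore = new ConcurrentHashMap<>();

    /** Counts how many times a block was actually inserted into memory. */
    final AtomicInteger memoryPuts = new AtomicInteger();

    /**
     * Returns the in-memory copy of the block, inserting diskBytes only if
     * no other reader has already re-cached the block.
     */
    byte[] maybeCacheDiskBytesInMemory(BlockInfoSketch info, byte[] diskBytes) {
        // Readers of the same block share one info object, so only one of
        // them at a time can run the check-then-put sequence below.
        synchronized (info) {
            byte[] cached = memoryStore.get(info.blockId);
            if (cached != null) {
                // Another reader won the race; reuse its copy and let the
                // caller discard the bytes it read from disk.
                return cached;
            }
            memoryStore.put(info.blockId, diskBytes);
            memoryPuts.incrementAndGet();
            return diskBytes;
        }
    }
}
```

In this sketch, a second call for the same block returns the array stored by the first call and leaves `memoryPuts` at 1. Locking on the per-block object, rather than on the whole store, means readers of different blocks do not contend with each other.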
(I will file a JIRA as soon as ASF JIRA is no longer down or laggy.)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark concurrent-recaching-fixes
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11660.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11660
----
commit 00ea8d350eeb6bfe39c809d9f703a17ef710618c
Author: Josh Rosen <[email protected]>
Date: 2016-03-11T21:22:09Z
De-duplicate disk -> memory caching code.
commit a0c68e20d1ef86eded51b9212e0c888acf5955e1
Author: Josh Rosen <[email protected]>
Date: 2016-03-11T21:29:56Z
Clarify that read lock must be held by caller of maybeCache*
commit 7f678d25ba6a8700917093e13896dbb255241fd3
Author: Josh Rosen <[email protected]>
Date: 2016-03-11T21:44:21Z
Synchronize on blockInfo to guard against concurrent re-caching.
commit 5342712afeeb76ac8c30bb4bb884dc0ba900fb92
Author: Josh Rosen <[email protected]>
Date: 2016-03-11T21:47:53Z
Add some BlockManager.dispose() calls to free disk buffer earlier.
----