GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/10705
[SPARK-12757][WIP] Use reference counting to prevent blocks from being
evicted during reads
As a prerequisite to off-heap caching of blocks, we need a mechanism to
prevent pages / blocks from being evicted while they are being read. With
on-heap objects, evicting a block while it is being read merely leads to
memory-accounting problems (because we assume that an evicted block is a
candidate for garbage-collection, which will not be true during a read), but
with off-heap memory this will lead to either data corruption or segmentation
faults.
To address this, we should add a reference-counting mechanism to track
which blocks/pages are being read in order to prevent them from being evicted
prematurely. I propose to do this in two phases: first, add a safe,
conservative approach in which all BlockManager.get*() calls implicitly
increment the reference count of blocks and where tasks' references are
automatically freed upon task completion. This will be correct but may hurt
performance, because blocks that are no longer being read stay pinned until
the task ends, which prevents legitimate evictions. In phase two, we should
incrementally add release() calls so that blocks become evictable again as
soon as their readers are done with them. The latter change may need to touch
many different components, which is why I propose to do it separately in order
to make the changes easier to reason about and review.
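
To make the two phases concrete, here is a minimal, single-threaded sketch of
the kind of counter described above. It is an illustration under assumptions,
not the code in this PR: the method names `retain`, `releaseAllForTask`,
`referenceCount` and `canEvict`, the use of a `Long` task id, and the lack of
any synchronization are all hypothetical choices made for brevity.

```scala
import scala.collection.mutable

/** Hypothetical sketch of a generic reference counter; not the PR's actual class. */
final class ReferenceCounter[K] {
  // Total outstanding references per key (for example, per block id).
  private val counts = mutable.Map.empty[K, Int]
  // References held by each task, so they can be dropped in bulk at task end.
  private val heldByTask = mutable.Map.empty[Long, mutable.Map[K, Int]]

  /** Record that `taskId` has started reading `key` (what a get*() call would do implicitly). */
  def retain(taskId: Long, key: K): Unit = {
    counts(key) = counts.getOrElse(key, 0) + 1
    val held = heldByTask.getOrElseUpdate(taskId, mutable.Map.empty[K, Int])
    held(key) = held.getOrElse(key, 0) + 1
  }

  /** Drop one reference that `taskId` holds on `key` (the explicit phase-two release). */
  def release(taskId: Long, key: K): Unit = {
    val held = heldByTask.getOrElse(taskId, mutable.Map.empty[K, Int])
    require(held.getOrElse(key, 0) > 0, s"task $taskId holds no reference to $key")
    decrement(held, key)
    if (held.isEmpty) heldByTask.remove(taskId)
    decrement(counts, key)
  }

  /** Drop every reference still held by `taskId` (the phase-one cleanup at task completion). */
  def releaseAllForTask(taskId: Long): Unit =
    heldByTask.remove(taskId).foreach { held =>
      held.foreach { case (key, n) => (1 to n).foreach(_ => decrement(counts, key)) }
    }

  /** Number of outstanding references to `key` across all tasks. */
  def referenceCount(key: K): Int = counts.getOrElse(key, 0)

  /** Eviction code would skip any key for which this returns false. */
  def canEvict(key: K): Boolean = referenceCount(key) == 0

  // Decrement an entry and remove it once it reaches zero.
  private def decrement(m: mutable.Map[K, Int], key: K): Unit = {
    val n = m(key) - 1
    if (n == 0) m.remove(key) else m(key) = n
  }
}
```

With only phase one in place, a block read by a task stays non-evictable until
`releaseAllForTask` runs for that task; each `release` call added in phase two
shortens that window for one call site.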
This PR is currently WIP, pending test fixes and a few additional
improvements:
- Add significantly more debug logging statements. From my experience
working on other memory-management-related things in Spark, I've realized
that it's extremely useful to have a set of verbose logging statements that
we can enable through the Log4J configuration.
- Guard the "non-zero reference count prevents eviction" check behind a
debugging feature-flag to let us disable this feature for testing. This will be
a useful debugging aid in phase 2.
- Get the existing tests to pass.
- Write API documentation for the `release()` methods.
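
As a rough illustration of the feature-flag item in the list above, the
eviction check could be guarded along these lines. The class name
`EvictionGuard`, the `pinningEnabled` parameter, and the `referenceCount`
lookup function are all hypothetical and stand in for whatever flag and
counter the final patch uses.

```scala
/**
 * Hypothetical eviction guard. `pinningEnabled` stands in for the debugging
 * feature flag; `referenceCount` is any function that returns the number of
 * outstanding readers for a block id.
 */
final class EvictionGuard(pinningEnabled: Boolean, referenceCount: String => Int) {
  /** With the flag off, reference counts are ignored and every block is evictable. */
  def canEvict(blockId: String): Boolean =
    !pinningEnabled || referenceCount(blockId) == 0
}
```

Turning the flag off restores the old eviction behavior, which makes it easier
to tell whether a test failure comes from a missing release() call.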
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark pin-pages
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10705.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10705
----
commit 5d130e44dbb8259588ac1b9006dc41c597c8a4a0
Author: Josh Rosen <[email protected]>
Date: 2016-01-08T21:11:51Z
Add block reference counting class.
commit 423faabe3a34c6021a859c93cb97ac7c946529e2
Author: Josh Rosen <[email protected]>
Date: 2016-01-08T21:46:13Z
Make the ReferenceCounter generic, since it's not specific to storage in
any respect.
commit 1ee665f845addb493c0c822764018d3188aa30d1
Author: Josh Rosen <[email protected]>
Date: 2016-01-08T21:52:50Z
Merge remote-tracking branch 'origin/master' into pin-pages
commit 76cfebd15137fb0090f89dbd1791aad9eca09902
Author: Josh Rosen <[email protected]>
Date: 2016-01-08T23:13:33Z
Integrate reference counter into storage eviction code.
commit 7265784f821c5ca451322e0a2b1bfdcf8c952af4
Author: Josh Rosen <[email protected]>
Date: 2016-01-11T20:24:53Z
Merge remote-tracking branch 'origin/master' into pin-pages
----