GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/10705

    [SPARK-12757][WIP] Use reference counting to prevent blocks from being 
evicted during reads

    As a prerequisite to off-heap caching of blocks, we need a mechanism to 
prevent pages / blocks from being evicted while they are being read. With 
on-heap objects, evicting a block while it is being read merely leads to 
memory-accounting problems (because we assume that an evicted block is a 
candidate for garbage collection, which will not be true during a read), but 
with off-heap memory this will lead to either data corruption or segmentation 
faults.
    
    To address this, we should add a reference-counting mechanism to track 
which blocks/pages are being read in order to prevent them from being evicted 
prematurely. I propose to do this in two phases: first, add a safe, 
conservative approach in which all BlockManager.get*() calls implicitly 
increment the reference count of blocks and where tasks' references are 
automatically freed upon task completion. This will be correct but may hurt 
performance, because blocks that are no longer being read stay pinned until 
the task completes and so cannot be legitimately evicted. In phase two, we 
should incrementally add release() calls so that blocks become evictable 
again as soon as their readers are done. The latter change may need to touch many 
different components, which is why I propose to do it separately in order to 
make the changes easier to reason about and review.
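
As a rough illustration of the phase-one bookkeeping described above, here is a minimal sketch. It is not the code in this PR: Python is used only for brevity, and the names `ReferenceCounter`, `retain`, `release`, `release_all_for_task`, and `count` are hypothetical. It assumes each reference is attributed to a task ID so that a task's references can be dropped in bulk when the task completes.

```python
from collections import Counter
from typing import Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)


class ReferenceCounter(Generic[K]):
    """Hypothetical sketch: counts outstanding reads per key, grouped by task."""

    def __init__(self) -> None:
        # task_id -> Counter of keys that the task currently references
        self._refs_by_task: Dict[int, Counter] = {}
        # key -> total reference count across all tasks
        self._totals: Counter = Counter()

    def retain(self, task_id: int, key: K) -> None:
        """Record one more reference to `key` held by `task_id`."""
        self._refs_by_task.setdefault(task_id, Counter())[key] += 1
        self._totals[key] += 1

    def release(self, task_id: int, key: K) -> None:
        """Drop one reference to `key` held by `task_id` (the phase-two call)."""
        held = self._refs_by_task.get(task_id)
        if held is None or held[key] == 0:
            raise ValueError(f"task {task_id} holds no reference to {key!r}")
        held[key] -= 1
        self._totals[key] -= 1
        if held[key] == 0:
            del held[key]
        if self._totals[key] == 0:
            del self._totals[key]
        if not held:
            del self._refs_by_task[task_id]

    def release_all_for_task(self, task_id: int) -> int:
        """Drop every reference held by `task_id`; return how many were dropped."""
        held = self._refs_by_task.pop(task_id, Counter())
        for key, n in held.items():
            self._totals[key] -= n
            if self._totals[key] == 0:
                del self._totals[key]
        return sum(held.values())

    def count(self, key: K) -> int:
        """Total outstanding references to `key`; 0 means it may be evicted."""
        return self._totals[key]
```

In this sketch, every get-style read would call `retain(task_id, block_id)`, and a task-completion hook would call `release_all_for_task(task_id)`. Explicit `release` calls are what phase two would add incrementally.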
    
    This PR is currently WIP, pending test fixes and a few additional 
improvements:
    
    - I need to add significantly more debug logging statements. From my 
experience in working on other memory-management-related things in Spark, I've 
realized that it's extremely useful to have a set of verbose logging statements 
that we can enable with a Log4J conf.
    - Guard the "non-zero reference count prevents eviction" check behind a 
debugging feature-flag to let us disable this feature for testing. This will be 
a useful debugging aid in phase 2.
    - Get the existing tests to pass.
    - Write API documentation for the `release()` methods.
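
To make the feature-flag and debug-logging items above concrete, here is a small sketch of an eviction filter. It is an assumption-laden illustration rather than Spark code: the function `select_evictable`, the `respect_ref_counts` flag, and the logger name are all hypothetical, and Python's standard `logging` module stands in for Log4J.

```python
import logging
from typing import Dict, Iterable, List

# Hypothetical logger name; enable DEBUG on it to trace eviction decisions.
logger = logging.getLogger("block_eviction_sketch")


def select_evictable(
    candidates: Iterable[str],
    ref_counts: Dict[str, int],
    respect_ref_counts: bool = True,
) -> List[str]:
    """Return the candidate block IDs that may be evicted.

    When `respect_ref_counts` is True, blocks with a non-zero reference
    count are skipped. Setting it to False disables the check, which is
    the debugging escape hatch described in the list above.
    """
    evictable = []
    for block_id in candidates:
        count = ref_counts.get(block_id, 0)
        if respect_ref_counts and count > 0:
            logger.debug(
                "Not evicting %s: reference count is %d", block_id, count
            )
            continue
        logger.debug("Block %s is evictable (reference count %d)", block_id, count)
        evictable.append(block_id)
    return evictable
```

With the flag on, a block that is still being read is passed over. With the flag off, the filter behaves as if reference counting did not exist, which would help isolate whether a phase-two regression comes from a missing release() call.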


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark pin-pages

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10705.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10705
    
----
commit 5d130e44dbb8259588ac1b9006dc41c597c8a4a0
Author: Josh Rosen <[email protected]>
Date:   2016-01-08T21:11:51Z

    Add block reference counting class.

commit 423faabe3a34c6021a859c93cb97ac7c946529e2
Author: Josh Rosen <[email protected]>
Date:   2016-01-08T21:46:13Z

    Make the ReferenceCounter generic, since it's not specific to storage in 
any respect.

commit 1ee665f845addb493c0c822764018d3188aa30d1
Author: Josh Rosen <[email protected]>
Date:   2016-01-08T21:52:50Z

    Merge remote-tracking branch 'origin/master' into pin-pages

commit 76cfebd15137fb0090f89dbd1791aad9eca09902
Author: Josh Rosen <[email protected]>
Date:   2016-01-08T23:13:33Z

    Integrate reference counter into storage eviction code.

commit 7265784f821c5ca451322e0a2b1bfdcf8c952af4
Author: Josh Rosen <[email protected]>
Date:   2016-01-11T20:24:53Z

    Merge remote-tracking branch 'origin/master' into pin-pages

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

