GitHub user JeetKunDoug opened a pull request:
https://github.com/apache/spark/pull/21322
[SPARK-24225] Support closing AutoClosable objects in MemoryStore
This allows Broadcast Variables to be released properly.
## What changes were proposed in this pull request?
Broadcast variables, while usually used to broadcast data to executors, can
also be used to control the scope and lifecycle of shared resources (e.g.,
connection pools). When creating and destroying those resources within each
task is expensive, keeping them deserialized in memory through a broadcast
variable lets multiple tasks share them, which can substantially improve the
efficiency of a Spark job.
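The pattern described above can be sketched as follows. This is a minimal, standalone illustration, not code from the PR: `PooledResourceHolder` and `FakeConnectionPool` are hypothetical names, and the Spark call that would broadcast the holder (e.g. `JavaSparkContext.broadcast`) is left out so the snippet runs without Spark. Only the configuration is serialized; the pool lives in a `transient` field, is created lazily the first time a task asks for it, and is then shared by every task that reads the same deserialized copy. Implementing `AutoCloseable` is what would let a `MemoryStore` with this patch release the pool when the block is removed.

```java
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive shared resource such as a connection pool.
class FakeConnectionPool implements AutoCloseable {
    // Counts pools that are currently open, so the sharing behaviour is observable.
    static final AtomicInteger OPEN = new AtomicInteger();
    private final String url;
    private volatile boolean closed = false;

    FakeConnectionPool(String url) {
        this.url = url;
        OPEN.incrementAndGet();
    }

    String query(String sql) {
        if (closed) throw new IllegalStateException("pool closed");
        return url + ": " + sql;
    }

    boolean isClosed() { return closed; }

    @Override
    public synchronized void close() {
        if (!closed) {           // idempotent: only the first close releases the pool
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

// Hypothetical object that would be broadcast. Only jdbcUrl is serialized;
// the pool is rebuilt lazily on the receiving side and shared by its tasks.
class PooledResourceHolder implements Serializable, AutoCloseable {
    private static final long serialVersionUID = 1L;
    private final String jdbcUrl;
    private transient FakeConnectionPool pool;

    PooledResourceHolder(String jdbcUrl) { this.jdbcUrl = jdbcUrl; }

    // Every caller on the same deserialized copy gets the same pool instance.
    synchronized FakeConnectionPool pool() {
        if (pool == null) pool = new FakeConnectionPool(jdbcUrl);
        return pool;
    }

    // Releases the shared pool; a later pool() call would create a fresh one.
    @Override
    public synchronized void close() {
        if (pool != null) {
            pool.close();
            pool = null;
        }
    }
}
```

Marking the pool `transient` keeps the broadcast payload small and avoids trying to serialize live connections; the trade-off is that each deserialized copy builds its own pool, which is typically one per executor for a broadcast value.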
In `MemoryStore`, check whether any values held in a `DeserializedMemoryEntry`
implement `AutoCloseable` and, if so, call `close` on those resources. This
occurs in two places:
- `remove` of an individual item
- `clear` of the MemoryStore
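The two code paths listed above can be modelled in a few lines. This is a simplified sketch, not Spark's `MemoryStore` (which is written in Scala and also manages memory accounting and locking): `SimpleMemoryStore` and `DeserializedEntry` are hypothetical names, and string keys stand in for Spark's block IDs. Both `remove` and `clear` scan the entry's values and close any that implement `AutoCloseable`. The exception policy shown, recording a failed `close` and continuing with the remaining values, is an assumption about what "handle exceptions properly" could mean, not a description of the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified deserialized entry: just the array of stored values.
final class DeserializedEntry {
    final Object[] values;
    DeserializedEntry(Object... values) { this.values = values; }
}

// Hypothetical, simplified store; NOT Spark's MemoryStore.
final class SimpleMemoryStore {
    private final Map<String, DeserializedEntry> entries = new LinkedHashMap<>();
    // Failures thrown by close(), kept so callers or tests can inspect them.
    final List<Exception> closeFailures = new ArrayList<>();

    synchronized void put(String blockId, DeserializedEntry entry) {
        entries.put(blockId, entry);
    }

    // Path 1: remove one block and close any AutoCloseable values it held.
    synchronized boolean remove(String blockId) {
        DeserializedEntry entry = entries.remove(blockId);
        if (entry == null) return false;
        closeValues(entry);
        return true;
    }

    // Path 2: close AutoCloseable values in every entry, then drop them all.
    synchronized void clear() {
        for (DeserializedEntry entry : entries.values()) closeValues(entry);
        entries.clear();
    }

    synchronized int size() { return entries.size(); }

    private void closeValues(DeserializedEntry entry) {
        for (Object value : entry.values) {
            if (value instanceof AutoCloseable) {
                try {
                    ((AutoCloseable) value).close();
                } catch (Exception e) {
                    // Assumed policy: record the failure and keep closing the
                    // remaining values so one bad resource cannot block cleanup.
                    closeFailures.add(e);
                }
            }
        }
    }
}
```

Values that do not implement `AutoCloseable` are simply dropped as before, so ordinary broadcast data is unaffected by this check.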
## How was this patch tested?
Added tests to `MemoryStoreSuite` to check that resources are closed properly
and that exceptions are handled correctly.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JeetKunDoug/spark handle-autoclosable-objects
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21322.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21322
----
commit f254f94fdc5e2648d7c1104bf5ec2355de7c6055
Author: Doug Rohrer <drohrer@...>
Date: 2018-05-14T16:24:00Z
[SPARK-24225] Support closing AutoClosable objects in MemoryStore so
Broadcast Variables can be released properly
----