GitHub user squito opened a pull request:
https://github.com/apache/spark/pull/5463
[Spark-6839] BlockManager.dataDeserialize
https://issues.apache.org/jira/browse/SPARK-6839
This needed to touch a surprisingly large amount of code to make sure that
`BlockManager.dataDeserialize` always gets passed something which can ensure
the input stream gets closed. I trimmed out what I could, but almost all paths
through `BlockManager` might end up calling `dataDeserialize` (even when blocks
are being `put`).
There isn't always a `TaskContext` at all of the relevant call sites, so I
made a new abstraction `ResourceCleaner`. `TaskContext` extends
`ResourceCleaner`, so we use `TaskContext` where we have one; otherwise there
is a `SimpleResourceCleaner` that just keeps a list of functions to run in a
`finally` block.
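As a rough illustration of the idea, here is a minimal sketch of what such an abstraction could look like. Only the names `ResourceCleaner` and `SimpleResourceCleaner` come from the description above; the method names `addCleanupFunction` and `runAll`, the helper `withCleaner`, and all of the bodies are assumptions made for this example and may not match the actual patch.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical sketch: something that cleanup functions can be registered with.
// In the PR, TaskContext is described as extending this abstraction.
trait ResourceCleaner {
  /** Register a function to run once the owning scope is finished. */
  def addCleanupFunction(f: () => Unit): Unit
}

// Hypothetical sketch: keeps a list of functions for call sites without a TaskContext.
class SimpleResourceCleaner extends ResourceCleaner {
  private val cleanups = ArrayBuffer.empty[() => Unit]

  override def addCleanupFunction(f: () => Unit): Unit = {
    cleanups += f
  }

  /** Run every registered function, even if an earlier one fails; rethrow the first failure. */
  def runAll(): Unit = {
    var firstError: Option[Throwable] = None
    cleanups.foreach { f =>
      try {
        f()
      } catch {
        case NonFatal(e) => if (firstError.isEmpty) firstError = Some(e)
      }
    }
    cleanups.clear()
    firstError.foreach(e => throw e)
  }
}

object ResourceCleanerDemo {
  /** Run `body` with a fresh cleaner and always run the cleanups in a `finally` block. */
  def withCleaner[T](body: ResourceCleaner => T): T = {
    val cleaner = new SimpleResourceCleaner
    try {
      body(cleaner)
    } finally {
      cleaner.runAll()
    }
  }
}
```

A call site without a `TaskContext` could then wrap its work in `ResourceCleanerDemo.withCleaner { cleaner => ... }` and hand `cleaner` to whatever opens the stream.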
I also considered making `DeserializationStream.asIterator` require a
`ResourceCleaner`. That way we'd force the *right* cleaning function to be
used every time somebody called `stream.asIterator`. However, I figure it is
*possible* that you might read some of the stream and not necessarily want to
close it. I'm curious what others think about this.
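To make the alternative concrete, here is a hedged sketch of an `asIterator` that requires a cleaner. It does not use Spark's real `DeserializationStream`: `IntStream` is a hypothetical stand-in that reads `Int`s from a `java.io.DataInputStream`, and the `ResourceCleaner` trait is redefined here with an assumed `addCleanupFunction` method so the example is self-contained.

```scala
import java.io.{DataInputStream, EOFException, InputStream}

// Hypothetical minimal cleaner interface, redefined so this sketch stands alone.
trait ResourceCleaner {
  def addCleanupFunction(f: () => Unit): Unit
}

// Hypothetical stand-in for a deserialization stream that yields Ints.
class IntStream(in: InputStream) {
  private val data = new DataInputStream(in)

  def close(): Unit = data.close()

  /** Registering close() up front means the caller cannot forget it. */
  def asIterator(cleaner: ResourceCleaner): Iterator[Int] = {
    cleaner.addCleanupFunction(() => close())
    new Iterator[Int] {
      private var buffered: Option[Int] = None
      private var finished = false

      // Read one value ahead so hasNext can report end of stream.
      private def advance(): Unit = {
        if (buffered.isEmpty && !finished) {
          try {
            buffered = Some(data.readInt())
          } catch {
            case _: EOFException => finished = true
          }
        }
      }

      override def hasNext: Boolean = {
        advance()
        buffered.isDefined
      }

      override def next(): Int = {
        advance()
        val value = buffered.getOrElse(throw new NoSuchElementException("end of stream"))
        buffered = None
        value
      }
    }
  }
}
```

In this sketch the stream is closed when the cleaner runs, not when the iterator is exhausted, so a caller who reads only part of the stream still gets cleanup. The cost is the one raised above: a caller who wants to keep the stream open has no way to opt out.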
`BroadcastSuite` is somewhat flaky when run on my laptop ... I'm pretty
sure that is not related to these changes, but I guess we'll see what Jenkins
says.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/squito/spark SPARK-6839_dispose_bytebuffers
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5463.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5463
----
commit 345803bf316d214fbc027a3947ec4bdbb9e5ce0e
Author: Imran Rashid <[email protected]>
Date: 2015-04-10T14:45:39Z
change ByteBufferInputStream to do dispose in close(), rather than at end
of stream
commit 5e7214ff5dcf4379137512e4fda8c29dd19bf40e
Author: Imran Rashid <[email protected]>
Date: 2015-04-10T14:47:07Z
add test for DeserializationStream (passed w/out changes)
commit 2053a15b5200ddf485dd6124766b892e085630f1
Author: Imran Rashid <[email protected]>
Date: 2015-04-10T19:59:30Z
every call to BlockManager.dataDeserialize requires a ResourceCleaner to
ensure the stream gets closed
commit 32af41893d73dc6d0b490b5db930dc5d1014e4bd
Author: Imran Rashid <[email protected]>
Date: 2015-04-10T20:23:33Z
add test
commit 33c8be9217bf3137e4a1e65e6d8b14af9324fd6c
Author: Imran Rashid <[email protected]>
Date: 2015-04-10T21:10:09Z
rename
commit 76bf6f2cbd253b79a285c62c488abfa7fed43a09
Author: Imran Rashid <[email protected]>
Date: 2015-04-10T21:33:49Z
style
----