GitHub user ahmed-mahran opened a pull request:
https://github.com/apache/spark/pull/13866
[SPARK-16160] [STREAMING] Clear last remembered metadata window per dstream
upon context graceful stop
## What changes were proposed in this pull request?
When a streaming context is stopped gracefully, the last remembered window
of each DStream is not cleared. If the streaming context is stopped without
stopping the Spark context, the RDDs persisted for that window remain as
garbage. The effect is most noticeable when the remember duration is
relatively large, as with windowing or checkpointing.
In this PR, `JobGenerator` makes a final call to `clearMetadata` so that
all remembered metadata is cleared when a streaming context is stopped
gracefully.
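
The behaviour described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code and not the change made in this PR: `ToyDStream`, `run`, and the way the final clearing time is chosen are all hypothetical, and batch times are plain integers. It shows why a periodic clear always leaves the last remember window behind, and how one extra clear on stop empties it.

```python
# Toy model only: none of these names are Spark APIs.
class ToyDStream:
    def __init__(self, remember_duration):
        self.remember_duration = remember_duration
        self.generated = {}  # batch time -> placeholder for a persisted RDD

    def generate(self, time):
        # Pretend an RDD was generated and persisted for this batch.
        self.generated[time] = f"rdd@{time}"

    def clear_metadata(self, time):
        # Drop every batch that has fallen out of the remember window.
        cutoff = time - self.remember_duration
        for t in [t for t in self.generated if t <= cutoff]:
            del self.generated[t]


def run(batches, remember_duration, final_clear):
    stream = ToyDStream(remember_duration)
    for time in range(1, batches + 1):
        stream.generate(time)
        stream.clear_metadata(time)
    if final_clear:
        # Assumed final pass on graceful stop: clear as if the clock had
        # moved one full remember window past the last batch.
        stream.clear_metadata(batches + remember_duration)
    return sorted(stream.generated)


print(run(5, 3, final_clear=False))  # [3, 4, 5]
print(run(5, 3, final_clear=True))   # []
```

Without the final pass, the three most recent batches stay in the map after the last regular clear, which corresponds to the persisted RDDs that outlive the streaming context when the Spark context is kept alive.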
## How was this patch tested?
A new unit test is introduced that captures this case.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ahmed-mahran/spark b-stop-gracefully-clear-metadata
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13866.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13866
----
commit 767ff4827a31acf368a81a9c78c47a2380f2a34c
Author: Ahmed Mahran <[email protected]>
Date: 2016-06-23T00:06:30Z
Add a test case to capture the issue
commit 0615351f10133e5ddaca98cb4faa5b1b16e042f6
Author: Ahmed Mahran <[email protected]>
Date: 2016-06-23T00:07:32Z
Do a final cleaning of metadata when stopping streaming context gracefully
----