GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/22341
[SPARK-24889][Core] Update block info when unpersist rdds
## What changes were proposed in this pull request?
Block info reported by executors is updated on events such as caching an RDD.
However, when an RDD is removed via unpersist, no block-info update is
requested, so the stored block info becomes stale.
There are a few options to fix this:
1. Ask executors to update block info when unpersisting
This is the simplest, but it slightly changes driver-executor communication.
2. Update block info when processing the unpersist event
A `SparkListenerUnpersistRDD` event is already sent when an RDD is
unpersisted. When processing this event, we can update the RDD's block info.
This only changes event-processing code, so the risk seems lowest.
This patch currently takes option 2 as the lower-risk choice. If we agree the
first option carries no risk, we can switch to it.
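As a rough sketch of option 2 (not the actual patch): a listener can drop the tracked storage totals for an RDD inside `onUnpersistRDD`, using the `SparkListenerUnpersistRDD` event's `rddId`. The `BlockInfoCleanupListener` class and `recordBlock` helper below are hypothetical names for illustration; the real change lives in the status-listener code.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerUnpersistRDD}
import scala.collection.mutable

// Illustrative sketch only: clear cached-block statistics for an RDD
// when its unpersist event is processed on the driver.
class BlockInfoCleanupListener extends SparkListener {
  // rddId -> (memory bytes, disk bytes), accumulated from block updates
  private val rddStorage = mutable.Map.empty[Int, (Long, Long)]

  // Hypothetical hook called when a block-update for an RDD arrives.
  def recordBlock(rddId: Int, memSize: Long, diskSize: Long): Unit = {
    val (mem, disk) = rddStorage.getOrElse(rddId, (0L, 0L))
    rddStorage(rddId) = (mem + memSize, disk + diskSize)
  }

  override def onUnpersistRDD(event: SparkListenerUnpersistRDD): Unit = {
    // Option 2: update block info while processing the unpersist event,
    // instead of asking executors to report the removal.
    rddStorage.remove(event.rddId)
  }
}
```

The appeal of this shape is that the driver already receives the event, so no new driver-executor messages are needed.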
## How was this patch tested?
Unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-24889
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22341.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22341
----
commit dd5f766e0f270cfc58ca4298c39179469f021f78
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-08-30T23:17:46Z
Update memory and disk info when unpersist rdds.
----