mridulm commented on code in PR #37922:
URL: https://github.com/apache/spark/pull/37922#discussion_r990763769
##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -393,6 +393,22 @@ public void applicationRemoved(String appId, boolean
cleanupLocalDirs) {
}
}
+ @Override
+ public void removeShuffleMerge(String appId, int shuffleId, int
shuffleMergeId) {
+ AppShuffleInfo appShuffleInfo = validateAndGetAppShuffleInfo(appId);
+ AppShuffleMergePartitionsInfo partitionsInfo =
appShuffleInfo.shuffles.remove(shuffleId);
Review Comment:
I added `deleteMergedFiles` above since `closeAndDeleteOutdatedPartitions`
looses reference to existing files when finalized [1], as @wankunde has
observed in his current PR.
Why is this not an issue currently ?
a) We are not deleting files currently - `closeAndDeleteOutdatedPartitions`
is mainly trying to cleanup open fd's. It deletes files only if they were not
finalized (and so no one can read it - they are partial).
b) During application termination, we go and remove the parent directories
and delete files recursively - so nothing is left once we are done.
With this PR, we will need to change behavior to also delete older/existing
files even if finalized.
I have sketched an option above - from a deletion point of view.
Thoughts @otterc, @zhouyejoe, @wankunde ?
[1] Trying to keep track of the files, by preserving existing entry as in
this PR currently, does not work - it will be lost during restart of NM (not in
DB), will be expensive to maintain in long term in memory and this is available
on disk anyway - `getReducerIds()` above should suffice to recreate the full
list of files.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]