mridulm commented on code in PR #37922:
URL: https://github.com/apache/spark/pull/37922#discussion_r990763769


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -393,6 +393,22 @@ public void applicationRemoved(String appId, boolean 
cleanupLocalDirs) {
     }
   }
 
+  @Override
+  public void removeShuffleMerge(String appId, int shuffleId, int 
shuffleMergeId) {
+    AppShuffleInfo appShuffleInfo = validateAndGetAppShuffleInfo(appId);
+    AppShuffleMergePartitionsInfo partitionsInfo = 
appShuffleInfo.shuffles.remove(shuffleId);

Review Comment:
   I added `deleteMergedFiles` above since `closeAndDeleteOutdatedPartitions` 
loses the reference to existing files, as @wankunde has observed in his current PR.
   
   Why is this not an issue currently?
   
   a) We are not deleting files currently - `closeAndDeleteOutdatedPartitions` 
is mainly trying to clean up open FDs. It deletes files only if they were not 
finalized (and so no one can read them - they are partial).
   b) During application termination, we go and remove the parent directories 
and delete files recursively - so nothing is left once we are done.
   
   
   With this PR, we will need to change the behavior to also delete older/existing 
files even if they were finalized.
   I have sketched an option above, from a deletion point of view.
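   To make the discussion concrete, here is a rough sketch of what such a 
deletion path could look like. This is only an illustration, not the actual 
Spark implementation: the class name `MergedShuffleCleanup`, the method 
`deleteMergedFiles`, and the directory layout are all hypothetical, and a real 
version would resolve the merge directory from `AppShuffleInfo` and log 
failures instead of swallowing them.
   
   ```java
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.Comparator;
   import java.util.stream.Stream;
   
   public class MergedShuffleCleanup {
   
     // Hypothetical sketch: recursively delete everything under the merged
     // shuffle directory for a given (shuffleId, shuffleMergeId), including
     // files that were already finalized - unlike the current
     // closeAndDeleteOutdatedPartitions, which only removes partial files.
     public static void deleteMergedFiles(Path mergeDir) throws IOException {
       if (!Files.exists(mergeDir)) {
         return;
       }
       try (Stream<Path> walk = Files.walk(mergeDir)) {
         // Sort in reverse order so children are deleted before their parents.
         walk.sorted(Comparator.reverseOrder())
             .forEach(p -> {
               try {
                 Files.delete(p);
               } catch (IOException e) {
                 // Best-effort cleanup; a real implementation would log this.
               }
             });
       }
     }
   
     public static void main(String[] args) throws IOException {
       // Simulate a merged-shuffle directory with finalized data/index files.
       Path dir = Files.createTempDirectory("merge_manager_demo");
       Files.createFile(dir.resolve("mergedShuffle_0_0_0.data"));
       Files.createFile(dir.resolve("mergedShuffle_0_0_0.index"));
       deleteMergedFiles(dir);
       System.out.println(Files.exists(dir)); // false: directory fully removed
     }
   }
   ```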
   
   Thoughts @otterc, @zhouyejoe, @wankunde ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

