mridulm commented on code in PR #35906:
URL: https://github.com/apache/spark/pull/35906#discussion_r895282920


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -350,15 +415,27 @@ void closeAndDeletePartitionFilesIfNeeded(
    * up older shuffleMergeId partitions. The cleanup will be executed in a 
separate thread.
    */
   @VisibleForTesting
-  void closeAndDeletePartitionFiles(Map<Integer, AppShufflePartitionInfo> 
partitions) {
+  void closeAndDeleteOutdatedPartitions(Map<Integer, AppShufflePartitionInfo> 
partitions) {
     partitions
       .forEach((partitionId, partitionInfo) -> {
         synchronized (partitionInfo) {
           partitionInfo.closeAllFilesAndDeleteIfNeeded(true);
+          
removeAppShufflePartitionInfoFromDB(partitionInfo.appAttemptShuffleMergeId);

Review Comment:
   Same as with comment in `removeAppShuffleInfoFromDB`. Does this need to be 
within sync block ?
   
   Also, we would have a large number of partitionInfo for a given shuffle 
merge id - we are repeatedly trying to delete the same shuffle merge id here.
   
   Assuming we dont need the delete to be in sync block, we can pull all shuffe 
merge id's in `partitions` and delete them.
   
   For example:
   
   ```
   Set<AppAttemptShuffleMergeId> mergeIdsToRemove = new HashSet<>();
   
   partitions.forEach ... {
     mergeIdsToRemove.add(partitionInfo.appAttemptShuffleMergeId);
     synchronized (partitionInfo) {
       partitionInfo.closeAllFilesAndDeleteIfNeeded(true)
     }
   }
   
   mergeIdsToRemove.forEach(this::removeAppShufflePartitionInfoFromDB);
   ```
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to