mridulm commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r680617877



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##########
@@ -410,17 +500,42 @@ public MergeStatuses finalizeShuffleMerge(FinalizeShuffleMerge msg) throws IOExc
           + "with the current attempt id %s stored in shuffle service for application %s",
           msg.appAttemptId, appShuffleInfo.attemptId, msg.appId));
     }
-    Map<Integer, AppShufflePartitionInfo> shufflePartitions =
-      appShuffleInfo.partitions.remove(msg.shuffleId);
+    AtomicReference<Map<Integer, AppShufflePartitionInfo>> shuffleMergePartitionsRef =
+      new AtomicReference<>(null);
+    // Metadata of the determinate stage shuffle can be safely removed as part of finalizing
+    // shuffle merge. Currently once the shuffle is finalized for a determinate stages, retry
+    // stages of the same shuffle will have shuffle push disabled.
+    if (msg.shuffleMergeId == DETERMINATE_SHUFFLE_MERGE_ID) {
+      AppShuffleMergePartitionsInfo appShuffleMergePartitionsInfo =
+        appShuffleInfo.shuffles.remove(msg.shuffleId);
+      if (appShuffleMergePartitionsInfo != null) {
+        shuffleMergePartitionsRef.set(appShuffleMergePartitionsInfo.shuffleMergePartitions);
+      }
+    } else {
+      appShuffleInfo.shuffles.compute(msg.shuffleId, (id, value) -> {
+        if (null == value || msg.shuffleMergeId != value.shuffleMergeId ||

Review comment:
       Actually, thinking about this more, you are right: this can happen if blocks were pushed only for the first shuffle and none for the second (and the first was never finalized).
   @venkata91 Can you please make this fix and add a test for it? Thanks.
   
   In the else block, set `shuffleMergePartitionsRef` only if the merge id is an exact match. If `msg.shuffleMergeId > value.shuffleMergeId`, then no blocks were pushed for this shuffle and the stored blocks belong to an older shuffle: ignore them from the finalization point of view, but still clean them up.
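
A rough sketch of the branching I have in mind, not the actual patch. `FinalizeMergeSketch`, `MergeInfo`, `partitionsToFinalize` and `cleanedMergeIds` are hypothetical stand-ins (the real code uses `AppShuffleMergePartitionsInfo` and would delete files rather than record ids). Leaving the entry untouched for a missing or stale merge id, and replacing it with an empty entry otherwise, are assumptions made only to keep the example self-contained:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

class FinalizeMergeSketch {
  /** Hypothetical stand-in for AppShuffleMergePartitionsInfo. */
  static final class MergeInfo {
    final int shuffleMergeId;
    final Map<Integer, String> partitions; // reduceId -> placeholder partition info

    MergeInfo(int shuffleMergeId, Map<Integer, String> partitions) {
      this.shuffleMergeId = shuffleMergeId;
      this.partitions = partitions;
    }
  }

  final ConcurrentHashMap<Integer, MergeInfo> shuffles = new ConcurrentHashMap<>();
  // Records which older merge ids were cleaned up (stands in for deleting files).
  final List<Integer> cleanedMergeIds = new ArrayList<>();

  /** Returns the partitions to finalize, or null if there is nothing to finalize. */
  Map<Integer, String> partitionsToFinalize(int shuffleId, int msgShuffleMergeId) {
    AtomicReference<Map<Integer, String>> ref = new AtomicReference<>(null);
    shuffles.compute(shuffleId, (id, value) -> {
      if (value == null || msgShuffleMergeId < value.shuffleMergeId) {
        // Assumption: nothing tracked, or a stale request; leave state untouched.
        return value;
      }
      if (msgShuffleMergeId == value.shuffleMergeId) {
        // Exact match: these are the partitions to finalize.
        ref.set(value.partitions);
      } else {
        // msg id is newer: no blocks were pushed for it, so the stored blocks
        // belong to an older merge. Skip finalization, but clean them up.
        cleanedMergeIds.add(value.shuffleMergeId);
      }
      // Assumption: remember the requested merge id with no pending partitions.
      return new MergeInfo(msgShuffleMergeId, Collections.emptyMap());
    });
    return ref.get();
  }
}
```

With an entry stored under merge id 1, a finalize request for merge id 2 returns null and records id 1 as cleaned, while a request for merge id 1 returns the stored partitions.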
   
   Thoughts ?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


