otterc commented on a change in pull request #32007:
URL: https://github.com/apache/spark/pull/32007#discussion_r642024816



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##########
@@ -290,16 +349,64 @@ void deleteExecutorDirs(Path[] dirs) {
     }
   }
 
+  /**
+   * Create StreamCallback for invalid push blocks with the specific error message.
+   * If specific error message is null, this StreamCallback won't throw exception in client.
+   */
+  private StreamCallbackWithID createCallbackForInvalidPushBlocks(
+      String streamId,
+      String errorMessage) {
+    return new StreamCallbackWithID() {
+      @Override
+      public String getID() {
+        return streamId;
+      }
+
+      @Override
+      public void onData(String streamId, ByteBuffer buf) {
+        // Ignore the requests. It reaches here either when a request is received after the
+        // shuffle file is finalized or when a request is for a duplicate block.
+      }
+
+      @Override
+      public void onComplete(String streamId) {
+        if (errorMessage != null) {
+          // Throw an exception here so the block data is drained from channel and server
+          // responds RpcFailure to the client.
+          throw new RuntimeException(String.format("Block %s %s", streamId, errorMessage));
+        }
+        // For duplicate block that is received before the shuffle merge finalizes, the
+        // server should respond success to the client.
+      }
+
+      @Override
+      public void onFailure(String streamId, Throwable cause) {
+      }
+    };
+  }
+
   @Override
   public StreamCallbackWithID receiveBlockDataAsStream(PushBlockStream msg) {
     // Retrieve merged shuffle file metadata
-    AppShuffleId appShuffleId = new AppShuffleId(msg.appId, msg.shuffleId);
+    AppAttemptPathsInfo appAttemptPathsInfo = getAppAttemptPathsInfo(msg.appId);
+    final String streamId = String.format("%s_%d_%d_%d",
+      OneForOneBlockPusher.SHUFFLE_PUSH_BLOCK_PREFIX, msg.shuffleId, msg.mapIndex, msg.reduceId);
+    AppAttemptShuffleId appAttemptShuffleId =
+      new AppAttemptShuffleId(msg.appId, msg.attemptId, msg.shuffleId);
+    if (appAttemptPathsInfo.attemptId != appAttemptShuffleId.attemptId) {
+      // If this Block belongs to a former application attempt, it is considered late,
+      // as only the blocks from the current application attempt will be merged
+      // TODO: [SPARK-35548] Client should be updated to handle this error.
+      return createCallbackForInvalidPushBlocks(streamId,

Review comment:
Is there a need to create a valid `StreamCallback` in this case? Can't we just throw the RuntimeException with the `NEWER_ATTEMPT_HAS_STARTED_MESSAGE_SUFFIX` when it tries to create the stream? That would also make the refactoring below, which added `createCallbackForInvalidPushBlocks`, unnecessary.

We already throw a RuntimeException (`Cannot initialize merged shuffle partition for appId %s shuffleId %s reduceId %s`) from `getOrCreateAppShufflePartitionInfo`, so we can fail to create the stream immediately here as well.
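
To make the suggestion concrete, here is a rough sketch of the fail-fast shape, not the actual patch. `FailFastSketch`, `PushCallback`, and `NEWER_ATTEMPT_SUFFIX` are hypothetical stand-ins for the real Spark types and constant, and the attempt ids are passed in directly instead of being read from `msg` and `AppAttemptPathsInfo`.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the resolver; none of these names are Spark's real API.
class FailFastSketch {
  /** Simplified callback shape, assumed for illustration only. */
  interface PushCallback {
    String getID();
    void onData(String streamId, ByteBuffer buf);
    void onComplete(String streamId);
  }

  // Placeholder text: the real NEWER_ATTEMPT_HAS_STARTED_MESSAGE_SUFFIX value
  // is not shown in this hunk, so it is not reproduced here.
  static final String NEWER_ATTEMPT_SUFFIX = "<newer attempt has started>";

  static PushCallback receiveBlockDataAsStream(
      int currentAttemptId, int msgAttemptId, String streamId) {
    if (currentAttemptId != msgAttemptId) {
      // Fail while creating the stream instead of returning a no-op callback
      // that only throws later in onComplete.
      throw new RuntimeException(
        String.format("Block %s %s", streamId, NEWER_ATTEMPT_SUFFIX));
    }
    return new PushCallback() {
      @Override
      public String getID() {
        return streamId;
      }

      @Override
      public void onData(String id, ByteBuffer buf) {
        // The real resolver would append buf to the merged shuffle file here.
      }

      @Override
      public void onComplete(String id) {
        // Nothing to do in this sketch.
      }
    };
  }

  public static void main(String[] args) {
    try {
      receiveBlockDataAsStream(2, 1, "shufflePush_0_3_7");
    } catch (RuntimeException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    System.out.println(
      "accepted: " + receiveBlockDataAsStream(2, 2, "shufflePush_0_3_7").getID());
  }
}
```

One thing worth checking before going this way: the comment in `onComplete` says the exception is thrown there so that the block data is drained from the channel first. I have not verified whether throwing at stream creation leaves the channel in the same state.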






