otterc commented on a change in pull request #33613:
URL: https://github.com/apache/spark/pull/33613#discussion_r683558307



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##########
@@ -385,14 +397,24 @@ public StreamCallbackWithID receiveBlockDataAsStream(PushBlockStream msg) {
           + "with the current attempt id %s stored in shuffle service for application %s",
           msg.appAttemptId, appShuffleInfo.attemptId, msg.appId));
     }
+    // Use string concatenation here to avoid the overhead with String.format on every
+    // pushed block.
+    final String streamId = OneForOneBlockPusher.SHUFFLE_PUSH_BLOCK_PREFIX + "_"
+      + msg.shuffleId + "_" + msg.shuffleMergeId + "_" + msg.mapIndex + "_" + msg.reduceId;
     // Retrieve merged shuffle file metadata
     AppShufflePartitionInfo partitionInfoBeforeCheck;
+    boolean isStaleBlock = false;
+    boolean isTooLate = false;
     try {
       partitionInfoBeforeCheck = getOrCreateAppShufflePartitionInfo(appShuffleInfo, msg.shuffleId,
         msg.shuffleMergeId, msg.reduceId);
-    } catch(StaleBlockPushException sbp) {
+      isTooLate = partitionInfoBeforeCheck == null;
+    } catch(BlockPushNonFatalFailure bpf) {

Review comment:
       Following up on the previous conversation at
   https://github.com/apache/spark/pull/33613#discussion_r682952969:
   I checked the other places in `getOrCreateAppShufflePartitionInfo` where we
   throw a `RuntimeException`. In those cases we never return a `StreamCallback`,
   so the connection gets closed on the server. For example:
   ```
         throw new RuntimeException(
             String.format("Cannot initialize merged shuffle partition for appId %s shuffleId %s "
               + "shuffleMergeId %s reduceId %s", appShuffleInfo.appId, shuffleId, shuffleMergeId,
                 reduceId), e);
   ```
   This is thrown when the server fails to create the merged data, index, or meta
   file for a partition, which may be caused by a temporary glitch on the server.
   In that case, do we really want to close the connection? The client could also
   be pushing blocks for other partitions to this server over the same connection.
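
   To make the alternative concrete, here is a minimal, self-contained sketch of handling such a failure per block instead of rethrowing. It is not the Spark code: `BlockCallback`, `PartitionOpener`, `receive`, and the status strings are hypothetical stand-ins for illustration only. The assumption is that the failure can be reported when the stream completes, in the same spirit as the `isTooLate` / `isStaleBlock` flags in the diff above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class NonFatalPushSketch {
  /** Hypothetical stand-in for a stream callback; not Spark's StreamCallbackWithID. */
  interface BlockCallback {
    void onData(byte[] chunk);
    /** Returns the status the server would report back on the same connection. */
    String onComplete();
  }

  /** Hypothetical partition opener that may fail when the merged files cannot be created. */
  interface PartitionOpener {
    List<byte[]> open(int reduceId) throws IOException;
  }

  static BlockCallback receive(PartitionOpener opener, int reduceId) {
    try {
      List<byte[]> sink = opener.open(reduceId);
      return new BlockCallback() {
        @Override public void onData(byte[] chunk) { sink.add(chunk); }
        @Override public String onComplete() { return "OK reduceId=" + reduceId; }
      };
    } catch (IOException e) {
      // Instead of rethrowing as a RuntimeException, return a callback that
      // discards this block and reports a failure scoped to this block only.
      return new BlockCallback() {
        @Override public void onData(byte[] chunk) { /* drain and ignore */ }
        @Override public String onComplete() {
          return "NON_FATAL_FAILURE reduceId=" + reduceId + ": " + e.getMessage();
        }
      };
    }
  }

  public static void main(String[] args) {
    // Assumption for the demo: only partition 1 hits the file-creation failure.
    PartitionOpener opener = id -> {
      if (id == 1) throw new IOException("cannot create merged file");
      return new ArrayList<>();
    };
    BlockCallback failed = receive(opener, 1);
    failed.onData(new byte[] {1, 2, 3});
    System.out.println(failed.onComplete());

    // A block for a different partition can still be handled afterwards.
    BlockCallback ok = receive(opener, 2);
    ok.onData(new byte[] {4, 5, 6});
    System.out.println(ok.onComplete());
  }
}
```

   In this sketch the failure for one partition stays scoped to that block, so a push for another partition can still succeed afterwards. Whether the real server should behave this way is exactly the open question above.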
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


