Victsm commented on a change in pull request #33613:
URL: https://github.com/apache/spark/pull/33613#discussion_r681840796



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##########
@@ -379,9 +399,10 @@ void deleteExecutorDirs(AppShuffleInfo appShuffleInfo) {
   @Override
   public StreamCallbackWithID receiveBlockDataAsStream(PushBlockStream msg) {
     AppShuffleInfo appShuffleInfo = validateAndGetAppShuffleInfo(msg.appId);
-    final String streamId = String.format("%s_%d_%d_%d_%d",
-      OneForOneBlockPusher.SHUFFLE_PUSH_BLOCK_PREFIX, msg.shuffleId, msg.shuffleMergeId,
-      msg.mapIndex, msg.reduceId);
+    // Use string concatenation here to avoid the overhead with String.format on every

Review comment:
   The overhead depends on the number of blocks pushed, since it is incurred 
with every block pushed.
   The server had no trouble keeping up in our benchmark, which pushed 40M 
blocks, but the flame graph showed a noticeable share of CPU time (~5%) spent 
in this String.format call.
   For push-based shuffle, reducing server-side load is always worthwhile.
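
   To make the trade-off concrete, here is a minimal sketch of the two ways 
to build the stream ID. It is not the code from the PR: the class name, the 
method names, and the `PREFIX` value are assumptions for illustration 
(`PREFIX` stands in for OneForOneBlockPusher.SHUFFLE_PUSH_BLOCK_PREFIX). It 
only shows that plain concatenation can yield the same ID without parsing a 
format string on every call; it does not measure the cost.

```java
// Illustrative sketch only; names and the PREFIX value are assumptions.
class StreamIdSketch {
  // Hypothetical stand-in for OneForOneBlockPusher.SHUFFLE_PUSH_BLOCK_PREFIX.
  static final String PREFIX = "shufflePush";

  // Mirrors the removed code: String.format parses the pattern on every call.
  static String withFormat(int shuffleId, int shuffleMergeId, int mapIndex, int reduceId) {
    return String.format("%s_%d_%d_%d_%d",
      PREFIX, shuffleId, shuffleMergeId, mapIndex, reduceId);
  }

  // Plain concatenation: no pattern parsing, compiled to direct appends.
  static String withConcat(int shuffleId, int shuffleMergeId, int mapIndex, int reduceId) {
    return PREFIX + "_" + shuffleId + "_" + shuffleMergeId + "_" + mapIndex + "_" + reduceId;
  }

  public static void main(String[] args) {
    System.out.println(withFormat(1, 2, 3, 4));
    System.out.println(withConcat(1, 2, 3, 4));
  }
}
```

   One caveat: String.format renders %d using the default locale, so the two 
forms are only guaranteed to match in locales that use ASCII digits.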




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


