Victsm commented on a change in pull request #33613:
URL: https://github.com/apache/spark/pull/33613#discussion_r681840796
##########
File path:
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##########
@@ -379,9 +399,10 @@ void deleteExecutorDirs(AppShuffleInfo appShuffleInfo) {
@Override
public StreamCallbackWithID receiveBlockDataAsStream(PushBlockStream msg) {
AppShuffleInfo appShuffleInfo = validateAndGetAppShuffleInfo(msg.appId);
- final String streamId = String.format("%s_%d_%d_%d_%d",
- OneForOneBlockPusher.SHUFFLE_PUSH_BLOCK_PREFIX, msg.shuffleId, msg.shuffleMergeId,
- msg.mapIndex, msg.reduceId);
+ // Use string concatenation here to avoid the overhead with String.format on every
Review comment:
The cost scales with the number of blocks pushed, since it is incurred once
for every pushed block.
The server had no trouble keeping up in our benchmark, which pushed 40M
blocks, but the flame graph showed a noticeable share of CPU time (~5%) spent
in this String.format call.
For push-based shuffle, it is always worth reducing the server-side load.
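To make the trade-off concrete, here is a minimal sketch (not the PR's actual code) that builds the same kind of ID both ways. The class name, the method names, and the PREFIX value are hypothetical stand-ins for illustration. Locale.ROOT is passed to String.format only to keep the output deterministic, whereas the removed code used the default locale. The timing loop is a crude illustration, not a benchmark. A harness such as JMH would be needed for trustworthy numbers.

```java
import java.util.Locale;

final class StreamIdSketch {
    // Hypothetical stand-in for OneForOneBlockPusher.SHUFFLE_PUSH_BLOCK_PREFIX.
    static final String PREFIX = "shufflePush";

    // Mirrors the removed code path: the format string is parsed on every call.
    public static String viaFormat(int shuffleId, int shuffleMergeId,
                                   int mapIndex, int reduceId) {
        return String.format(Locale.ROOT, "%s_%d_%d_%d_%d",
            PREFIX, shuffleId, shuffleMergeId, mapIndex, reduceId);
    }

    // Plain concatenation: no format-string parsing, no varargs array, no boxing.
    public static String viaConcat(int shuffleId, int shuffleMergeId,
                                   int mapIndex, int reduceId) {
        return PREFIX + "_" + shuffleId + "_" + shuffleMergeId
            + "_" + mapIndex + "_" + reduceId;
    }

    public static void main(String[] args) {
        // Both approaches should yield the same ID for the same inputs.
        System.out.println(viaFormat(1, 2, 3, 4));
        System.out.println(viaConcat(1, 2, 3, 4));

        // Crude timing only; JIT warm-up and GC make these numbers unreliable.
        int iterations = 200_000;
        long sink = 0; // consume results so the loops are not optimized away
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += viaFormat(1, 2, i, 4).length();
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += viaConcat(1, 2, i, 4).length();
        }
        long t2 = System.nanoTime();
        System.out.println("format ms: " + (t1 - t0) / 1_000_000);
        System.out.println("concat ms: " + (t2 - t1) / 1_000_000);
        System.out.println("sink: " + sink);
    }
}
```

Since the ID is built once per pushed block, any per-call saving is multiplied by the block count, which is why a small constant cost showed up in the flame graph.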
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]