Victsm commented on a change in pull request #33613:
URL: https://github.com/apache/spark/pull/33613#discussion_r681828503
##########
File path:
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ErrorHandler.java
##########
@@ -105,16 +105,16 @@ public boolean shouldRetryError(Throwable t) {
return false;
}
- String errorStackTrace = Throwables.getStackTraceAsString(t);
// If the block is too late or stale block push, there is no need to
retry it
- return
!errorStackTrace.contains(TOO_LATE_OR_STALE_BLOCK_PUSH_MESSAGE_SUFFIX);
+ String msg = t.getMessage();
+ return !(msg != null &&
msg.contains(TOO_LATE_OR_STALE_BLOCK_PUSH_MESSAGE_SUFFIX));
Review comment:
I actually thought of the same during the performance evaluation.
We are currently using the exception as some sort of RPC message to deliver
these 2 potentially frequent messages, i.e. too late and block collision.
It incurs overhead on both server side and client side for initializing the
exception and examine the content of the exception, as well as this issue you
pointed out.
It is ideally much better to make these 2 a proper RPC response, e.g.
responding an actual payload with a byte instead of an empty ByteBuffer.
We can still keep the response in the case of a success push and merge of a
block as an empty ByteBuffer to reduce the cost of decoding ByteBuffer for
every success block push.
This however is a protocol change, and I'm not sure we will be able to
change it before the 3.2 RC date.
cc @gengliangwang
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]