akpatnam25 commented on code in PR #38959:
URL: https://github.com/apache/spark/pull/38959#discussion_r1069084937


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java:
##########
@@ -192,8 +197,12 @@ private synchronized void initiateRetry() {
   private synchronized boolean shouldRetry(Throwable e) {
     boolean isIOException = e instanceof IOException
       || e.getCause() instanceof IOException;
+    boolean isSaslTimeout = enableSaslRetries &&
+        (e instanceof TransportClient.SaslTimeoutException ||
+            (e.getCause() != null && e.getCause() instanceof 
TransportClient.SaslTimeoutException));
     boolean hasRemainingRetries = retryCount < maxRetries;
-    return isIOException && hasRemainingRetries && 
errorHandler.shouldRetryError(e);
+    return (isSaslTimeout || isIOException) &&
+        hasRemainingRetries && errorHandler.shouldRetryError(e);

Review Comment:
   I had to use a map to track the context in which we want to set the 
retryCount back to 0, since we only want to do it in the case that the 
exception for that blockId is a SaslTimeoutException. The map is only accessed 
and mutated within synchronized blocks, so this should allow us to ensure that 
we are only setting retryCount in the case of SaslTimeoutException retries, and 
not IOException retries. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to