mridulm commented on code in PR #38959:
URL: https://github.com/apache/spark/pull/38959#discussion_r1067516221


##########
common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java:
##########
@@ -338,6 +342,15 @@ public String toString() {
       .toString();
   }
 
+  /**
+   * Exception thrown when sasl request times out.
+   */
+  public static class SaslTimeoutException extends RuntimeException {
+    public SaslTimeoutException(Throwable cause) {
+      super((cause));
+    }
+  }

Review Comment:
   Remove this ?



##########
common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java:
##########
@@ -294,6 +294,15 @@ public void onFailure(Throwable e) {
     }
   }
 
+  /**
+   * Exception thrown when sasl request times out.
+   */
+  public static class SaslTimeoutException extends RuntimeException {
+    public SaslTimeoutException(Throwable cause) {
+      super((cause));
+    }
+  }
+

Review Comment:
   Move this to ` org.apache.spark.network.sasl` package.
   Also add the common constructors for an Exception - 
`SaslTimeoutException()`, `SaslTimeoutException(String)`, 
`SaslTimeoutException(String, Throwable)`, in addition to what we have here.



##########
common/network-common/src/main/java/org/apache/spark/network/sasl/SaslClientBootstrap.java:
##########
@@ -65,9 +67,18 @@ public void doBootstrap(TransportClient client, Channel 
channel) {
         SaslMessage msg = new SaslMessage(appId, payload);
         ByteBuf buf = Unpooled.buffer(msg.encodedLength() + (int) 
msg.body().size());
         msg.encode(buf);
+        ByteBuffer response;
         buf.writeBytes(msg.body().nioByteBuffer());
-
-        ByteBuffer response = client.sendRpcSync(buf.nioBuffer(), 
conf.authRTTimeoutMs());
+        try {
+          response = client.sendRpcSync(buf.nioBuffer(), 
conf.authRTTimeoutMs());
+        } catch (RuntimeException ex) {
+          // We know it is a Sasl timeout here if it is a TimeoutException.
+          if (ex.getCause() instanceof TimeoutException) {
+            throw Throwables.propagate(new 
TransportClient.SaslTimeoutException(ex.getCause()));

Review Comment:
   Why `Throwables.propagate` ? Simply create and throw the new 
`SaslTimeoutException` directly ?



##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java:
##########
@@ -192,8 +197,12 @@ private synchronized void initiateRetry() {
   private synchronized boolean shouldRetry(Throwable e) {
     boolean isIOException = e instanceof IOException
       || e.getCause() instanceof IOException;
+    boolean isSaslTimeout = enableSaslRetries &&
+        (e instanceof TransportClient.SaslTimeoutException ||
+            (e.getCause() != null && e.getCause() instanceof 
TransportClient.SaslTimeoutException));
     boolean hasRemainingRetries = retryCount < maxRetries;
-    return isIOException && hasRemainingRetries && 
errorHandler.shouldRetryError(e);
+    return (isSaslTimeout || isIOException) &&
+        hasRemainingRetries && errorHandler.shouldRetryError(e);

Review Comment:
   If we had sasl failures which triggered a retry - and we finally succeeded - 
we should reset `retryCount` to `0`.
   sasl is part of bootstrap, unlike other failures we retry for.



##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java:
##########
@@ -192,8 +197,12 @@ private synchronized void initiateRetry() {
   private synchronized boolean shouldRetry(Throwable e) {
     boolean isIOException = e instanceof IOException
       || e.getCause() instanceof IOException;
+    boolean isSaslTimeout = enableSaslRetries &&
+        (e instanceof TransportClient.SaslTimeoutException ||
+            (e.getCause() != null && e.getCause() instanceof 
TransportClient.SaslTimeoutException));

Review Comment:
   With the proposed changes, this will simply become:
   ```suggestion
           (e instanceof TransportClient.SaslTimeoutException);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to