mridulm commented on code in PR #38959: URL: https://github.com/apache/spark/pull/38959#discussion_r1067516221
########## common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java: ########## @@ -338,6 +342,15 @@ public String toString() { .toString(); } + /** + * Exception thrown when sasl request times out. + */ + public static class SaslTimeoutException extends RuntimeException { + public SaslTimeoutException(Throwable cause) { + super((cause)); + } + } Review Comment: Remove this ? ########## common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java: ########## @@ -294,6 +294,15 @@ public void onFailure(Throwable e) { } } + /** + * Exception thrown when sasl request times out. + */ + public static class SaslTimeoutException extends RuntimeException { + public SaslTimeoutException(Throwable cause) { + super((cause)); + } + } + Review Comment: Move this to ` org.apache.spark.network.sasl` package. Also add the common constructors for an Exception - `SaslTimeoutException()`, `SaslTimeoutException(String)`, `SaslTimeoutException(String, Throwable)`, in addition to what we have here. ########## common/network-common/src/main/java/org/apache/spark/network/sasl/SaslClientBootstrap.java: ########## @@ -65,9 +67,18 @@ public void doBootstrap(TransportClient client, Channel channel) { SaslMessage msg = new SaslMessage(appId, payload); ByteBuf buf = Unpooled.buffer(msg.encodedLength() + (int) msg.body().size()); msg.encode(buf); + ByteBuffer response; buf.writeBytes(msg.body().nioByteBuffer()); - - ByteBuffer response = client.sendRpcSync(buf.nioBuffer(), conf.authRTTimeoutMs()); + try { + response = client.sendRpcSync(buf.nioBuffer(), conf.authRTTimeoutMs()); + } catch (RuntimeException ex) { + // We know it is a Sasl timeout here if it is a TimeoutException. + if (ex.getCause() instanceof TimeoutException) { + throw Throwables.propagate(new TransportClient.SaslTimeoutException(ex.getCause())); Review Comment: Why `Throwables.propagate` ? Simply create and throw the new `SaslTimeoutException` directly ? ########## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java: ########## @@ -192,8 +197,12 @@ private synchronized void initiateRetry() { private synchronized boolean shouldRetry(Throwable e) { boolean isIOException = e instanceof IOException || e.getCause() instanceof IOException; + boolean isSaslTimeout = enableSaslRetries && + (e instanceof TransportClient.SaslTimeoutException || + (e.getCause() != null && e.getCause() instanceof TransportClient.SaslTimeoutException)); boolean hasRemainingRetries = retryCount < maxRetries; - return isIOException && hasRemainingRetries && errorHandler.shouldRetryError(e); + return (isSaslTimeout || isIOException) && + hasRemainingRetries && errorHandler.shouldRetryError(e); Review Comment: If we had sasl failures which triggered a retry - and we finally succeeded - we should reset `retryCount` to `0`. sasl is part of bootstrap, unlike other failures we retry for. ########## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java: ########## @@ -192,8 +197,12 @@ private synchronized void initiateRetry() { private synchronized boolean shouldRetry(Throwable e) { boolean isIOException = e instanceof IOException || e.getCause() instanceof IOException; + boolean isSaslTimeout = enableSaslRetries && + (e instanceof TransportClient.SaslTimeoutException || + (e.getCause() != null && e.getCause() instanceof TransportClient.SaslTimeoutException)); Review Comment: With the proposed changes, this will simply become: ```suggestion (e instanceof TransportClient.SaslTimeoutException); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org