[ https://issues.apache.org/jira/browse/SPARK-27219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-27219.
-----------------------------------
       Resolution: Fixed
         Assignee: Marcelo Vanzin
    Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/24160 .

> Misleading exceptions in transport code's SASL fallback path
> ------------------------------------------------------------
>
>                 Key: SPARK-27219
>                 URL: https://issues.apache.org/jira/browse/SPARK-27219
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> There are a couple of code paths in the SASL fallback handling that result in
> misleading exceptions printed to logs. One of them is if a timeout occurs
> during authentication; for example:
> {noformat}
> 19/03/15 11:21:37 WARN crypto.AuthClientBootstrap: New auth protocol failed, trying SASL.
> java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timeout waiting for task.
>         at org.spark_project.guava.base.Throwables.propagate(Throwables.java:160)
>         at org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:258)
>         at org.apache.spark.network.crypto.AuthClientBootstrap.doSparkAuth(AuthClientBootstrap.java:105)
>         at org.apache.spark.network.crypto.AuthClientBootstrap.doBootstrap(AuthClientBootstrap.java:79)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:262)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:192)
>         at org.apache.spark.network.shuffle.ExternalShuffleClient.lambda$fetchBlocks$0(ExternalShuffleClient.java:100)
>         at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
>         ...
> Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
>         at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:276)
>         at org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
>         at org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:254)
>         ... 38 more
> 19/03/15 11:21:38 WARN server.TransportChannelHandler: Exception in connection from vc1033.halxg.cloudera.com/10.17.216.43:7337
> java.lang.IllegalArgumentException: Frame length should be positive: -3702202170875367528
>         at org.spark_project.guava.base.Preconditions.checkArgument(Preconditions.java:119)
> {noformat}
>
> The IllegalArgumentException shouldn't happen; it only happens because the
> code ignores the timeout and retries, at which point the remote side is in a
> different state and thus doesn't expect the message.
>
> The same line that prints that exception can also produce a noisy log message
> when the remote side (e.g. an old shuffle service) does not understand the
> new auth protocol. Because it is logged as a warning, it looks like something
> is wrong, when the code is just doing what's expected.
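The fallback decision described in the issue can be sketched in Java. This is a minimal, hypothetical illustration, not the change made in PR 24160. The class `AuthFallbackSketch`, the `Outcome` enum and the `bootstrap`/`causedBy` helpers are invented names. Each auth attempt is modeled as a `Runnable` that fails with an unchecked exception, mirroring the trace above, where `TransportClient.sendRpcSync` surfaces the `TimeoutException` wrapped in a `RuntimeException`. `System.out` stands in for a debug-level logger. The sketch shows the two behaviors the issue asks for: a timeout is propagated instead of triggering a SASL retry on a connection whose remote side may be mid-handshake, and a rejection of the new protocol falls back to SASL quietly.

```java
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of a client-side auth fallback decision; not Spark's actual code. */
public class AuthFallbackSketch {

    enum Outcome { NEW_PROTOCOL, SASL_FALLBACK }

    /**
     * Tries the new auth protocol and falls back to SASL only when the failure is not a timeout.
     * Failures are unchecked exceptions, possibly wrapping a TimeoutException.
     */
    static Outcome bootstrap(Runnable newProtocol, Runnable sasl) {
        try {
            newProtocol.run();
            return Outcome.NEW_PROTOCOL;
        } catch (RuntimeException e) {
            if (causedBy(e, TimeoutException.class)) {
                // The remote side may still be processing the first handshake; sending a SASL
                // message now could arrive while it expects something else. Fail instead.
                throw e;
            }
            // Any other failure is treated (as a simplifying assumption) as "remote does not
            // speak the new protocol", an expected case, so it is reported at debug level only.
            System.out.println("DEBUG: new auth protocol not supported, trying SASL: " + e.getMessage());
            sasl.run();
            return Outcome.SASL_FALLBACK;
        }
    }

    /** Walks the cause chain looking for an exception of the given type. */
    static boolean causedBy(Throwable t, Class<? extends Throwable> type) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (type.isInstance(c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Runnable ok = () -> { };
        Runnable unsupported = () -> { throw new IllegalStateException("unknown message type"); };
        Runnable timesOut = () -> { throw new RuntimeException(new TimeoutException("Timeout waiting for task.")); };

        System.out.println(bootstrap(ok, ok));          // prints NEW_PROTOCOL
        System.out.println(bootstrap(unsupported, ok)); // prints the DEBUG line, then SASL_FALLBACK
        try {
            bootstrap(timesOut, ok);
        } catch (RuntimeException e) {
            System.out.println("timeout propagated: " + causedBy(e, TimeoutException.class));
        }
    }
}
```

Walking the cause chain, rather than catching `TimeoutException` directly, is what lets the sketch recognize a timeout that arrives wrapped in a `RuntimeException`. Treating every other failure as "protocol not supported" is a simplification; a real client would probably also want to tell I/O errors apart from an explicit rejection.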