[jira] [Commented] (SPARK-21827) Task fail due to executor exception when enable Sasl Encryption
[ https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924045#comment-16924045 ] Sébastien BARNOUD commented on SPARK-21827:
---
Don't you have this in your logs?
{code:java}
INFO sasl: DIGEST41:Unmatched MACs
{code}
HADOOP-12483 only fixes the issue that leads to "dropped" messages in the protocol.

> Task fail due to executor exception when enable Sasl Encryption
> ---
>
> Key: SPARK-21827
> URL: https://issues.apache.org/jira/browse/SPARK-21827
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, Spark Core
> Affects Versions: 1.6.1, 2.1.1, 2.2.0
> Environment: OS: RedHat 7.1 64bit
> Reporter: Yishan Jiang
> Priority: Major
>
> We hit this with authentication and Sasl encryption enabled on many versions; on 1.6.1, for example, the configuration is:
> spark.local.dir /tmp/test-161
> spark.shuffle.service.enabled true
> *spark.authenticate true*
> *spark.authenticate.enableSaslEncryption true*
> *spark.network.sasl.serverAlwaysEncrypt true*
> spark.authenticate.secret e25d4369-bec3-4266-8fc5-fb6d4fcee66f
> spark.history.ui.port 18089
> spark.shuffle.service.port 7347
> spark.master.rest.port 6076
> spark.deploy.recoveryMode NONE
> spark.ssl.enabled true
> spark.executor.extraJavaOptions -Djava.security.egd=file:/dev/./urandom
> We run a Spark example and the task fails with these exception messages:
> 17/08/22 03:56:52 INFO BlockManager: external shuffle service port = 7347
> 17/08/22 03:56:52 INFO BlockManagerMaster: Trying to register BlockManager
> 17/08/22 03:56:52 INFO sasl: DIGEST41:Unmatched MACs
> 17/08/22 03:56:52 WARN TransportChannelHandler: Exception in connection from cws57n6.ma.platformlab.ibm.com/172.29.8.66:49394
> java.lang.IllegalArgumentException: Frame length should be positive: -5594407078713290673
>         at org.spark-project.guava.base.Preconditions.checkArgument(Preconditions.java:119)
>         at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:135)
>         at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:82)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         at java.lang.Thread.run(Thread.java:785)
> 17/08/22 03:56:52 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from cws57n6.ma.platformlab.ibm.com/172.29.8.66:49394 is closed
> 17/08/22 03:56:52 WARN NettyRpcEndpointRef: Error sending message [message = RegisterBlockManager(BlockManagerId(fe9d31da-f70c-40a2-9032-05a5af4ba4c5, cws58n1.ma.platformlab.ibm.com, 45852),2985295872,NettyRpcEndpointRef(null))] in 1 attempts
> java.lang.IllegalArgumentException: Frame length should be positive: -5594407078713290673
> (same stack trace as above)
[ https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923990#comment-16923990 ] Anatoly Vinogradov commented on SPARK-21827:
---
I've got the same exception when running an AWS EMR cluster with encryption enabled. Hadoop 2.8.5 was used in that case; this version of Hadoop includes the fix from HADOOP-12483.
*Exception:*
{code}
org.apache.spark.shuffle.FetchFailedException: Frame length should be positive: -5952541650279226493
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:549)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:480)
        ...
Caused by: java.lang.IllegalArgumentException: Frame length should be positive: -5952541650279226493
        at org.spark_project.guava.base.Preconditions.checkArgument(Preconditions.java:119)
        at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:134)
        at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:81)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:138)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
{code}
*Environment:*
{code}
Java version: 1.8.0_201 (Oracle Corporation)
Scala version: 2.11.8
Spark version: 2.3.2
{code}
*Spark configuration parameters:*
{code}
spark.shuffle.service.enabled=true
spark.ssl.enabled=true
spark.network.crypto.saslFallback=true
spark.network.crypto.keyLength=256
spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA256
spark.authenticate=true
spark.network.crypto.enabled=true
{code}
[ https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904925#comment-16904925 ] Dongjoon Hyun commented on SPARK-21827:
---
Thank you for sharing, [~Sebastien Barnoud]. I'll link that issue and close this one.
[ https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850834#comment-16850834 ] Sébastien BARNOUD commented on SPARK-21827:
---
https://issues.apache.org/jira/browse/HADOOP-12483?attachmentSortBy=fileName
It seems your problem is not a Spark issue but a Hadoop one. Could you check whether HADOOP-12483 is integrated in your stack?
[ https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849757#comment-16849757 ] Sébastien BARNOUD commented on SPARK-21827:
---
Hi,
For HBase, I found the exact reason for this issue. With Sasl, each Sasl message carries a sequence number, and messages must be sent over the network in sequence order. If they are not (which is currently the case for at least HBase), we get this log message from DigestMD5Base (a JDK class): "DIGEST41:Unmatched MACs", and the message is ignored. In fact, we should get a SaslException instead ([https://github.com/openjdk/jdk/blob/master/src/java.security.sasl/share/classes/com/sun/security/sasl/digest/DigestMD5Base.java]):
{code}
if (peerSeqNum != networkByteOrderToInt(seqNum, 0, 4)) {
    throw new SaslException("DIGEST-MD5: Out of order " +
        "sequencing of messages from server. Got: " +
        networkByteOrderToInt(seqNum, 0, 4) +
        " Expected: " + peerSeqNum);
}
{code}
But we get DIGEST41 instead, because the sequence number is used in the MAC computation, so the MAC check fails before the sequence-number comparison is ever reached.
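The interaction described above can be modelled with a short sketch. This is a toy HMAC model, not the JDK's DigestMD5Base (class and method names here are illustrative): because the sequence number is part of the MAC input and the receiver verifies with the sequence number *it* expects, a reordered message fails the MAC comparison before any explicit sequence-number check can run:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Toy model of DIGEST-MD5 message integrity: the MAC covers the sequence
// number, and the receiver recomputes the MAC with its own expected sequence
// number. A reordered message therefore surfaces as "Unmatched MACs" rather
// than as an out-of-order SaslException.
public class SaslOrderSketch {
    static byte[] mac(byte[] key, int seq, byte[] msg) {
        try {
            Mac h = Mac.getInstance("HmacMD5");
            h.init(new SecretKeySpec(key, "HmacMD5"));
            h.update(ByteBuffer.allocate(4).putInt(seq).array()); // seq is MAC'ed too
            return h.doFinal(msg);
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    // Receiver side: verify with the sequence number the receiver expects.
    static boolean verify(byte[] key, int expectedSeq, byte[] msg, byte[] receivedMac) {
        return Arrays.equals(mac(key, expectedSeq, msg), receivedMac);
    }

    public static void main(String[] args) {
        byte[] key = "shared-secret".getBytes();
        byte[] msg = "payload".getBytes();
        // In-order delivery: sender used seq 0, receiver expects 0 -> MAC matches.
        System.out.println(verify(key, 0, msg, mac(key, 0, msg)));
        // Out-of-order delivery: sender wrapped with seq 1, receiver still
        // expects 0 -> MAC mismatch; the sequence numbers are never compared.
        System.out.println(verify(key, 0, msg, mac(key, 1, msg)));
    }
}
```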
So, to summarize:
-) a JDK issue that hides the real problem (a MAC mismatch is reported instead of a bad sequence number);
-) a bug in at least HBase, and probably in some other Hadoop components, that causes Sasl messages not to be sent in the same order as their Sasl sequence numbers.
Regards,
[ https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838421#comment-16838421 ] Sébastien BARNOUD commented on SPARK-21827:
---
Hi,
We have been using SASL with Kafka for a while, with huge volumes. I just had a look at the Kafka implementation: [https://github.com/apache/kafka/blob/6ca899e56d451eef04e81b0f4d88bdb10f3cf4b3/clients/src/main/java/org/apache/kafka/common/network/Selector.java]
The KafkaChannel (including the SaslClient) is managed by this class, which is clearly documented as NOT thread safe. That is probably the reason why we never noticed this issue with Kafka and SASL.
[ https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838396#comment-16838396 ] Sébastien BARNOUD commented on SPARK-21827:
---
Hi,
I don't have the exact reason, but in a Spark job with 9 executor.cores and HBase Client 2.1.4:
-) with hbase.client.ipc.pool.type=RoundRobin (the default), I frequently get the issue (*DIGEST41:Unmatched MACs*);
-) with hbase.client.ipc.pool.type=ThreadLocal, I never get it.
Oracle confirmed to me that the SaslClient class is not documented as thread safe, and that the application should take care of this itself. Hoping this may help.
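The effect of the ThreadLocal pool type can be sketched as thread confinement of non-thread-safe per-connection state. This is an illustrative model, not HBase code: a plain counter stands in for SaslClient's internal sequence number, and all names are hypothetical:

```java
import java.util.Collections;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the principle behind hbase.client.ipc.pool.type=ThreadLocal:
// non-thread-safe per-connection state is confined to one thread, so
// concurrent callers can never reorder its sequence numbers. A RoundRobin
// pool would instead share instances across threads, which is unsafe for
// SaslClient.
public class ThreadLocalPoolSketch {
    // One independent counter per thread (stands in for SaslClient state).
    static final ThreadLocal<int[]> perThreadSeq = ThreadLocal.withInitial(() -> new int[1]);

    static int wrapNext() {              // stands in for SaslClient.wrap()
        return perThreadSeq.get()[0]++;  // no lock needed: state is thread-confined
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Callable<Integer> task = () -> {
            perThreadSeq.remove();       // fresh counter even if the thread is reused
            int last = -1;
            for (int i = 0; i < 1000; i++) {
                int s = wrapNext();
                if (s != last + 1) throw new IllegalStateException("out of order: " + s);
                last = s;
            }
            return last;
        };
        // Four concurrent callers each see their own strictly increasing
        // 0..999 sequence; a single shared, unsynchronized counter could be
        // reordered, which is exactly what breaks DIGEST-MD5 sequence MACs.
        for (Future<Integer> f : pool.invokeAll(Collections.nCopies(4, task)))
            System.out.println(f.get()); // prints 999 four times
        pool.shutdown();
    }
}
```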
[ https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829089#comment-16829089 ] Sébastien BARNOUD commented on SPARK-21827: --- Hi, I was investigating timeouts in the HBase client (version 1.1.2) on my Hadoop cluster with security enabled, using HotSpot JDK 1.8.0_92-b14. Each time I got a timeout, I found the following message in the logs: *sasl:1481 - DIGEST41:Unmatched MACs* After a look at the code, I understand that a message is simply ignored if an invalid MAC is received. In my opinion, this is not normal behavior: at the very least, it allows an attacker to flood the connection. But in my case there is no man in the middle, yet I still get this message. It looks like there is a bug (probably a non-thread-safe method somewhere) in the MAC validation, leading to the message being ignored and to my HBase timeout. At the same time, we have found some TEZ jobs stuck on our cluster since we enabled security on shuffle (MapReduce, TEZ and Spark).
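The drop-on-invalid-MAC behavior described above can be contrasted with fail-fast verification. A minimal, hypothetical sketch of HMAC integrity checking that throws on a mismatch instead of silently discarding the message (illustrative only — not the JDK's DigEST-MD5 implementation; class and method names are assumptions):

```java
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class MacCheckSketch {
    // Wrap: append an HMAC-MD5 trailer, loosely mirroring SASL integrity protection.
    // Note: javax.crypto.Mac instances are NOT thread-safe; concurrent use can
    // compute bogus tags -- the suspected root cause of the spurious mismatches.
    static byte[] wrap(Mac mac, byte[] msg) {
        byte[] tag = mac.doFinal(msg);
        byte[] out = new byte[msg.length + tag.length];
        System.arraycopy(msg, 0, out, 0, msg.length);
        System.arraycopy(tag, 0, out, msg.length, tag.length);
        return out;
    }

    // Unwrap: verify the trailer. Throw on mismatch rather than returning an
    // empty buffer -- silently dropping is what hides the corruption upstream.
    static byte[] unwrap(Mac mac, byte[] wrapped) {
        int tagLen = mac.getMacLength();
        byte[] msg = Arrays.copyOfRange(wrapped, 0, wrapped.length - tagLen);
        byte[] tag = Arrays.copyOfRange(wrapped, wrapped.length - tagLen, wrapped.length);
        if (!MessageDigest.isEqual(mac.doFinal(msg), tag)) {
            throw new SecurityException("Unmatched MACs: message rejected");
        }
        return msg;
    }

    public static void main(String[] args) throws Exception {
        Mac mac = Mac.getInstance("HmacMD5");
        mac.init(new SecretKeySpec("shared-secret".getBytes("UTF-8"), "HmacMD5"));

        byte[] wrapped = wrap(mac, "hello".getBytes("UTF-8"));
        System.out.println(new String(unwrap(mac, wrapped), "UTF-8")); // hello

        wrapped[0] ^= 0x01; // corrupt one bit "in transit"
        try {
            unwrap(mac, wrapped);
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With fail-fast verification, a corrupted or concurrently-computed MAC surfaces immediately as an exception at the receiver, instead of as a downstream timeout.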
In each hung job, we could see that the SSL handshake never finished:

"fetcher {Map_4} #34" #78 daemon prio=5 os_prio=0 tid=0x7fd86905d000 nid=0x13dad runnable [0x7fd83beb6000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:170)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
	at sun.security.ssl.InputRecord.read(InputRecord.java:503)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
	- locked <0x0007b997a470> (a java.lang.Object)
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
	- locked <0x0007b997a430> (a java.lang.Object)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
	at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.setNewClient(AbstractDelegateHttpsURLConnection.java:100)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.setNewClient(AbstractDelegateHttpsURLConnection.java:80)
	at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:672)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1534)
	- locked <0x0007b9979f10> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
	- locked <0x0007b9979f10> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
	- locked <0x0007b9979ea8> (a sun.net.www.protocol.https.HttpsURLConnectionImpl)
	at org.apache.tez.runtime.library.common.shuffle.HttpConnection.getInputStream(HttpConnection.java:253)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupConnection(FetcherOrderedGrouped.java:356)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:264)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:176)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:191)

Looking at the TEZ source shows that there is no timeout in this code path, leading to the infinite wait. After some more investigation, I found:
-) https://issues.apache.org/jira/browse/SPARK-21827
-) [https://issues.cask.co/browse/CDAP-12737]
-) [https://bugster.forgerock.org/jira/browse/OPENDJ-4956]
It seems that this issue affects a lot of software, and ForgeRock seems to have identified the thread-safety issue. To summarize, there are two issues:
# The message shouldn't be ignored when the MAC is invalid; an exception should be thrown.
# The thread-safety issue should be investigated and corrected in the JDK, because relying on a synchronized method at the application layer is not viable. Typically, an application like Spark uses multiple SASL implementations and can't synchronize all of them.
I sent this to [secalert...@oracle.com|mailto:secalert...@oracle.com] because IMO it's a JDK bug. Regards, Sébastien BARNOUD
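The fetchers in the thread dump above block indefinitely because no socket timeout is set, and the JDK default of 0 means "wait forever". A defensive sketch of bounding both the connect and read phases on an HttpsURLConnection (the URL, port, and timeout values here are hypothetical, and this is not TEZ's actual HttpConnection code):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutSketch {
    // Open a connection with explicit bounds. The default timeout of 0 means
    // "wait forever", which is what lets a stalled SSL handshake hang a
    // fetcher thread in socketRead0 indefinitely.
    public static HttpURLConnection open(String url, int connectMs, int readMs)
            throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(connectMs); // bounds the TCP/TLS connect phase
        conn.setReadTimeout(readMs);       // bounds each blocking read
        return conn;
    }

    public static void main(String[] args) throws Exception {
        // openConnection() does not touch the network, so this runs offline.
        HttpURLConnection conn =
            open("https://shuffle-host.example:13562/map-output", 5000, 30000);
        System.out.println(conn.getConnectTimeout()); // 5000
        System.out.println(conn.getReadTimeout());    // 30000
    }
}
```

With a read timeout in place, a handshake that never completes eventually surfaces as a SocketTimeoutException the fetcher can retry, instead of an infinite RUNNABLE wait in socketRead0.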
[ https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280277#comment-16280277 ] Mario Molina commented on SPARK-21827: -- HDFS itself? I mean, don't you have a storage layer behind it? I had the same problem, and the workaround was to assign just one core to the executor :-( In my case, the storage layer behind (which wasn't HDFS) had a concurrency-related issue in its protocol.
[ https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279779#comment-16279779 ] Yishan Jiang commented on SPARK-21827: -- Yes, I am using HDFS. As for cores per executor, mostly the default; I tried other numbers (2, 3, etc.) with the same issue.
[ https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233587#comment-16233587 ] Mario Molina commented on SPARK-21827: -- Are you trying to read/write data to some DB, HDFS, or something like that? If so, which one? How many cores do you have assigned to each executor?