[jira] [Closed] (SPARK-16146) Spark application failed by Yarn preempting
[ https://issues.apache.org/jira/browse/SPARK-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cong Feng closed SPARK-16146.
Resolution: Fixed

> Spark application failed by Yarn preempting
> -------------------------------------------
>
> Key: SPARK-16146
> URL: https://issues.apache.org/jira/browse/SPARK-16146
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.6.1
> Environment: Amazon EC2, CentOS 6.6, Spark-1.6.1-bin-hadoop-2.6 (binary from the Spark official web site), Hadoop 2.7.2, preemption and dynamic allocation enabled.
> Reporter: Cong Feng
>
> Hi,
> We are setting up our Spark cluster on Amazon EC2. We are using Spark in YARN client mode, with Spark-1.6.1-bin-hadoop-2.6 (the binary from the Spark official web site) and Hadoop 2.7.2. We also enable preemption, dynamic allocation, and spark.shuffle.service.enabled.
> During our tests we found that our Spark application frequently gets killed when preemption happens. Mostly it looks like the driver is trying to send an RPC to an executor that has already been preempted; there are also some "connection reset by peer" exceptions that likewise cause the job to fail. Below are the typical exceptions we found:
> 16/06/22 08:13:30 ERROR spark.ContextCleaner: Error cleaning RDD 49
> java.io.IOException: Failed to send RPC 5721681506291542850 to nodexx.xx..ddns.xx.com/xx.xx.xx.xx:42857: java.nio.channels.ClosedChannelException
>         at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
>         at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
>         at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>         at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
>         at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
>         at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
>         at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
>         at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
>         at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
>         at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
>         at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
>         at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
>         at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
> And
> 16/06/19 22:33:14 INFO storage.BlockManager: Removing RDD 122
> 16/06/19 22:33:14 WARN server.TransportChannelHandler: Exception in connection from nodexx-xx-xx.xx.ddns.xx.com/xx.xx.xx.xx:56618
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>         at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
>         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
>         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         at java.lang.Thread.run(Thread.java:745)
> 16/06/19 22:33:14 ERROR client.TransportResponseHandler: Still have 2 requests outstanding when connection from nodexx-xx-xx..ddns.xx.com/xx.xx.xx.xx:56618 is closed.
> It happens with both the capacity scheduler and the fair scheduler. The weird thing is that when we rolled back to Spark 1.4.1 this issue magically disappeared and preemption works smoothly, but we still want to deploy Spark 1.6.1. Is this a bug, or something we can fix? Any ideas would be a great help to us. Thanks.
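The environment above combines YARN client mode, queue preemption, dynamic allocation, and the external shuffle service. A minimal sketch of the Spark-side configuration is shown below; the application name, executor bounds, and the job body are illustrative assumptions rather than the reporter's actual job, and the shuffle service additionally has to be registered as the spark_shuffle auxiliary service (org.apache.spark.network.yarn.YarnShuffleService) in yarn-site.xml on each NodeManager.

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative reproduction of the setup described in the ticket
// (submit with: spark-submit --master yarn --deploy-mode client ...).
// The executor bounds and the job body are assumptions, not taken from the issue.
object PreemptionRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("preemption-repro")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "1")
      .set("spark.dynamicAllocation.maxExecutors", "10")
      // requires the spark_shuffle aux service on every NodeManager
      .set("spark.shuffle.service.enabled", "true")

    val sc = new SparkContext(conf)
    // A shuffle-heavy job that runs long enough for YARN to preempt some
    // of its executors while stages are still in flight.
    val distinctKeys = sc.parallelize(1 to 100000000, 200)
      .map(i => (i % 1000, 1L))
      .reduceByKey(_ + _)
      .count()
    println(s"distinct keys: $distinctKeys")
    sc.stop()
  }
}
{code}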
[jira] [Commented] (SPARK-16146) Spark application failed by Yarn preempting
[ https://issues.apache.org/jira/browse/SPARK-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347479#comment-15347479 ] Cong Feng commented on SPARK-16146:
-----------------------------------
It finally turned out to be a Zeppelin issue. We ran the same job in the Spark shell and saw exactly the same exceptions, but the shell is able to handle them and keeps the rest of the tasks running to the end, while Zeppelin treats those exceptions as fatal and fails the job (at least at the UI level). Thanks everyone for the help; I am resolving the ticket for now.

> Spark application failed by Yarn preempting
> -------------------------------------------
>
> Key: SPARK-16146
> URL: https://issues.apache.org/jira/browse/SPARK-16146
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.6.1
> Environment: Amazon EC2, CentOS 6.6, Spark-1.6.1-bin-hadoop-2.6 (binary from the Spark official web site), Hadoop 2.7.2, preemption and dynamic allocation enabled.
> Reporter: Cong Feng
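The distinction drawn in this comment is between Spark's own recovery (the scheduler re-runs tasks lost with a preempted executor, so a plain action in spark-shell can still complete) and a front end that treats any exception it observes during the run as fatal. A rough spark-shell-style sketch of that difference is below; the job and the wrapper are hypothetical illustrations, not Zeppelin's actual code or behavior.

{code:scala}
import scala.util.{Failure, Success, Try}
import org.apache.spark.SparkContext

// A stand-in job; lost-executor exceptions in the logs do not fail it as long
// as the scheduler can re-run the lost tasks and the stages eventually finish.
def runJob(sc: SparkContext): Long =
  sc.parallelize(1 to 1000000, 100).map(_ * 2L).reduce(_ + _)

// spark-shell style: call the action directly and let Spark retry lost tasks.
//   val sum = runJob(sc)

// Hypothetical notebook-style wrapper: if the front end converts any exception
// it observes during the run into a job failure, the same preemption events
// become fatal at the UI level even though Spark could have recovered.
def runGuarded(sc: SparkContext): Unit =
  Try(runJob(sc)) match {
    case Success(sum) => println(s"sum = $sum")
    case Failure(e)   => println(s"marking job as FAILED: ${e.getMessage}")
  }
{code}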
[jira] [Commented] (SPARK-16146) Spark application failed by Yarn preempting
[ https://issues.apache.org/jira/browse/SPARK-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345232#comment-15345232 ] Cong Feng commented on SPARK-16146:
-----------------------------------
Hi Sean,
In our test case a job normally gets 10 executors, and in a single run 3 to 5 of them get preempted. The exception does not happen on every run of the job, but it is still pretty frequent: roughly every 15 to 20 preemptions we see the exception, which causes the job to fail. The driver stays up, but the next time we run the same job on that driver it fails again with the same exception. Rolling back to Spark 1.4.1 seems to solve this: at least 120 preemptions have happened and we have never seen a job fail. So it is a bit weird for Spark 1.6.1, but it does happen. We suspect it may be a Netty issue, but we are not sure yet.

> Spark application failed by Yarn preempting
> -------------------------------------------
>
> Key: SPARK-16146
> URL: https://issues.apache.org/jira/browse/SPARK-16146
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.6.1
> Environment: Amazon EC2, CentOS 6.6, Spark-1.6.1-bin-hadoop-2.6 (binary from the Spark official web site), Hadoop 2.7.2, preemption and dynamic allocation enabled.
> Reporter: Cong Feng
> Priority: Critical
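Not taken from the ticket, but related to how often a preemption burst turns into a job failure: Spark's failure thresholds decide when retried tasks and lost executors become fatal to the application. A hedged sketch of loosening them while investigating is below; both keys exist in Spark 1.6 on YARN, and the values are examples only.

{code:scala}
import org.apache.spark.SparkConf

// Raise the tolerance for task retries and executor failures so that a burst of
// preemptions is less likely to fail the whole application while the root cause
// is being investigated. Values are illustrative, not recommendations.
val tolerantConf = new SparkConf()
  .set("spark.task.maxFailures", "8")              // default: 4 attempts per task
  .set("spark.yarn.max.executor.failures", "100")  // default: twice the number of executors
{code}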
[jira] [Updated] (SPARK-16146) Spark application failed by Yarn preempting
[ https://issues.apache.org/jira/browse/SPARK-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cong Feng updated SPARK-16146:
Description:
Hi,
We are setting up our Spark cluster on Amazon EC2. We are using Spark in YARN client mode, with Spark-1.6.1-bin-hadoop-2.6 (the binary from the Spark official web site) and Hadoop 2.7.2. We also enable preemption, dynamic allocation, and spark.shuffle.service.enabled.
During our tests we found that our Spark application frequently gets killed when preemption happens. Mostly it looks like the driver is trying to send an RPC to an executor that has already been preempted; there are also some "connection reset by peer" exceptions that likewise cause the job to fail. The typical exceptions we found are:
16/06/22 08:13:30 ERROR spark.ContextCleaner: Error cleaning RDD 49
java.io.IOException: Failed to send RPC 5721681506291542850 to nodexx.xx..ddns.xx.com/xx.xx.xx.xx:42857: java.nio.channels.ClosedChannelException
and
16/06/19 22:33:14 INFO storage.BlockManager: Removing RDD 122
16/06/19 22:33:14 WARN server.TransportChannelHandler: Exception in connection from nodexx-xx-xx.xx.ddns.xx.com/xx.xx.xx.xx:56618
java.io.IOException: Connection reset by peer
16/06/19 22:33:14 ERROR client.TransportResponseHandler: Still have 2 requests outstanding when connection from nodexx-xx-xx..ddns.xx.com/xx.xx.xx.xx:56618 is closed.
It happens with both the capacity scheduler and the fair scheduler. The weird thing is that when we rolled back to Spark 1.4.1 this issue magically disappeared and preemption works smoothly, but we still want to deploy Spark 1.6.1. Is this a bug, or something we can fix? Any ideas would be a great help to us. Thanks.

was:
Hi,
We are setting up our Spark cluster on Amazon EC2. We are using Spark in YARN client mode, with Spark-1.6.1-bin-hadoop-2.6 (the binary from the Spark official web site) and Hadoop 2.7.2. We also enable preemption and dynamic allocation.
[jira] [Created] (SPARK-16146) Spark application failed by Yarn preempting
Cong Feng created SPARK-16146:
Summary: Spark application failed by Yarn preempting
Key: SPARK-16146
URL: https://issues.apache.org/jira/browse/SPARK-16146
Project: Spark
Issue Type: Bug
Affects Versions: 1.6.1
Environment: Amazon EC2, CentOS 6.6, Spark-1.6.1-bin-hadoop-2.6 (binary from the Spark official web site), Hadoop 2.7.2, preemption and dynamic allocation enabled.
Reporter: Cong Feng
Priority: Critical

Hi,
We are setting up our Spark cluster on Amazon EC2. We are using Spark in YARN client mode, with Spark-1.6.1-bin-hadoop-2.6 (the binary from the Spark official web site) and Hadoop 2.7.2. We also enable preemption and dynamic allocation.
During our tests we found that our Spark application frequently gets killed when preemption happens. Mostly it looks like the driver is trying to send an RPC to an executor that has already been preempted; there are also some "connection reset by peer" exceptions that likewise cause the job to fail. The typical exceptions we found are:
16/06/22 08:13:30 ERROR spark.ContextCleaner: Error cleaning RDD 49
java.io.IOException: Failed to send RPC 5721681506291542850 to nodexx.xx..ddns.xx.com/xx.xx.xx.xx:42857: java.nio.channels.ClosedChannelException
and
16/06/19 22:33:14 INFO storage.BlockManager: Removing RDD 122
16/06/19 22:33:14 WARN server.TransportChannelHandler: Exception in connection from nodexx-xx-xx.xx.ddns.xx.com/xx.xx.xx.xx:56618
java.io.IOException: Connection reset by peer
16/06/19 22:33:14 ERROR client.TransportResponseHandler: Still have 2 requests outstanding when connection from nodexx-xx-xx..ddns.xx.com/xx.xx.xx.xx:56618 is closed.
It happens with both the capacity scheduler and the fair scheduler. The weird thing is that when we rolled back to Spark 1.4.1 this issue magically disappeared and preemption works smoothly, but we still want to deploy Spark 1.6.1. Is this a bug, or something we can fix? Any ideas would be a great help to us. Thanks.