[jira] [Closed] (SPARK-16146) Spark application failed by Yarn preempting

2016-06-23 Thread Cong Feng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cong Feng closed SPARK-16146.
-
Resolution: Fixed

> Spark application failed by Yarn preempting
> ---
>
> Key: SPARK-16146
> URL: https://issues.apache.org/jira/browse/SPARK-16146
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
> Environment: Amazon EC2, CentOS 6.6,
> Spark-1.6.1-bin-hadoop-2.6 (binary from the official Spark website), Hadoop 2.7.2,
> preemption and dynamic allocation enabled.
>Reporter: Cong Feng
>
> Hi,
> We are setting up our Spark cluster on Amazon EC2. We are using Spark YARN
> client mode with Spark-1.6.1-bin-hadoop-2.6 (the binary from the official
> Spark website) and Hadoop 2.7.2. We have also enabled preemption, dynamic
> allocation, and spark.shuffle.service.enabled.
> During our tests we found that our Spark application frequently gets killed
> when preemption happens. Mostly it seems the driver is trying to send an RPC
> to an executor that has already been preempted; there are also some
> "connection reset by peer" exceptions that likewise cause the job to fail.
> Below are the typical exceptions we found:
> 16/06/22 08:13:30 ERROR spark.ContextCleaner: Error cleaning RDD 49
> java.io.IOException: Failed to send RPC 5721681506291542850 to 
> nodexx.xx..ddns.xx.com/xx.xx.xx.xx:42857: 
> java.nio.channels.ClosedChannelException
> at 
> org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
> at 
> org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
> at 
> io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
> at 
> io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
> at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
> at 
> io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
> at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
> And 
> 16/06/19 22:33:14 INFO storage.BlockManager: Removing RDD 122
> 16/06/19 22:33:14 WARN server.TransportChannelHandler: Exception in 
> connection from nodexx-xx-xx.xx.ddns.xx.com/xx.xx.xx.xx:56618
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> at 
> io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> 16/06/19 22:33:14 ERROR client.TransportResponseHandler: Still have 2
> requests outstanding when connection from
> nodexx-xx-xx..ddns.xx.com/xx.xx.xx.xx:56618 is closed.
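
For reference, a minimal sketch (in Scala) of the kind of driver-side
configuration the report describes: YARN client mode with dynamic allocation
and the external shuffle service enabled. The app name, master string, and
executor bounds below are illustrative assumptions, not values taken from the
report, and the external shuffle service additionally has to be registered as
a YARN NodeManager aux-service.

import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the setup described above (Spark 1.6.x on YARN, client mode).
val conf = new SparkConf()
  .setAppName("preemption-test")                      // hypothetical app name
  .setMaster("yarn-client")                           // YARN client mode (Spark 1.6 syntax)
  .set("spark.dynamicAllocation.enabled", "true")     // dynamic allocation
  .set("spark.shuffle.service.enabled", "true")       // external shuffle service
  .set("spark.dynamicAllocation.minExecutors", "1")   // assumed lower bound
  .set("spark.dynamicAllocation.maxExecutors", "10")  // assumed; a later comment mentions ~10 executors per job

val sc = new SparkContext(conf)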

[jira] [Commented] (SPARK-16146) Spark application failed by Yarn preempting

2016-06-23 Thread Cong Feng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347479#comment-15347479
 ] 

Cong Feng commented on SPARK-16146:
---

Finally it turned out to be a Zeppelin issue. We ran the same job in the Spark
shell and saw exactly the same exceptions, but the shell is able to handle them
and keeps the rest of the tasks running to the end, whereas Zeppelin treats
those exceptions as fatal and fails the job (at least at the UI level). Thanks
everyone for all the help; for now I am resolving the ticket.


[jira] [Commented] (SPARK-16146) Spark application failed by Yarn preempting

2016-06-22 Thread Cong Feng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345232#comment-15345232
 ] 

Cong Feng commented on SPARK-16146:
---

Hi Sean,

In our test case a job normally gets 10 executors, and in a single run 3 to 5
executors get preempted. The exception does not happen on every run of the job,
but it is still pretty frequent: roughly every 15 to 20 preemptions we see the
exception, which causes the job to fail. The driver is always still there, but
the next time we run the same job on that driver it fails again with the same
exception. Rolling back to Spark 1.4.1 seems to solve this; after at least 120
preemptions we have not seen a single job fail. So it is a bit weird that this
happens on Spark 1.6.1, but it does. We suspect it may be a Netty issue, but we
are still not sure.
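
Read as a rough rate (taking the figures above at face value: 3 to 5
preemptions per run and one job-failing exception per 15 to 20 preemptions):

    expected failures per run ≈ 3/20 to 5/15 ≈ 0.15 to 0.33,
    i.e. roughly one failed run in every 3 to 7.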


[jira] [Updated] (SPARK-16146) Spark application failed by Yarn preempting

2016-06-22 Thread Cong Feng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cong Feng updated SPARK-16146:
--
Description: 
Hi,

We are setting up our Spark cluster on Amazon EC2. We are using Spark YARN
client mode with Spark-1.6.1-bin-hadoop-2.6 (the binary from the official Spark
website) and Hadoop 2.7.2. We have also enabled preemption, dynamic allocation,
and spark.shuffle.service.enabled.

During our tests we found that our Spark application frequently gets killed
when preemption happens. Mostly it seems the driver is trying to send an RPC to
an executor that has already been preempted; there are also some "connection
reset by peer" exceptions that likewise cause the job to fail. Below are the
typical exceptions we found:

16/06/22 08:13:30 ERROR spark.ContextCleaner: Error cleaning RDD 49
java.io.IOException: Failed to send RPC 5721681506291542850 to 
nodexx.xx..ddns.xx.com/xx.xx.xx.xx:42857: 
java.nio.channels.ClosedChannelException
at 
org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
at 
org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at 
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
at 
io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
at 
io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
at 
io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
at 
io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
at 
io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException

And 

16/06/19 22:33:14 INFO storage.BlockManager: Removing RDD 122
16/06/19 22:33:14 WARN server.TransportChannelHandler: Exception in connection 
from nodexx-xx-xx.xx.ddns.xx.com/xx.xx.xx.xx:56618
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at 
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at 
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
16/06/19 22:33:14 ERROR client.TransportResponseHandler: Still have 2 requests 
outstanding when connection from 
nodexx-xx-xx..ddns.xx.com/xx.xx.xx.xx:56618 is closed.

It happens with both the capacity scheduler and the fair scheduler. The weird
thing is that when we rolled back to Spark 1.4.1 the issue disappeared and we
can preempt smoothly.

We would still like to deploy Spark 1.6.1. Is this a bug, or something we can
fix on our side? Any ideas would be greatly appreciated.

Thanks

  was:
Hi,

We are setting up our Spark cluster on Amazon EC2. We are using Spark YARN
client mode with Spark-1.6.1-bin-hadoop-2.6 (the binary from the official Spark
website) and Hadoop 2.7.2. We have also enabled preemption and dynamic allocation.

During our test we found our Spark application fre

[jira] [Created] (SPARK-16146) Spark application failed by Yarn preempting

2016-06-22 Thread Cong Feng (JIRA)
Cong Feng created SPARK-16146:
-

 Summary: Spark application failed by Yarn preempting
 Key: SPARK-16146
 URL: https://issues.apache.org/jira/browse/SPARK-16146
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.6.1
 Environment: Amazon EC2, CentOS 6.6,
Spark-1.6.1-bin-hadoop-2.6 (binary from the official Spark website), Hadoop 2.7.2,
preemption and dynamic allocation enabled.
Reporter: Cong Feng
Priority: Critical


Hi,

We are setting up our Spark cluster on Amazon EC2. We are using Spark YARN
client mode with Spark-1.6.1-bin-hadoop-2.6 (the binary from the official Spark
website) and Hadoop 2.7.2. We have also enabled preemption and dynamic allocation.

During our tests we found that our Spark application frequently gets killed
when preemption happens. Mostly it seems the driver is trying to send an RPC to
an executor that has already been preempted; there are also some "connection
reset by peer" exceptions that likewise cause the job to fail. Below are the
typical exceptions we found:

16/06/22 08:13:30 ERROR spark.ContextCleaner: Error cleaning RDD 49
java.io.IOException: Failed to send RPC 5721681506291542850 to 
nodexx.xx..ddns.xx.com/xx.xx.xx.xx:42857: 
java.nio.channels.ClosedChannelException
at 
org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
at 
org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at 
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
at 
io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
at 
io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
at 
io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
at 
io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
at 
io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException

And 

16/06/19 22:33:14 INFO storage.BlockManager: Removing RDD 122
16/06/19 22:33:14 WARN server.TransportChannelHandler: Exception in connection 
from nodexx-xx-xx.xx.ddns.xx.com/xx.xx.xx.xx:56618
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at 
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at 
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
16/06/19 22:33:14 ERROR client.TransportResponseHandler: Still have 2 requests 
outstanding when connection from 
nodexx-xx-xx..ddns.xx.com/xx.xx.xx.xx:56618 is closed.

It happens with both the capacity scheduler and the fair scheduler. The weird
thing is that when we rolled back to Spark 1.4.1 the issue disappeared and we
can preempt smoothly.

We would still like to deploy Spark 1.6.1. Is this a bug, or something we can
fix on our side? Any ideas would be greatly appreciated.