[jira] [Comment Edited] (SPARK-28726) Spark with DynamicAllocation always got connect rest by peers

2019-08-15 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908714#comment-16908714
 ] 

angerszhu edited comment on SPARK-28726 at 8/16/19 6:03 AM:


[~hyukjin.kwon]

Just SparkthriftServer run sql with dynamic allocation. config like last reply.


was (Author: angerszhuuu):
[~hyukjin.kwon]

Just SparkthriftServer run sql with dynamic allocation. config like below.

> Spark with DynamicAllocation always got connect rest by peers
> -
>
> Key: SPARK-28726
> URL: https://issues.apache.org/jira/browse/SPARK-28726
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: angerszhu
>Priority: Major
>
> When use Spark with dynamic allocation, we set idle time to 5s
> We always got exception about neety 'Connect reset by peers'
>  
> I suspect that it's because we set idle time 5s is too small, it will cause 
> when Blockmanager call netty io, the executor has been remove because of 
> timeout.
> But not timely notify driver's BlocakManager
> {code:java}
> 19/08/14 00:00:46 WARN 
> org.apache.spark.network.server.TransportChannelHandler: "Exception in 
> connection from /host:port"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint: 
> "Error trying to remove broadcast 67 from block manager BlockManagerId(967, 
> host, port, None)"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator 
> 162174"
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed 
> to remove shuffle 22 - Connection reset by peer"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39){code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28726) Spark with DynamicAllocation always got connect rest by peers

2019-08-15 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908711#comment-16908711
 ] 

angerszhu edited comment on SPARK-28726 at 8/16/19 5:04 AM:


[~hyukjin.kwon]

spark thrift server, sql

dynamic allocation enable and idle timeout time 5s。

spark.dynamicAllocation.enabled true
 spark.dynamicAllocation.executorIdleTimeout 5s
 spark.dynamicAllocation.initialExecutors 1
 spark.dynamicAllocation.maxExecutors 40
 spark.dynamicAllocation.schedulerBacklogTimeout 2s

 

spark.reducer.maxSizeInFlight 24m
spark.shuffle.consolidateFiles true
spark.core.connection.ack.wait.timeout 300s
spark.kryoserializer.buffer.max 1024m
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.port.maxRetries 100


was (Author: angerszhuuu):
[~hyukjin.kwon]

spark thrift server, sql

dynamic allocation enable and idle timeout time 5s。

spark.dynamicAllocation.enabled true
spark.dynamicAllocation.executorIdleTimeout 5s
spark.dynamicAllocation.initialExecutors 1
spark.dynamicAllocation.maxExecutors 40
spark.dynamicAllocation.schedulerBacklogTimeout 2s

> Spark with DynamicAllocation always got connect rest by peers
> -
>
> Key: SPARK-28726
> URL: https://issues.apache.org/jira/browse/SPARK-28726
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: angerszhu
>Priority: Major
>
> When use Spark with dynamic allocation, we set idle time to 5s
> We always got exception about neety 'Connect reset by peers'
>  
> I suspect that it's because we set idle time 5s is too small, it will cause 
> when Blockmanager call netty io, the executor has been remove because of 
> timeout.
> But not timely notify driver's BlocakManager
> {code:java}
> 19/08/14 00:00:46 WARN 
> org.apache.spark.network.server.TransportChannelHandler: "Exception in 
> connection from /host:port"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint: 
> "Error trying to remove broadcast 67 from block manager BlockManagerId(967, 
> host, port, None)"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator 
> 162174"
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Fai

[jira] [Comment Edited] (SPARK-28726) Spark with DynamicAllocation always got connect rest by peers

2019-08-15 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908711#comment-16908711
 ] 

angerszhu edited comment on SPARK-28726 at 8/16/19 5:00 AM:


[~hyukjin.kwon]

spark thrift server, sql

dynamic allocation enable and idle timeout time 5s。

spark.dynamicAllocation.enabled true
spark.dynamicAllocation.executorIdleTimeout 5s
spark.dynamicAllocation.initialExecutors 1
spark.dynamicAllocation.maxExecutors 40
spark.dynamicAllocation.schedulerBacklogTimeout 2s


was (Author: angerszhuuu):
[~hyukjin.kwon]

spark thrift server, sql

dynamic allocation enable and idle timeout time 5s。

 

> Spark with DynamicAllocation always got connect rest by peers
> -
>
> Key: SPARK-28726
> URL: https://issues.apache.org/jira/browse/SPARK-28726
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: angerszhu
>Priority: Major
>
> When use Spark with dynamic allocation, we set idle time to 5s
> We always got exception about neety 'Connect reset by peers'
>  
> I suspect that it's because we set idle time 5s is too small, it will cause 
> when Blockmanager call netty io, the executor has been remove because of 
> timeout.
> But not timely notify driver's BlocakManager
> {code:java}
> 19/08/14 00:00:46 WARN 
> org.apache.spark.network.server.TransportChannelHandler: "Exception in 
> connection from /host:port"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint: 
> "Error trying to remove broadcast 67 from block manager BlockManagerId(967, 
> host, port, None)"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator 
> 162174"
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed 
> to remove shuffle 22 - Connection reset by peer"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39){code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issu

[jira] [Comment Edited] (SPARK-28726) Spark with DynamicAllocation always got connect rest by peers

2019-08-15 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908711#comment-16908711
 ] 

angerszhu edited comment on SPARK-28726 at 8/16/19 4:59 AM:


[~hyukjin.kwon]

spark thrift server, sql

dynamic allocation enable and idle timeout time 5s。

 


was (Author: angerszhuuu):
[~hyukjin.kwon]

spark thrift server, sql

dynamic allocation enable and idle timeout time 5s。

> Spark with DynamicAllocation always got connect rest by peers
> -
>
> Key: SPARK-28726
> URL: https://issues.apache.org/jira/browse/SPARK-28726
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: angerszhu
>Priority: Major
>
> When use Spark with dynamic allocation, we set idle time to 5s
> We always got exception about neety 'Connect reset by peers'
>  
> I suspect that it's because we set idle time 5s is too small, it will cause 
> when Blockmanager call netty io, the executor has been remove because of 
> timeout.
> But not timely notify driver's BlocakManager
> {code:java}
> 19/08/14 00:00:46 WARN 
> org.apache.spark.network.server.TransportChannelHandler: "Exception in 
> connection from /host:port"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint: 
> "Error trying to remove broadcast 67 from block manager BlockManagerId(967, 
> host, port, None)"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator 
> 162174"
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed 
> to remove shuffle 22 - Connection reset by peer"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39){code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org