Attached is the error I get from ignitevisorcmd.sh after calling the cache command (the command just hangs). To me it looks like all the Spark executors (10 in my test) start a new client node, and some of those nodes get terminated and restarted as the executors die. This seems to really confuse Ignite.
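For context, here is a minimal sketch of the kind of client-node startup each executor performs. Everything in it (cache name, settings) is an illustrative assumption, not taken from this thread:

```java
// Illustrative sketch only: roughly what each Spark executor ends up doing when
// the Ignite Spark integration runs with client-mode nodes. The configuration
// details here are assumed for illustration, not taken from this thread.
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ExecutorClientSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true); // the executor joins the topology as a client node

        try (Ignite client = Ignition.start(cfg)) {
            // ... the executor streams its slice of the data into a cache ...
        }
        // When an executor is killed and restarted, its client node drops out of
        // the topology and a new one joins, producing the churn described above.
    }
}
```

With 10 executors this means 10 client nodes joining (and potentially leaving and rejoining) the topology, which is consistent with the connection churn in the log below.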
[15:45:10,741][INFO][grid-nio-worker-tcp-comm-0-#23%console%][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/127.0.0.1:40984, rmtAddr=/127.0.0.1:47101]
[15:45:10,741][INFO][grid-nio-worker-tcp-comm-1-#24%console%][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/127.0.0.1:49872, rmtAddr=/127.0.0.1:47100]
[15:45:10,742][INFO][grid-nio-worker-tcp-comm-3-#26%console%][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/127.0.0.1:40988, rmtAddr=/127.0.0.1:47101]
[15:45:10,743][INFO][grid-nio-worker-tcp-comm-1-#24%console%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:40992]
[15:45:10,745][INFO][grid-nio-worker-tcp-comm-0-#23%console%][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/127.0.0.1:49876, rmtAddr=/127.0.0.1:47100]
[15:45:11,725][SEVERE][grid-nio-worker-tcp-comm-2-#25%console%][TcpCommunicationSpi] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=2, bytesRcvd=180, bytesSent=18, bytesRcvd0=18, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-2, igniteInstanceName=console, finished=false, hashCode=1827979135, interrupted=false, runner=grid-nio-worker-tcp-comm-2-#25%console%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=166400 cap=166400], readBuf=java.nio.DirectByteBuffer[pos=18 lim=18 cap=117948], inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.21.85.37:39942, rmtAddr=ip-172-21-85-213.ap-south-1.compute.internal/172.21.85.213:47100, createTime=1535125510724, closeTime=0, bytesSent=0, bytesRcvd=18, bytesSent0=0, bytesRcvd0=18, sndSchedTime=1535125510724, lastSndTime=1535125510724, lastRcvTime=1535125510724, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@7ae6182a, directMode=true], GridConnectionBytesVerifyFilter], accepted=false]]]
java.lang.NullPointerException
	at org.apache.ignite.internal.util.nio.GridNioServer.cancelConnect(GridNioServer.java:885)
	at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$SingleAddressConnectFuture.cancel(TcpCommunicationConnectionCheckFuture.java:338)
	at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$MultipleAddressesConnectFuture.cancelFutures(TcpCommunicationConnectionCheckFuture.java:475)
	at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$MultipleAddressesConnectFuture.receivedAddressStatus(TcpCommunicationConnectionCheckFuture.java:494)
	at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$MultipleAddressesConnectFuture$1.onStatusReceived(TcpCommunicationConnectionCheckFuture.java:433)
	at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$SingleAddressConnectFuture.finish(TcpCommunicationConnectionCheckFuture.java:362)
	at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$SingleAddressConnectFuture.onConnected(TcpCommunicationConnectionCheckFuture.java:348)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onMessage(TcpCommunicationSpi.java:773)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onMessage(TcpCommunicationSpi.java:383)
	at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)
	at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
	at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:117)
	at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
	at org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onMessageReceived(GridConnectionBytesVerifyFilter.java:88)
	at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
	at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3490)

On Fri, Aug 24, 2018 at 11:18 AM, eugene miretsky <eugene.miret...@gmail.com> wrote:
> Thanks,
>
> So the way I understand it, the thick client will use the affinity key to
> send data to the right node, and hence will split the traffic between all
> the nodes, while the thin client will just send the data to one node, and
> that node will be responsible for forwarding it to the actual node that
> owns the 'shard'?
>
> I keep getting the following error when using the Spark driver; the driver
> keeps writing, but very slowly. Any idea what is causing the error, or how
> to fix it?
>
> Cheers,
> Eugene
>
> "
> [15:04:58,030][SEVERE][data-streamer-stripe-10-#43%Server%][DataStreamProcessor] Failed to respond to node [nodeId=78af5d88-cbfa-4529-aaee-ff4982985cdf, res=DataStreamerResponse [reqId=192, forceLocDep=true]]
> class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=ZookeeperClusterNode [id=78af5d88-cbfa-4529-aaee-ff4982985cdf, addrs=[127.0.0.1], order=377, loc=false, client=true], topic=T1 [topic=TOPIC_DATASTREAM, id=b8d675c6561-78af5d88-cbfa-4529-aaee-ff4982985cdf], msg=DataStreamerResponse [reqId=192, forceLocDep=true], policy=9]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1651)
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.sendToCustomTopic(GridIoManager.java:1703)
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.sendToCustomTopic(GridIoManager.java:1673)
> 	at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.sendResponse(DataStreamProcessor.java:440)
> 	at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:402)
> 	at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:305)
> 	at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:60)
> 	at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:90)
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
> 	at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
> 	at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: ZookeeperClusterNode [id=78af5d88-cbfa-4529-aaee-ff4982985cdf, addrs=[127.0.0.1], order=377, loc=false, client=true]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2718)
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2651)
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1643)
> 	... 13 more
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?).
> Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=78af5d88-cbfa-4529-aaee-ff4982985cdf, addrs=[/127.0.0.1:47101]]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3422)
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2958)
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2841)
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2692)
> 	... 15 more
> 	Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address [addr=/127.0.0.1:47101, err=Connection refused]
> 		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3425)
> 		... 18 more
> 	Caused by: java.net.ConnectException: Connection refused
> 		at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 		at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> 		at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
> 		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3262)
> 		... 18 more
> "
>
> On Tue, Aug 14, 2018 at 4:39 PM, akurbanov <antkr....@gmail.com> wrote:
>
>> Hi,
>>
>> The Spark integration was implemented before the Java thin client was
>> released, and the thick client performs better than the thin one in
>> general. Is your question about the existence of benchmarks for thin vs.
>> thick clients in the Spark integration, or just a comparison of these two
>> options?
>>
>> The thin client's functionality is limited compared to the thick client,
>> and it should generally be somewhat slower, since it communicates not with
>> the whole cluster but only with a single node, and is not partition-aware.
>> This introduces additional network costs, which may affect performance
>> compared to the thick client in the simplest and most ideal conditions,
>> where network transfer is the major part of the workload.
>>
>> However, this performance decrease may be completely irrelevant depending
>> on the use case and workload, so you should always measure performance,
>> run benchmarks for your specific use case, and decide which option suits
>> your needs better.
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
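The thick/thin distinction discussed in the thread can be sketched as follows. This is an illustrative comparison only: the cache name, addresses, and values are assumptions, not taken from the thread, and it needs a running Ignite cluster to actually execute.

```java
// Illustrative contrast between the two client options discussed above.
// Cache name "myCache" and the addresses are assumptions for illustration.
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ThickVsThinSketch {
    public static void main(String[] args) {
        // Thick client: joins the topology, knows the affinity function, so the
        // data streamer can batch each entry toward its primary node directly.
        IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);
        try (Ignite thick = Ignition.start(cfg);
             IgniteDataStreamer<Integer, String> streamer = thick.dataStreamer("myCache")) {
            streamer.addData(1, "value-1");
        }

        // Thin client: a plain socket to the listed node(s); a write may land on
        // a node that is not the primary for the key and be forwarded internally,
        // which is the extra network hop mentioned in the reply above.
        ClientConfiguration thinCfg = new ClientConfiguration().setAddresses("127.0.0.1:10800");
        try (IgniteClient thin = Ignition.startClient(thinCfg)) {
            thin.getOrCreateCache("myCache").put(1, "value-1");
        }
    }
}
```

As the reply notes, whether the extra hop matters depends on the workload, so the choice should be driven by a benchmark of the specific use case.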