I set up one Ignite server node and registered one service instance on it.
I then call the service function from 1000 tasks (threads) distributed
across 100 different machines (a Spark app).
The problem is that the TCP connections between the Ignite server and the
clients are unstable. In the early stage of the job, many exceptions related
to TcpCommunicationSpi occur. The log and exception stack are as follows.
[16:51:10,047][INFO][grid-nio-worker-tcp-comm-12-#101][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/127.0.0.1:55555,
rmtAddr=/127.0.0.1:43416]
[16:51:10,248][INFO][grid-nio-worker-tcp-comm-13-#102][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/127.0.0.1:55555,
rmtAddr=/127.0.0.1:43418]
[16:51:10,448][INFO][grid-nio-worker-tcp-comm-14-#103][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/127.0.0.1:55555,
rmtAddr=/127.0.0.1:43420]
[16:51:10,648][INFO][grid-nio-worker-tcp-comm-15-#104][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/127.0.0.1:55555,
rmtAddr=/127.0.0.1:43422]
[16:51:10,648][SEVERE][srvc-deploy-#152][query] Failed to send event
notification to node: af5d28df-40c9-4cc0-9c46-5b01e1a0590e
class org.apache.ignite.IgniteCheckedException: Failed to send message (node
may have left the grid or TCP connection cannot be established due to
firewall issues) [node=TcpDiscoveryNode
[id=af5d28df-40c9-4cc0-9c46-5b01e1a0590e, addrs=[10.112.96.139, 127.0.0.1],
sockAddrs=[/127.0.0.1:0, cc3a7x2183.nhnsystem.com/10.112.96.139:0], discPort=0,
order=158, intOrder=132, lastExchangeTime=1512373825597, loc=false,
ver=2.3.0#20171028-sha1:8add7fd5, isClient=true], topic=T4
[topic=TOPIC_CACHE, id1=5dc785b4-dd3d-3c3b-b270-c5fe2d7ed9a2,
id2=af5d28df-40c9-4cc0-9c46-5b01e1a0590e, id3=1], msg=GridContinuousMessage
[type=MSG_EVT_NOTIFICATION, routineId=598deff2-4a93-4792-987a-0c913d5007c5,
data=null, futId=null], policy=2]
at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1650)
at
org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1862)
at
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1359)
at
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1330)
at
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1312)
at
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:972)
at
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:912)
at
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:855)
at
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:82)
at
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:413)
at
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:373)
at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1025)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:667)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.localFinish(GridNearTxLocal.java:3078)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:418)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$16.apply(GridNearTxLocal.java:3229)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$16.apply(GridNearTxLocal.java:3223)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.commitNearTxLocalAsync(GridNearTxLocal.java:3223)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.commit(GridNearTxLocal.java:3188)
at
org.apache.ignite.internal.processors.service.GridServiceProcessor.reassign(GridServiceProcessor.java:1234)
at
org.apache.ignite.internal.processors.service.GridServiceProcessor.access$2400(GridServiceProcessor.java:122)
at
org.apache.ignite.internal.processors.service.GridServiceProcessor$TopologyListener$1.run0(GridServiceProcessor.java:1745)
at
org.apache.ignite.internal.processors.service.GridServiceProcessor$DepRunnable.run(GridServiceProcessor.java:1958)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send
message to remote node: TcpDiscoveryNode
[id=af5d28df-40c9-4cc0-9c46-5b01e1a0590e, addrs=[10.112.96.139, 127.0.0.1],
sockAddrs=[/127.0.0.1:0, cc3a7x2183.nhnsystem.com/10.112.96.139:0],
discPort=0,
order=158, intOrder=132, lastExchangeTime=1512373825597, loc=false,
ver=2.3.0#20171028-sha1:8add7fd5, isClient=true]
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2615)
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2551)
at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)
... 27 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect
to node (is node still alive?). Make sure that each ComputeTask and cache
Transaction has a timeout set in order to prevent parties from waiting
forever in case of network issues
[nodeId=af5d28df-40c9-4cc0-9c46-5b01e1a0590e,
addrs=[cc3a7x2183.nhnsystem.com/10.112.96.139:55555, /127.0.0.1:55555]]
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3288)
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2839)
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2726)
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2587)
... 29 more
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
to connect to address [addr=cc3a7x2183.nhnsystem.com/10.112.96.139:55555,
err=Connection refused]
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
... 32 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3121)
... 32 more
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
to connect to address [addr=/127.0.0.1:55555, err=Remote node ID is not as
expected [expected=af5d28df-40c9-4cc0-9c46-5b01e1a0590e,
rcvd=cb93b676-7866-4d1d-a8f5-b6fea906aeef]]
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
... 32 more
Caused by: class
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException:
Remote node ID is not as expected
[expected=af5d28df-40c9-4cc0-9c46-5b01e1a0590e,
rcvd=cb93b676-7866-4d1d-a8f5-b6fea906aeef]
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:3442)
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3135)
... 32 more
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
to connect to address [addr=/127.0.0.1:55555, err=Remote node ID is not as
expected [expected=af5d28df-40c9-4cc0-9c46-5b01e1a0590e,
rcvd=cb93b676-7866-4d1d-a8f5-b6fea906aeef]]
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
... 32 more
Caused by: class
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException:
Remote node ID is not as expected
[expected=af5d28df-40c9-4cc0-9c46-5b01e1a0590e,
rcvd=cb93b676-7866-4d1d-a8f5-b6fea906aeef]
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:3442)
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3135)
... 32 more
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
to connect to address [addr=/127.0.0.1:55555, err=Remote node ID is not as
expected [expected=af5d28df-40c9-4cc0-9c46-5b01e1a0590e,
rcvd=cb93b676-7866-4d1d-a8f5-b6fea906aeef]]
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
... 32 more
Caused by: class
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException:
Remote node ID is not as expected
[expected=af5d28df-40c9-4cc0-9c46-5b01e1a0590e,
rcvd=cb93b676-7866-4d1d-a8f5-b6fea906aeef]
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:3442)
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3135)
... 32 more
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
to connect to address [addr=/127.0.0.1:55555, err=Remote node ID is not as
expected [expected=af5d28df-40c9-4cc0-9c46-5b01e1a0590e,
rcvd=cb93b676-7866-4d1d-a8f5-b6fea906aeef]]
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
... 32 more
Caused by: class
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException:
Remote node ID is not as expected
[expected=af5d28df-40c9-4cc0-9c46-5b01e1a0590e,
rcvd=cb93b676-7866-4d1d-a8f5-b6fea906aeef]
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:3442)
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3135)
... 32 more
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
to connect to address [addr=/127.0.0.1:55555, err=Remote node ID is not as
expected [expected=af5d28df-40c9-4cc0-9c46-5b01e1a0590e,
rcvd=cb93b676-7866-4d1d-a8f5-b6fea906aeef]]
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
... 32 more
Caused by: class
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException:
Remote node ID is not as expected
[expected=af5d28df-40c9-4cc0-9c46-5b01e1a0590e,
rcvd=cb93b676-7866-4d1d-a8f5-b6fea906aeef]
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:3442)
at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3135)
... 32 more
The exception message is long, but its last part says "Remote node ID is not
as expected". That message originates from the
"TcpCommunicationSpi.safeHandshake()" call. I don't understand what it
means.
Searching for "Remote node ID is not as expected", I found a JIRA issue
<https://issues.apache.org/jira/browse/IGNITE-3401> that looks similar
to my problem, but it offered nothing I could use. Another Ignite users thread
<http://apache-ignite-users.70518.x6.nabble.com/Remote-node-ID-is-not-as-expected-New-Node-not-coming-up-td15858.html>
said that using the same port on different hosts could be a cause. Really? Can
the same port number on different hosts cause this problem?
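To make the question concrete, here is how I currently picture the check
behind that message. This is only my guess and may well be wrong; it is not
Ignite's actual code. The class HandshakeSketch, the listeners map and the
tryConnect method are made-up names for illustration. The idea is that the
connecting side tries each address the target node advertises and rejects
the connection if whoever answers reports a different node ID. The
addresses and IDs below are copied from the trace above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class HandshakeSketch {
    /** Hypothetical: which node ID answers on each address, as seen from the connecting host. */
    static final Map<String, UUID> listeners = new LinkedHashMap<>();

    /** Simulates one connection attempt plus the node ID comparison. */
    static String tryConnect(String addr, UUID expected) {
        UUID rcvd = listeners.get(addr);
        if (rcvd == null)
            return "Connection refused"; // nothing reachable on that address
        if (!rcvd.equals(expected))
            return "Remote node ID is not as expected [expected=" + expected + ", rcvd=" + rcvd + "]";
        return "OK";
    }

    public static void main(String[] args) {
        // Node the server wants to reach (the client node from the trace).
        UUID expected = UUID.fromString("af5d28df-40c9-4cc0-9c46-5b01e1a0590e");
        // Node that actually answered on 127.0.0.1:55555 in the trace.
        UUID answering = UUID.fromString("cb93b676-7866-4d1d-a8f5-b6fea906aeef");

        // Assumption: on the server host, loopback port 55555 belongs to some other node.
        listeners.put("127.0.0.1:55555", answering);

        // The target advertises both of these addresses (addrs=[10.112.96.139, 127.0.0.1]).
        for (String addr : Arrays.asList("10.112.96.139:55555", "127.0.0.1:55555"))
            System.out.println(addr + " -> " + tryConnect(addr, expected));
    }
}
```

If this picture is right, the 127.0.0.1 attempts could never succeed from
another host, and the attempt that matters is the refused connection to
10.112.96.139:55555.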
My Spark job does not always fail: 2 out of 10 attempts succeed. But that is
not reliable enough to use.
My config is as follows.
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean id="grid.cfg"
class="org.apache.ignite.configuration.IgniteConfiguration">
<property name="serviceThreadPoolSize" value="30"/>
<property name="dataStorageConfiguration">
<bean
class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="defaultDataRegionConfiguration">
<bean
class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true"/>
<property name="name" value="Default_Region"/>
<property name="maxSize" value="#{70L * 1024 * 1024 * 1024}"/>
</bean>
</property>
</bean>
</property>
<property name="discoverySpi">
<bean
class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean
class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
<property name="addresses">
<list>
<value>10.116.23.223</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
<property name="communicationSpi">
<bean
class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
<property name="localPort" value="55555"/>
<property name="localPortRange" value="10"/>
</bean>
</property>
</bean>
</beans>
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/