Hi folks,

We have been seeing instability when using an Ignite cluster over the Redis interface, across multiple release versions. I have already filed a Jira ticket (https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-23551), but I have not been able to find anyone reporting a similar issue. So I was wondering whether anyone else is using Ignite in this configuration and has dealt with this problem.

The gist of the problem is as follows (copied from the Jira ticket):

We are using Ignite as a persistent caching system, primarily to write KVs through the Redis interface; we define the Redis caches statically in the XML config. We have been plagued by an issue where restarting a node in an existing, stable cluster does not work: the node fails every time it tries to join the cluster, with a NullPointerException. This happens with any node in the cluster and persists no matter how many times the node is started. After a full cluster restart the issue goes away.

The following is the stack trace we see in the logs of the failed node:

[10:51:34,769][SEVERE][tcp-disco-msg-worker-[fa915882 10.132.0.114:47500]-#2-#57][TcpDiscoverySpi] TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
java.lang.NullPointerException: Cannot invoke "org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$CachePredicate.addClientNode(java.util.UUID, boolean)" because "p" is null
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.addClientNode(GridDiscoveryManager.java:428)
    at org.apache.ignite.internal.processors.cache.ClusterCachesInfo.addReceivedClientNodesToDiscovery(ClusterCachesInfo.java:1600)
    at org.apache.ignite.internal.processors.cache.ClusterCachesInfo.onGridDataReceived(ClusterCachesInfo.java:1519)
    at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onGridDataReceived(GridCacheProcessor.java:3137)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onExchange(GridDiscoveryManager.java:1019)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.onExchange(TcpDiscoverySpi.java:2197)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddFinishedMessage(ServerImpl.java:5359)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:3242)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2918)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:8048)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:3089)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7979)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)

Any pointers would be much appreciated.

Thanks and Regards,
Ashu Pachauri