I am also experiencing this issue. I'm running Ignite in a Kubernetes cluster and trying to do a rolling update. I have 2 Ignite nodes running, and I am using the Kubernetes rolling update API in a Deployment. For example, I run an application that starts up the 2 nodes, and the nodes form a cluster. I then build my project through a Jenkins pipeline and use Helm to upgrade the Deployment. Kubernetes takes over and, following the Deployment, brings one node down, puts it back up, waits a minute, and then brings the other node down and puts it back up.
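To make that sequence concrete, here is a rough toy model of it in Java. It is only a sketch of the behaviour as I understand it, not how Kubernetes is actually implemented. The class name `RollingUpdateSketch`, the method `rollingUpdate`, and the `joinsCluster` check are made up for illustration, the one-minute wait is left out, and the stop-on-failure rule is an assumption of the model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class RollingUpdateSketch {

    /**
     * Replaces nodes one at a time, in index order.
     * joinsCluster is a hypothetical stand-in for "the replacement node
     * started and joined the cluster"; it takes the node index.
     * The loop stops at the first replacement that fails (assumed behaviour).
     */
    public static List<String> rollingUpdate(int nodeCount, IntPredicate joinsCluster) {
        List<String> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            nodes.add("old");
        }
        for (int i = 0; i < nodeCount; i++) {
            if (joinsCluster.test(i)) {
                nodes.set(i, "new");          // old node replaced successfully
            } else {
                nodes.set(i, "new-broken");   // replacement never became healthy
                break;                        // remaining old nodes are left untouched
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Every replacement joins: both nodes end up on the new version.
        System.out.println(rollingUpdate(2, i -> true));   // prints [new, new]
        // The first replacement fails to join: the update stops there.
        System.out.println(rollingUpdate(2, i -> false));  // prints [new-broken, old]
    }
}
```

The second call gives one broken new node next to one node that is still on the old version.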
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment

Sometimes the update works, and other times Ignite fails to connect to the other node and join the cluster. Kubernetes brings down a node and tries to put it back up, but because the new node fails to start, Kubernetes stops the rolling update. So we end up with one old node running and one new, broken node.

[16:38:45] __________ ________________
[16:38:45] / _/ ___/ |/ / _/_ __/ __/
[16:38:45] _/ // (7 7 // / / / / _/
[16:38:45] /___/\___/_/|_/___/ /_/ /___/
[16:38:45]
[16:38:45] ver. 2.3.0#20171028-sha1:8add7fd5
[16:38:45] 2017 Copyright(C) Apache Software Foundation
[16:38:45]
[16:38:45] Ignite documentation: http://ignite.apache.org
[16:38:45]
[16:38:45] Quiet mode.
[16:38:45] ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
[16:38:45]
[16:38:45] OS: Linux 4.4.0-77-generic amd64
[16:38:45] VM information: Java(TM) SE Runtime Environment 1.8.0_152-b16 Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.152-b16
[16:38:45] Configured plugins:
[16:38:45] ^-- None
[16:38:45]
[16:38:46] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[16:38:46] Security status [authentication=off, tls/ssl=off]
SEVERE: TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
java.lang.NullPointerException
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandlerV2.getEventFilter(CacheContinuousQueryHandlerV2.java:111)
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.register(CacheContinuousQueryHandler.java:315)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.registerHandler(GridContinuousProcessor.java:1228)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.onDiscoDataReceived(GridContinuousProcessor.java:523)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.onGridDataReceived(GridContinuousProcessor.java:478)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$5.onExchange(GridDiscoveryManager.java:855)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.onExchange(TcpDiscoverySpi.java:1837)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddedMessage(ServerImpl.java:4328)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2635)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2447)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6648)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2533)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Jan 30, 2018 4:38:48 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Runtime error caught during grid runnable execution: IgniteSpiThread [name=tcp-disco-msg-worker-#2]
java.lang.NullPointerException
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandlerV2.getEventFilter(CacheContinuousQueryHandlerV2.java:111)
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.register(CacheContinuousQueryHandler.java:315)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.registerHandler(GridContinuousProcessor.java:1228)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.onDiscoDataReceived(GridContinuousProcessor.java:523)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.onGridDataReceived(GridContinuousProcessor.java:478)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$5.onExchange(GridDiscoveryManager.java:855)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.onExchange(TcpDiscoverySpi.java:1837)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddedMessage(ServerImpl.java:4328)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2635)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2447)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6648)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2533)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Jan 30, 2018 4:38:48 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Failed to start manager: GridManagerAdapter [enabled=true, name=o.a.i.i.managers.discovery.GridDiscoveryManager]
class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [], reconCnt=10, maxAckTimeout=600000, forceSrvMode=false, clientReconnectDisabled=false]
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:300)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:882)
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1852)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1002)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1909)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1652)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1080)
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:998)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:884)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:803)
    at org.apache.ignite.Ignition.start(Ignition.java:372)
    at com.mycompay.source.code.IgniteNodeModule.provideIgniteCluster(IgniteNodeModule.java:24)
    at com.mycompay.source.code.IgniteNodeModule$$FastClassByGuice$$cc13dccd.invoke(<generated>)
    at com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:264)
    at com.google.inject.internal.ProviderMethod$Factory.provision(ProviderMethod.java:401)
    at com.google.inject.internal.ProviderMethod$Factory.get(ProviderMethod.java:376)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.ProviderMethod$Factory.provision(ProviderMethod.java:402)
    at com.google.inject.internal.ProviderMethod$Factory.get(ProviderMethod.java:376)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:110)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268)
    at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1019)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1085)
    at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1015)
    at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1054)
    at com.mycompay.source.code.IgniteNode.main(IgniteNode.java:12)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Thread has been interrupted.
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:908)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:360)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1846)
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
    ... 39 more
Jan 30, 2018 4:38:48 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Got exception while starting (will rollback startup routine).
class org.apache.ignite.IgniteCheckedException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1857)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1002)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1909)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1652)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1080)
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:998)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:884)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:803)
    at org.apache.ignite.Ignition.start(Ignition.java:372)
    at com.mycompay.source.code.IgniteNodeModule.provideIgniteCluster(IgniteNodeModule.java:24)
    at com.mycompay.source.code.IgniteNodeModule$$FastClassByGuice$$cc13dccd.invoke(<generated>)
    at com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:264)
    at com.google.inject.internal.ProviderMethod$Factory.provision(ProviderMethod.java:401)
    at com.google.inject.internal.ProviderMethod$Factory.get(ProviderMethod.java:376)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.ProviderMethod$Factory.provision(ProviderMethod.java:402)
    at com.google.inject.internal.ProviderMethod$Factory.get(ProviderMethod.java:376)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:110)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268)
    at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1019)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1085)
    at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1015)
    at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1054)
    at com.mycompay.source.code.IgniteNode.main(IgniteNode.java:12)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [], reconCnt=10, maxAckTimeout=600000, forceSrvMode=false, clientReconnectDisabled=false]
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:300)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:882)
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1852)
    ... 37 more
Caused by: class org.apache.ignite.spi.IgniteSpiException: Thread has been interrupted.
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:908)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:360)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1846)
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
    ... 39 more
[16:38:48] (wrn) Ignoring stopping Ignite instance that was already stopped or never started: null
[16:38:48] Ignite node stopped OK [uptime=00:00:03.462]
log4j:WARN No appenders could be found for logger (org.apache.kafka.clients.producer.ProducerConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" com.google.inject.ProvisionException: Unable to provision, see the following errors:

1) Error in custom provider, class org.apache.ignite.IgniteException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
  at com.mycompay.source.code.IgniteNodeModule.provideIgniteCluster(IgniteNodeModule.java:24)
  at com.mycompay.source.code.IgniteNodeModule.provideIgniteCluster(IgniteNodeModule.java:24)
  while locating org.apache.ignite.Ignite
    for the 1st parameter of com.mycompay.source.code.IgniteNodeModule.provideIgniteEvents(IgniteNodeModule.java:35)
  at com.mycompay.source.code.IgniteNodeModule.provideIgniteEvents(IgniteNodeModule.java:35)
  while locating org.apache.ignite.IgniteEvents
    for the 1st parameter of com.mycompay.source.code.KafkaMessageSenderService.<init>(KafkaMessageSenderService.java:45)
  while locating com.mycompay.source.code.KafkaMessageSenderService

1 error
    at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1028)
    at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1054)
    at com.mycompay.source.code.IgniteNode.main(IgniteNode.java:12)
Caused by: class org.apache.ignite.IgniteException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
    at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:966)
    at org.apache.ignite.Ignition.start(Ignition.java:375)
    at com.mycompay.source.code.IgniteNodeModule.provideIgniteCluster(IgniteNodeModule.java:24)
    at com.mycompay.source.code.IgniteNodeModule$$FastClassByGuice$$cc13dccd.invoke(<generated>)
    at com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:264)
    at com.google.inject.internal.ProviderMethod$Factory.provision(ProviderMethod.java:401)
    at com.google.inject.internal.ProviderMethod$Factory.get(ProviderMethod.java:376)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.ProviderMethod$Factory.provision(ProviderMethod.java:402)
    at com.google.inject.internal.ProviderMethod$Factory.get(ProviderMethod.java:376)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:110)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268)
    at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1019)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1085)
    at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1015)
    ... 2 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1857)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1002)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1909)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1652)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1080)
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:998)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:884)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:803)
    at org.apache.ignite.Ignition.start(Ignition.java:372)
    ... 29 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [], reconCnt=10, maxAckTimeout=600000, forceSrvMode=false, clientReconnectDisabled=false]
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:300)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:882)
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1852)
    ... 37 more
Caused by: class org.apache.ignite.spi.IgniteSpiException: Thread has been interrupted.
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:908)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:360)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1846)
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
