Hi!

The issue is that Control Center Agent stores its configuration in the metastorage. Ignite has a bug in processing metastorage data whose classes are not present on all nodes:
https://issues.apache.org/jira/browse/IGNITE-13642

Effectively, this means that you can't remove control-center-agent from a cluster that previously ran with it.

You have a few options to solve it:

- Add control-center-agent to the classpath of all nodes and disable it using management.sh --off. The classes and configuration will still be there, but the agent won't do anything. You'll be able to remove the library after upgrading to a version that doesn't have this bug. Hopefully, it will be fixed in Ignite 2.9.1.
- Remove the metastorage directory from the persistence directory on all nodes. This removes the Control Center Agent configuration along with the Baseline Topology history. You need to do this together with removing the control-center-agent library.

NOTE that removing the metastorage is a dangerous operation and can lead to data loss. I recommend using the first option if it works for you.
Make a copy of the persistence directories before removing anything.
After the removal and a restart, the baseline topology will be reset. Make sure that the first activation produces the same BLT as before the restart to avoid data loss.

Also note that Control Center doesn't support Ignite 2.9 yet. The agent for it is on its way. Currently, only Ignite 2.8 is supported.

Denis

On 28.10.2020, 19:58, "Bastien Durel" <[email protected]> wrote:

Hello,

I'm running a 2.9.0 cluster with 2 nodes. I tried to use GridGain's ControlCenterAgent to investigate a slowdown.

When I removed the agent files from the server (I don't like having to put them on all clients), the second node could not join the cluster when I started it. If I start node A, then node B, node B fails, but if I start node B, then node A, node A fails.

If I put the agent files back, then all nodes can start, but clients fail because they don't have the agent classes themselves.

When a node fails to start, it prints this log:

[17:52:45,265][INFO][tcp-disco-sock-reader-[2f3f6f3a 192.168.43.29:39675]-#6%ClusterWA%-#50%ClusterWA%][TcpDiscoverySpi] Initialized connection with remote server node [nodeId=2f3f6f3a-accb-4708-a5cc-26d324a07816, rmtAddr=/192.168.43.29:39675]
[17:52:45,268][SEVERE][main][IgniteKernal%ClusterWA] Failed to start manager: GridManagerAdapter [enabled=true, name=o.a.i.i.managers.discovery.GridDiscoveryManager]
class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@39a8e2fa], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:302)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:967)
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1935)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1298)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2046)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1698)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1114)
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1032)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:918)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:817)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:687)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:656)
    at org.apache.ignite.Ignition.start(Ignition.java:353)
    at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:300)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Unable to unmarshal key=metastorage.cluster.id.tag
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:2018)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1189)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:462)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2120)
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:299)
    ... 13 more
[17:52:45,271][SEVERE][main][IgniteKernal%ClusterWA] Got exception while starting (will rollback startup routine).
class org.apache.ignite.IgniteCheckedException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1940)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1298)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2046)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1698)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1114)
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1032)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:918)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:817)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:687)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:656)
    at org.apache.ignite.Ignition.start(Ignition.java:353)
    at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:300)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@39a8e2fa], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:302)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:967)
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1935)
    ... 11 more
Caused by: class org.apache.ignite.spi.IgniteSpiException: Unable to unmarshal key=metastorage.cluster.id.tag
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:2018)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1189)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:462)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2120)
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:299)
    ... 13 more
[17:52:45,271][INFO][tcp-disco-sock-reader-[2f3f6f3a 192.168.43.29:39675]-#6%ClusterWA%-#50%ClusterWA%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.43.29:39675, rmtPort=39675

And the running node has this:

[17:52:45,223][INFO][tcp-disco-sock-reader-[9a3233c6 192.168.43.30:54951]-#4%ClusterWA%-#55%ClusterWA%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.43.30:54951, rmtPort=54951
[17:52:45,246][INFO][tcp-disco-msg-worker-[crd]-#2%ClusterWA%-#46%ClusterWA%][GridEncryptionManager] Joining node doesn't have stored group keys [node=9a3233c6-3a6c-4be0-b5e7-19cdff30f69e]
[17:52:45,266][WARNING][disco-pool-#56%ClusterWA%][TcpDiscoverySpi] Unable to unmarshal key=metastorage.cluster.id.tag

If I start the nodes in the reverse order, it has this:

[17:56:52,426][INFO][tcp-disco-sock-reader-[4b8b92f5 192.168.43.29:42557]-#4%ClusterWA%-#53%ClusterWA%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.43.29:42557, rmtPort=42557
[17:56:52,446][INFO][tcp-disco-msg-worker-[crd]-#2%ClusterWA%-#46%ClusterWA%][GridEncryptionManager] Joining node doesn't have stored group keys [node=4b8b92f5-1753-4b1b-9902-476c925fa49d]
[17:56:52,466][WARNING][disco-pool-#54%ClusterWA%][TcpDiscoverySpi] Unable to unmarshal key=metastorage.cluster.id.tag

Is there a way to recover?

Thanks,

--
Bastien Durel
DATA
Intégration des données de l'entreprise,
Systèmes d'information décisionnels.

[email protected]
tel : +33 (0) 1 57 19 59 28
fax : +33 (0) 1 57 19 59 73
45 avenue Carnot, 94230 CACHAN France
www.data.fr
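
For the second option in the reply above (removing the metastorage directory), the "copy first, then remove" order could be sketched roughly as follows. This is only an illustration, not an official Ignite tool. It assumes the node is stopped and that the persistence directory contains one folder per node, each with a subdirectory named metastorage; check the actual layout on your nodes before relying on it. The function name and both paths are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def backup_and_remove_metastorage(persistence_dir: Path, backup_dir: Path) -> List[Path]:
    """Copy the whole persistence directory, then delete metastorage folders.

    Assumed layout (verify on your nodes): <persistence_dir>/<node folder>/metastorage
    Run this only while the Ignite node is stopped.
    """
    if not persistence_dir.is_dir():
        raise NotADirectoryError(f"not a directory: {persistence_dir}")
    if backup_dir.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"backup target already exists: {backup_dir}")

    # Step 1: full copy before anything is removed.
    shutil.copytree(persistence_dir, backup_dir)

    # Step 2: remove only the metastorage folder of each node folder.
    removed = []
    for node_dir in sorted(p for p in persistence_dir.iterdir() if p.is_dir()):
        metastorage = node_dir / "metastorage"
        if metastorage.is_dir():
            shutil.rmtree(metastorage)
            removed.append(metastorage)
    return removed
```

It would be called once per node, for example backup_and_remove_metastorage(Path("/path/to/persistence"), Path("/path/to/backup")), where both paths are placeholders. The copy happens before any deletion so that the original state can be restored if the baseline topology after the first activation does not match the previous one.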
