Re: removing ControlCenterAgent

Mekhanikov Denis Thu, 29 Oct 2020 05:09:03 -0700

Hi!

The issue is that Control Center Agent stores its configuration in the 
metastorage. 
Ignite has a bug in processing metastorage data whose class is not present 
on all nodes: https://issues.apache.org/jira/browse/IGNITE-13642
Effectively, this means that you can't remove control-center-agent from a 
cluster that previously ran with it.


You have two options for solving it:
- Add control-center-agent to the class path of all nodes and disable it using 
management.sh --off. The classes and configuration will still be there, but the 
agent won't do anything. You'll be able to remove the library after upgrading 
to a version that doesn't have this bug. Hopefully, it will be fixed in 
Ignite 2.9.1.

- Remove the metastorage directory from the persistence directory on all nodes. 
This removes the Control Center Agent configuration along with the 
Baseline Topology (BLT) history.
You need to do this at the same time as you remove the control-center-agent 
library.
NOTE that removing the metastorage is a dangerous operation and can lead to 
data loss. I recommend using the first option if it works for you.
Make a copy of the persistence directories before removing anything. After the 
removal and a restart, the baseline topology will be reset. To avoid data loss, 
make sure that the first activation produces the same BLT as before the 
restart.
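
As a rough illustration of the second option, here is a minimal Python sketch 
of the "copy first, then remove" step for a single node. It is not an official 
tool. The function name and the example paths are hypothetical, and it assumes 
the directory to delete is literally named "metastorage" somewhere under the 
node's persistence directory, so check the layout on your own nodes first. Run 
it only while the node is stopped.

```python
import shutil
from pathlib import Path


def backup_and_remove_metastorage(persistence_dir, backup_dir, dir_name="metastorage"):
    """Copy persistence_dir to backup_dir, then delete directories named dir_name.

    Hypothetical helper: the "metastorage" name and the layout are assumptions.
    Returns the list of directories that were removed.
    """
    src = Path(persistence_dir).resolve()
    dst = Path(backup_dir).resolve()

    if not src.is_dir():
        raise FileNotFoundError(f"persistence directory not found: {src}")
    if dst.exists():
        raise FileExistsError(f"backup target already exists: {dst}")
    if dst == src or src in dst.parents:
        # Copying a tree into itself would recurse without end.
        raise ValueError("backup directory must be outside the persistence directory")

    # Take the full copy before touching anything.
    shutil.copytree(src, dst)

    # Collect the targets first so the tree is not modified while it is walked.
    targets = sorted(p for p in src.rglob(dir_name) if p.is_dir())
    removed = []
    for target in targets:
        if target.exists():  # a parent with the same name may already be gone
            shutil.rmtree(target)
            removed.append(target)
    return removed
```

A call could look like 
backup_and_remove_metastorage("/opt/ignite/work/db", "/backup/ignite-db"), 
where both paths are placeholders. The copy is taken before any deletion, so 
the original state can be restored if the first activation does not bring back 
the expected BLT.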

Also note that Control Center doesn't support Ignite 2.9 yet. The agent for it 
is on its way. Currently only Ignite 2.8 is supported.

Denis

On 28.10.2020, 19:58, "Bastien Durel" <[email protected]> wrote:

    Hello,

    I'm running a 2.9.0 cluster with 2 nodes. I tried to use GridGain's
    ControlCenterAgent to investigate a slowdown.

    When I removed the agent files from server (I don't like to have to put
    it in all clients), the second node cannot join the cluster when I
    start it.

    If I start node A, then node B, node B fails, but if I start node B,
    then node A, node A fails.

    If I put the agent files back, then all nodes can start, but clients
    fail because they don't have the agent classes themselves.

    When a node fails to start, it prints this log :


    [17:52:45,265][INFO][tcp-disco-sock-reader-[2f3f6f3a 192.168.43.29:39675]-#6%ClusterWA%-#50%ClusterWA%][TcpDiscoverySpi] Initialized connection with remote server node [nodeId=2f3f6f3a-accb-4708-a5cc-26d324a07816, rmtAddr=/192.168.43.29:39675]
    [17:52:45,268][SEVERE][main][IgniteKernal%ClusterWA] Failed to start manager: GridManagerAdapter [enabled=true, name=o.a.i.i.managers.discovery.GridDiscoveryManager]
    class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@39a8e2fa], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:302)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:967)
        at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1935)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1298)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2046)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1698)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1114)
        at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1032)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:918)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:817)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:687)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:656)
        at org.apache.ignite.Ignition.start(Ignition.java:353)
        at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:300)
    Caused by: class org.apache.ignite.spi.IgniteSpiException: Unable to unmarshal key=metastorage.cluster.id.tag
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:2018)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1189)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:462)
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2120)
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:299)
        ... 13 more
    [17:52:45,271][SEVERE][main][IgniteKernal%ClusterWA] Got exception while starting (will rollback startup routine).
    class org.apache.ignite.IgniteCheckedException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
        at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1940)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1298)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2046)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1698)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1114)
        at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1032)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:918)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:817)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:687)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:656)
        at org.apache.ignite.Ignition.start(Ignition.java:353)
        at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:300)
    Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@39a8e2fa], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:302)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:967)
        at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1935)
        ... 11 more
    Caused by: class org.apache.ignite.spi.IgniteSpiException: Unable to unmarshal key=metastorage.cluster.id.tag
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:2018)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1189)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:462)
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2120)
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:299)
        ... 13 more
    [17:52:45,271][INFO][tcp-disco-sock-reader-[2f3f6f3a 192.168.43.29:39675]-#6%ClusterWA%-#50%ClusterWA%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.43.29:39675, rmtPort=39675

    And the running node has this :

    [17:52:45,223][INFO][tcp-disco-sock-reader-[9a3233c6 192.168.43.30:54951]-#4%ClusterWA%-#55%ClusterWA%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.43.30:54951, rmtPort=54951
    [17:52:45,246][INFO][tcp-disco-msg-worker-[crd]-#2%ClusterWA%-#46%ClusterWA%][GridEncryptionManager] Joining node doesn't have stored group keys [node=9a3233c6-3a6c-4be0-b5e7-19cdff30f69e]
    [17:52:45,266][WARNING][disco-pool-#56%ClusterWA%][TcpDiscoverySpi] Unable to unmarshal key=metastorage.cluster.id.tag

    If I start the nodes in the reverse order, it has this :

    [17:56:52,426][INFO][tcp-disco-sock-reader-[4b8b92f5 192.168.43.29:42557]-#4%ClusterWA%-#53%ClusterWA%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.43.29:42557, rmtPort=42557
    [17:56:52,446][INFO][tcp-disco-msg-worker-[crd]-#2%ClusterWA%-#46%ClusterWA%][GridEncryptionManager] Joining node doesn't have stored group keys [node=4b8b92f5-1753-4b1b-9902-476c925fa49d]
    [17:56:52,466][WARNING][disco-pool-#54%ClusterWA%][TcpDiscoverySpi] Unable to unmarshal key=metastorage.cluster.id.tag

    Is there a way to recover ?

    Thanks,

    -- 
    Bastien Durel
    DATA
    Intégration des données de l'entreprise,
    Systèmes d'information décisionnels.

    [email protected]
    tel : +33 (0) 1 57 19 59 28
    fax : +33 (0) 1 57 19 59 73
    45 avenue Carnot, 94230 CACHAN France
    www.data.fr
