Hi all,

I set up a cluster of 6 server nodes on Ignite 2.7.5. Cache A is created in partitioned mode with backups = 1, and it uses a durable (persistence-enabled) data region. All nodes are started, and the baseline topology contains 6 nodes.

An Ignite client is started with the baseline-monitoring code copied from https://apacheignite.readme.io/v2.7.5/docs/baseline-topology#triggering-rebalancing-programmatically. The client runs an infinite loop that simply does a single put into cache A every second.

I then manually stop the nodes one by one, waiting at least a few seconds between stops. All cache puts succeed, since the cluster rebalances each time a node leaves.

I then gradually bring the nodes back. Some of them rejoin the cluster without error, but at least one node always fails to join, with the following exceptions:

[15:13:08] Security status [authentication=off, tls/ssl=off]
[15:13:09] Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start.
[15:13:09,487][SEVERE][main][IgniteKernal] Exception during start processors, node will be stopped and close connections
class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=0
        at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
        at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:223)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:397)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:663)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:4611)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1048)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2038)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1730)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1158)
        at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1076)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:962)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:861)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:731)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:700)
        at org.apache.ignite.Ignition.start(Ignition.java:348)
        at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
[15:13:09,489][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).
class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=0
        at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
        at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:223)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:397)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:663)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:4611)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1048)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2038)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1730)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1158)
        at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1076)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:962)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:861)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:731)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:700)
        at org.apache.ignite.Ignition.start(Ignition.java:348)
        at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
[15:13:09] Ignite node stopped OK [uptime=00:00:01.800]
class org.apache.ignite.IgniteException: Restoring of BaselineTopology history has failed, expected history item not found for id=0
        at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:1026)
        at org.apache.ignite.Ignition.start(Ignition.java:351)
        at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=0
        at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
        at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:223)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:397)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:663)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:4611)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1048)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2038)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1730)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1158)
        at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1076)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:962)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:861)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:731)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:700)
        at org.apache.ignite.Ignition.start(Ignition.java:348)
        ... 1 more
Failed to start grid: Restoring of BaselineTopology history has failed, expected history item not found for id=0

================

The workaround is to wipe the Ignite data directory on the failed node; it can then rejoin without issue.

This is quite reproducible and looks like an Ignite bug. A rejoining node, even if it holds outdated data, is not supposed to cause an exception: the outdated data can be safely ignored, letting the node rejoin the cluster with a clean slate. Because of this issue, our production deployment cannot recover from sporadic node leave/rejoin events.

Is this the same as the unresolved issue https://issues.apache.org/jira/browse/IGNITE-12850? I don't know what "metastorage" means in that ticket. Any suggestions?

Thanks & Regards,
Ping
