Hi Ping! In case the question is still relevant, you can join tomorrow's Q&A session [1] and put it to the Ignite developers directly.
Cheers,
Kseniya

[1] https://www.meetup.com/Apache-Ignite-Virtual-Meetup/events/273921637/

On Tue, Aug 4, 2020 at 01:26, pinghao99 <[email protected]> wrote:
> Hi All,
>
> I set up a 6-server-node cluster on Ignite 2.7.5. Cache A is created in
> PARTITIONED mode with backups = 1, and the cache uses a durable region.
> All nodes started, and the baseline size is 6.
>
> An Ignite client was started with baseline-monitoring code copied from
> https://apacheignite.readme.io/v2.7.5/docs/baseline-topology#triggering-rebalancing-programmatically
>
> The client runs an infinite loop that simply does a single put into cache A
> every second.
>
> Then I manually stopped the nodes one by one, waiting at least a few seconds
> between stops. All cache puts were fine, since the cluster went through
> rebalancing whenever a node left.
>
> Then I gradually brought the Ignite nodes back. Some nodes rejoined the
> cluster without error; however, there is always a node that fails to join,
> with these exceptions:
>
> [15:13:08] Security status [authentication=off, tls/ssl=off]
> [15:13:09] Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start.
> [15:13:09,487][SEVERE][main][IgniteKernal] Exception during start processors, node will be stopped and close connections
> class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=0
>     at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
>     at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:223)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:397)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:663)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:4611)
>     at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1048)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2038)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1730)
>     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1158)
>     at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1076)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:962)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:861)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:731)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:700)
>     at org.apache.ignite.Ignition.start(Ignition.java:348)
>     at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
> [15:13:09,489][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).
> class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=0
>     at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
>     at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:223)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:397)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:663)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:4611)
>     at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1048)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2038)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1730)
>     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1158)
>     at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1076)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:962)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:861)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:731)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:700)
>     at org.apache.ignite.Ignition.start(Ignition.java:348)
>     at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
> [15:13:09] Ignite node stopped OK [uptime=00:00:01.800]
> class org.apache.ignite.IgniteException: Restoring of BaselineTopology history has failed, expected history item not found for id=0
>     at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:1026)
>     at org.apache.ignite.Ignition.start(Ignition.java:351)
>     at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
> Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=0
>     at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
>     at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:223)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:397)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:663)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:4611)
>     at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1048)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2038)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1730)
>     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1158)
>     at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1076)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:962)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:861)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:731)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:700)
>     at org.apache.ignite.Ignition.start(Ignition.java:348)
>     ... 1 more
> Failed to start grid: Restoring of BaselineTopology history has failed, expected history item not found for id=0
>
> ================
> Workaround: wipe out the Ignite data directory on the failed node; it can
> then rejoin without issue.
>
> This is pretty reproducible and looks like an Ignite bug. A rejoining Ignite
> node, even if it holds outdated data, is not supposed to cause an exception;
> the outdated data can be safely ignored, and the node should be allowed to
> rejoin the cluster with a clean slate.
>
> Because of this issue, our production deployment cannot recover from
> sporadic node left/rejoin cases.
>
> Is this the same as the unresolved issue
> https://issues.apache.org/jira/browse/IGNITE-12850? I don't know what
> "metastorage" means in that ticket.
>
> Any suggestions?
>
> Thanks & Regards
> Ping
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
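For anyone trying to reproduce this: as far as I understand the linked 2.7.5 page, the baseline-monitoring snippet reacts to node-left/node-failed events by waiting a fixed delay and then resetting the baseline to that event's topology version, but only if the topology has not changed in the meantime. Under that assumption, each stopped node in the scenario above ends up causing a baseline change. Below is a minimal, stdlib-only Java sketch of that delayed ("debounced") reset logic, not the documented snippet itself. It does not touch Ignite: the hypothetical `BaselineDebouncer` class takes a `LongSupplier` and a `LongConsumer` that stand in for `ignite.cluster().topologyVersion()` and `ignite.cluster().setBaselineTopology(long)`, and the 100 ms delay and starting topology version are illustrative values.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/**
 * Hypothetical, Ignite-free model of a delayed baseline reset.
 * currentTopVer stands in for ignite.cluster().topologyVersion(),
 * setBaseline stands in for ignite.cluster().setBaselineTopology(long).
 */
public class BaselineDebouncer {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final LongSupplier currentTopVer;
    private final LongConsumer setBaseline;
    private final long delayMs;

    public BaselineDebouncer(LongSupplier currentTopVer, LongConsumer setBaseline, long delayMs) {
        this.currentTopVer = currentTopVer;
        this.setBaseline = setBaseline;
        this.delayMs = delayMs;
    }

    /** Call on every node-left/node-failed event with that event's topology version. */
    public void onNodeLeft(long topVer) {
        scheduler.schedule(() -> {
            // Reset the baseline only if no further topology change happened during the delay.
            if (topVer == currentTopVer.getAsLong())
                setBaseline.accept(topVer);
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    /** Lets already-scheduled checks run (the default policy), then stops the thread. */
    public void shutdown() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AtomicLong topVer = new AtomicLong(6);            // illustrative starting topology version
        List<Long> resets = new CopyOnWriteArrayList<>(); // records every applied baseline reset
        BaselineDebouncer monitor = new BaselineDebouncer(topVer::get, resets::add, 100);

        topVer.set(7); monitor.onNodeLeft(7); // first node stops, topology version bumps
        topVer.set(8); monitor.onNodeLeft(8); // second node stops within the delay window
        monitor.shutdown();

        System.out.println(resets); // prints [8]
    }
}
```

In this simulation two nodes leave within one delay window, so the first scheduled check sees a newer topology version and skips its reset; only the second reset is applied. Checking the version at fire time, rather than cancelling earlier tasks, keeps the sketch stateless apart from the scheduler.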
