Hi,

Please find attached the log from a complete but failed reboot. You can see the exceptions in it.

On Mon, Nov 16, 2020 at 4:00 AM Ivan Bessonov <[email protected]> wrote:

> Hello,
>
> there must be a bug somewhere during node start, it updates its
> distributed metastorage content and tries to join an already activated
> cluster, thus creating a conflict. It's hard to tell the exact data that
> caused conflict, especially without any logs.
>
> Topic that you mentioned (
> http://apache-ignite-users.70518.x6.nabble.com/Question-about-baseline-topology-and-cluster-activation-td34336.html)
> seems to be about the same problem, but the issue
> https://issues.apache.org/jira/browse/IGNITE-12850 is not related to it.
>
> If you have logs from those unsuccessful restart attempts, it would be
> very helpful.
>
> Sadly, distributed metastorage is an internal component to store settings
> and has no public documentation. Developers documentation is probably
> outdated and incomplete. But just in case, "version id" that message is
> referring to is located in field
> "org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl#ver",
> it's incremented on every distributed metastorage setting update. You can
> find your error message in the same class.
>
> Please follow up with more questions and logs if possible, I hope we'll
> figure it out.
>
> Thank you!
>
> Fri, Nov 13, 2020 at 02:23, Cong Guo <[email protected]>:
>
>> Hi,
>>
>> I have a 3-node cluster with persistence enabled. All the three nodes are
>> in the baseline topology. The ignite version is 2.8.1.
>>
>> When I restart the first node, it encounters an error and fails to join
>> the cluster. The error message is "Caused by:
>> org.apache.ignite.spi.IgniteSpiException: Attempting to join node with
>> larger distributed metastorage version id. The node is most likely in
>> invalid state and can't be joined." I try several times but get the same
>> error.
>>
>> Then I restart the second node, it encounters the same error. After I
>> restart the third node, the other two nodes can start successfully and join
>> the cluster. When I restart the nodes, I do not change the baseline
>> topology. I cannot reproduce this error now.
>>
>> I find someone else has the same problem.
>> http://apache-ignite-users.70518.x6.nabble.com/Question-about-baseline-topology-and-cluster-activation-td34336.html
>>
>> The answer is corruption in the metastorage. I do not see any issue of
>> the metastorage files. However, it is a small probability event to have
>> files on two different machines corrupted at the same time. Is it possible
>> that this is another bug like
>> https://issues.apache.org/jira/browse/IGNITE-12850?
>>
>> Do you have any document about how the version id is updated and read?
>> Could you please show me in the source code where the version id is read
>> when a node starts and where the version id is updated when a node stops?
>> Thank you!
>>
>
> --
> Sincerely yours,
> Ivan Bessonov
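
To make the quoted explanation concrete, here is a minimal toy sketch of the behaviour described above: a version id that is incremented on every setting update, and a join that is rejected when the joining node's version id is larger than the cluster's. This is not Ignite's code. The class `MetastorageVersionSketch`, the nested `ToyMetastorage`, and the method `validateJoin` are hypothetical names of mine, and I am assuming the check is a plain comparison of two counters; the real logic lives in `DistributedMetaStorageImpl` and may consider more than that.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only; NOT Ignite's DistributedMetaStorageImpl. All names are hypothetical. */
public class MetastorageVersionSketch {
    /** Hypothetical stand-in for one node's distributed metastorage state. */
    public static final class ToyMetastorage {
        private final Map<String, String> settings = new HashMap<>();

        /** Models the "version id"; starts at 0. */
        private long ver;

        /** Every setting update bumps the version id by one. */
        public void write(String key, String value) {
            settings.put(key, value);
            ver++;
        }

        public long version() {
            return ver;
        }
    }

    /**
     * Assumed join check: a joining node must not be ahead of the cluster.
     * The message text is taken from the error quoted in this thread.
     */
    public static void validateJoin(ToyMetastorage cluster, ToyMetastorage joining) {
        if (joining.version() > cluster.version()) {
            throw new IllegalStateException(
                "Attempting to join node with larger distributed metastorage version id.");
        }
    }

    public static void main(String[] args) {
        ToyMetastorage cluster = new ToyMetastorage();
        ToyMetastorage restarted = new ToyMetastorage();

        // Both sides have seen the same two updates.
        cluster.write("setting.a", "1");
        cluster.write("setting.b", "2");
        restarted.write("setting.a", "1");
        restarted.write("setting.b", "2");

        // Suspected scenario: the restarting node updates its own content
        // during start, before joining, so its version id runs ahead.
        restarted.write("setting.c", "3");

        try {
            validateJoin(cluster, restarted);
        } catch (IllegalStateException e) {
            System.out.println("Join rejected: " + e.getMessage());
        }
    }
}
```

In this toy model a node whose version id is equal to or smaller than the cluster's passes the check; only a node that is ahead is refused, which matches the wording of the error message.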

Attachment: errorlog (binary data)
