05.02.2020 20:55, Eric Robinson wrote:
> The two servers 001db01a and 001db01b were up and responsive. Neither had
> been rebooted and neither was under heavy load. There's no indication in the
> logs of loss of network connectivity. Any ideas on why both nodes seem to
> think the other one is at fault?
The very fact that the nodes lost connection to each other *is* an indication
of network problems. Your logs start too late, after the problem had already
happened.

> (Yes, it's a 2-node cluster without quorum. A 3-node cluster is not an option
> at this time.)
>
> Log from 001db01a:
>
> Feb 5 08:01:02 001db01a corosync[1306]: [TOTEM ] A processor failed, forming new configuration.
> Feb 5 08:01:03 001db01a corosync[1306]: [TOTEM ] A new membership (10.51.14.33:960) was formed. Members left: 2
> Feb 5 08:01:03 001db01a corosync[1306]: [TOTEM ] Failed to receive the leave message. failed: 2
> Feb 5 08:01:03 001db01a attrd[1525]: notice: Node 001db01b state is now lost
> Feb 5 08:01:03 001db01a attrd[1525]: notice: Removing all 001db01b attributes for peer loss
> Feb 5 08:01:03 001db01a cib[1522]: notice: Node 001db01b state is now lost
> Feb 5 08:01:03 001db01a cib[1522]: notice: Purged 1 peer with id=2 and/or uname=001db01b from the membership cache
> Feb 5 08:01:03 001db01a attrd[1525]: notice: Purged 1 peer with id=2 and/or uname=001db01b from the membership cache
> Feb 5 08:01:03 001db01a crmd[1527]: warning: No reason to expect node 2 to be down
> Feb 5 08:01:03 001db01a stonith-ng[1523]: notice: Node 001db01b state is now lost
> Feb 5 08:01:03 001db01a crmd[1527]: notice: Stonith/shutdown of 001db01b not matched
> Feb 5 08:01:03 001db01a corosync[1306]: [QUORUM] Members[1]: 1
> Feb 5 08:01:03 001db01a corosync[1306]: [MAIN ] Completed service synchronization, ready to provide service.
> Feb 5 08:01:03 001db01a stonith-ng[1523]: notice: Purged 1 peer with id=2 and/or uname=001db01b from the membership cache
> Feb 5 08:01:03 001db01a pacemakerd[1491]: notice: Node 001db01b state is now lost
> Feb 5 08:01:03 001db01a crmd[1527]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> Feb 5 08:01:03 001db01a crmd[1527]: notice: Node 001db01b state is now lost
> Feb 5 08:01:03 001db01a crmd[1527]: warning: No reason to expect node 2 to be down
> Feb 5 08:01:03 001db01a crmd[1527]: notice: Stonith/shutdown of 001db01b not matched
> Feb 5 08:01:03 001db01a pengine[1526]: notice: On loss of CCM Quorum: Ignore
>
> From 001db01b:
>
> Feb 5 08:01:03 001db01b corosync[1455]: [TOTEM ] A new membership (10.51.14.34:960) was formed. Members left: 1
> Feb 5 08:01:03 001db01b crmd[1693]: notice: Our peer on the DC (001db01a) is dead
> Feb 5 08:01:03 001db01b stonith-ng[1689]: notice: Node 001db01a state is now lost
> Feb 5 08:01:03 001db01b corosync[1455]: [TOTEM ] Failed to receive the leave message. failed: 1
> Feb 5 08:01:03 001db01b corosync[1455]: [QUORUM] Members[1]: 2
> Feb 5 08:01:03 001db01b corosync[1455]: [MAIN ] Completed service synchronization, ready to provide service.
> Feb 5 08:01:03 001db01b stonith-ng[1689]: notice: Purged 1 peer with id=1 and/or uname=001db01a from the membership cache
> Feb 5 08:01:03 001db01b pacemakerd[1678]: notice: Node 001db01a state is now lost
> Feb 5 08:01:03 001db01b crmd[1693]: notice: State transition S_NOT_DC -> S_ELECTION
> Feb 5 08:01:03 001db01b crmd[1693]: notice: Node 001db01a state is now lost
> Feb 5 08:01:03 001db01b attrd[1691]: notice: Node 001db01a state is now lost
> Feb 5 08:01:03 001db01b attrd[1691]: notice: Removing all 001db01a attributes for peer loss
> Feb 5 08:01:03 001db01b attrd[1691]: notice: Lost attribute writer 001db01a
> Feb 5 08:01:03 001db01b attrd[1691]: notice: Purged 1 peer with id=1 and/or uname=001db01a from the membership cache
> Feb 5 08:01:03 001db01b crmd[1693]: notice: State transition S_ELECTION -> S_INTEGRATION
> Feb 5 08:01:03 001db01b cib[1688]: notice: Node 001db01a state is now lost
> Feb 5 08:01:03 001db01b cib[1688]: notice: Purged 1 peer with id=1 and/or uname=001db01a from the membership cache
> Feb 5 08:01:03 001db01b stonith-ng[1689]: notice: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
> Feb 5 08:01:03 001db01b crmd[1693]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
> Feb 5 08:01:03 001db01b pengine[1692]: notice: On loss of CCM Quorum: Ignore
>
> -Eric
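When this happens again, it helps to line up the membership events from both
nodes' logs side by side and compare the timestamps. As a rough sketch (the
helper name and the event patterns are mine, matched against the lines quoted
above, not anything a tool provides), a few lines of Python can pull the
relevant entries out of /var/log/messages on each node:

```python
import re

# Syslog-style corosync/pacemaker lines, e.g.:
#   Feb  5 08:01:02 001db01a corosync[1306]: [TOTEM ] A processor failed, ...
# Capture timestamp, host, daemon, and the membership-related message.
EVENT_RE = re.compile(
    r'^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) '
    r'(?P<host>\S+) '
    r'(?P<daemon>[\w-]+)\[\d+\]: '
    r'.*?(?P<msg>A processor failed.*'
    r'|A new membership.*'
    r'|Node \S+ state is now lost.*)$'
)

def membership_events(lines):
    """Return (timestamp, host, daemon, message) for each membership event."""
    events = []
    for line in lines:
        m = EVENT_RE.match(line)
        if m:
            events.append((m.group('ts'), m.group('host'),
                           m.group('daemon'), m.group('msg')))
    return events
```

Running that over both nodes' logs (merged and sorted by timestamp) makes it
easy to see which side stopped hearing the other first.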
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
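P.S. Since a third node is off the table: corosync's votequorum does have a
supported two-node mode, though it only behaves sanely when fencing (STONITH)
is configured, because after a split each surviving side will try to fence the
other. A minimal sketch of the relevant corosync.conf fragment (corosync 2.x
syntax; check the votequorum(5) man page for your version):

```
quorum {
    provider: corosync_votequorum
    two_node: 1        # each node keeps quorum after a split;
                       # only safe with working fencing
    wait_for_all: 1    # on startup, wait until both nodes have been
                       # seen before granting quorum (enabled by
                       # default when two_node is set)
}
```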
