05.02.2020 20:55, Eric Robinson wrote:
> The two servers 001db01a and 001db01b were up and responsive. Neither had
> been rebooted and neither was under heavy load. There's no indication in the
> logs of loss of network connectivity. Any ideas on why both nodes seem to
> think the other one is at fault?
The very fact that the nodes lost connection to each other *is* an indication
of network problems. Your logs start too late, after the problem had already
happened.

> (Yes, it's a 2-node cluster without quorum. A 3-node cluster is not an option
> at this time.)
>
> Log from 001db01a:
>
> Feb 5 08:01:02 001db01a corosync[1306]: [TOTEM ] A processor failed, forming new configuration.
> Feb 5 08:01:03 001db01a corosync[1306]: [TOTEM ] A new membership (10.51.14.33:960) was formed. Members left: 2
> Feb 5 08:01:03 001db01a corosync[1306]: [TOTEM ] Failed to receive the leave message. failed: 2
> Feb 5 08:01:03 001db01a attrd[1525]: notice: Node 001db01b state is now lost
> Feb 5 08:01:03 001db01a attrd[1525]: notice: Removing all 001db01b attributes for peer loss
> Feb 5 08:01:03 001db01a cib[1522]: notice: Node 001db01b state is now lost
> Feb 5 08:01:03 001db01a cib[1522]: notice: Purged 1 peer with id=2 and/or uname=001db01b from the membership cache
> Feb 5 08:01:03 001db01a attrd[1525]: notice: Purged 1 peer with id=2 and/or uname=001db01b from the membership cache
> Feb 5 08:01:03 001db01a crmd[1527]: warning: No reason to expect node 2 to be down
> Feb 5 08:01:03 001db01a stonith-ng[1523]: notice: Node 001db01b state is now lost
> Feb 5 08:01:03 001db01a crmd[1527]: notice: Stonith/shutdown of 001db01b not matched
> Feb 5 08:01:03 001db01a corosync[1306]: [QUORUM] Members[1]: 1
> Feb 5 08:01:03 001db01a corosync[1306]: [MAIN ] Completed service synchronization, ready to provide service.
> Feb 5 08:01:03 001db01a stonith-ng[1523]: notice: Purged 1 peer with id=2 and/or uname=001db01b from the membership cache
> Feb 5 08:01:03 001db01a pacemakerd[1491]: notice: Node 001db01b state is now lost
> Feb 5 08:01:03 001db01a crmd[1527]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> Feb 5 08:01:03 001db01a crmd[1527]: notice: Node 001db01b state is now lost
> Feb 5 08:01:03 001db01a crmd[1527]: warning: No reason to expect node 2 to be down
> Feb 5 08:01:03 001db01a crmd[1527]: notice: Stonith/shutdown of 001db01b not matched
> Feb 5 08:01:03 001db01a pengine[1526]: notice: On loss of CCM Quorum: Ignore
>
> From 001db01b:
>
> Feb 5 08:01:03 001db01b corosync[1455]: [TOTEM ] A new membership (10.51.14.34:960) was formed. Members left: 1
> Feb 5 08:01:03 001db01b crmd[1693]: notice: Our peer on the DC (001db01a) is dead
> Feb 5 08:01:03 001db01b stonith-ng[1689]: notice: Node 001db01a state is now lost
> Feb 5 08:01:03 001db01b corosync[1455]: [TOTEM ] Failed to receive the leave message. failed: 1
> Feb 5 08:01:03 001db01b corosync[1455]: [QUORUM] Members[1]: 2
> Feb 5 08:01:03 001db01b corosync[1455]: [MAIN ] Completed service synchronization, ready to provide service.
> Feb 5 08:01:03 001db01b stonith-ng[1689]: notice: Purged 1 peer with id=1 and/or uname=001db01a from the membership cache
> Feb 5 08:01:03 001db01b pacemakerd[1678]: notice: Node 001db01a state is now lost
> Feb 5 08:01:03 001db01b crmd[1693]: notice: State transition S_NOT_DC -> S_ELECTION
> Feb 5 08:01:03 001db01b crmd[1693]: notice: Node 001db01a state is now lost
> Feb 5 08:01:03 001db01b attrd[1691]: notice: Node 001db01a state is now lost
> Feb 5 08:01:03 001db01b attrd[1691]: notice: Removing all 001db01a attributes for peer loss
> Feb 5 08:01:03 001db01b attrd[1691]: notice: Lost attribute writer 001db01a
> Feb 5 08:01:03 001db01b attrd[1691]: notice: Purged 1 peer with id=1 and/or uname=001db01a from the membership cache
> Feb 5 08:01:03 001db01b crmd[1693]: notice: State transition S_ELECTION -> S_INTEGRATION
> Feb 5 08:01:03 001db01b cib[1688]: notice: Node 001db01a state is now lost
> Feb 5 08:01:03 001db01b cib[1688]: notice: Purged 1 peer with id=1 and/or uname=001db01a from the membership cache
> Feb 5 08:01:03 001db01b stonith-ng[1689]: notice: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
> Feb 5 08:01:03 001db01b crmd[1693]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
> Feb 5 08:01:03 001db01b pengine[1692]: notice: On loss of CCM Quorum: Ignore
>
> -Eric
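When this happens again, it helps to line up the membership events from both
nodes' logs side by side and compare the timestamps. As a rough sketch (the
helper name and the event patterns are mine, matched against the lines quoted
above, not anything a tool provides), a few lines of Python can pull the
relevant entries out of /var/log/messages on each node:

```python
import re

# Syslog-style corosync/pacemaker lines, e.g.:
#   Feb  5 08:01:02 001db01a corosync[1306]: [TOTEM ] A processor failed, ...
# Capture timestamp, host, daemon, and the membership-related message.
EVENT_RE = re.compile(
    r'^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) '
    r'(?P<host>\S+) '
    r'(?P<daemon>[\w-]+)\[\d+\]: '
    r'.*?(?P<msg>A processor failed.*'
    r'|A new membership.*'
    r'|Node \S+ state is now lost.*)$'
)

def membership_events(lines):
    """Return (timestamp, host, daemon, message) for each membership event."""
    events = []
    for line in lines:
        m = EVENT_RE.match(line)
        if m:
            events.append((m.group('ts'), m.group('host'),
                           m.group('daemon'), m.group('msg')))
    return events
```

Running that over both nodes' logs (merged and sorted by timestamp) makes it
easy to see which side stopped hearing the other first.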
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
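P.S. Since a third node is off the table: corosync's votequorum does have a
supported two-node mode, though it only behaves sanely when fencing (STONITH)
is configured, because after a split each surviving side will try to fence the
other. A minimal sketch of the relevant corosync.conf fragment (corosync 2.x
syntax; check the votequorum(5) man page for your version):

```
quorum {
    provider: corosync_votequorum
    two_node: 1        # each node keeps quorum after a split;
                       # only safe with working fencing
    wait_for_all: 1    # on startup, wait until both nodes have been
                       # seen before granting quorum (enabled by
                       # default when two_node is set)
}
```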
