...or at least that's what I think is happening :-)

Two-node cluster, plus a quorum-only node. I'm testing the behavior when the active node is gracefully rebooted. All seems well initially: resources are migrated, come up, and function as expected.

But when the rebooted node starts to come back up, the other node seems to lose quorum temporarily, even though it can still communicate with the quorum node. This causes the resources to stop until quorum is re-established.

Summary:
active node: xen-nfs01 192.168.250.50
standby node: xen-nfs02 192.168.250.51
quorum node: xen-quorum 192.168.250.52

Issue a reboot on xen-nfs01
xen-nfs02 becomes the active node

xen-nfs01 starts to come back online
xen-nfs02 detects a loss of quorum and stops its resources
xen-nfs01 finishes booting
Quorum is re-established


Instead of inundating you with all of the debugging output from corosync, pacemaker and corosync-qnetd on all nodes, I'll start with the basics and provide whatever else is needed on request.

TIA


From the node that was not rebooted (xen-nfs02):
Apr  5 23:10:15 xen-nfs02 corosync[19099]: [KNET ] udp: Received ICMP error from 192.168.250.51: No route to host
Apr  5 23:10:15 xen-nfs02 corosync[19099]: [KNET ] udp: Received ICMP error from 192.168.250.51: No route to host
Apr  5 23:10:16 xen-nfs02 corosync[19099]: [KNET ] udp: Received ICMP error from 192.168.250.50: Connection refused
Apr  5 23:10:16 xen-nfs02 corosync[19099]: [KNET ] udp: Received ICMP error from 192.168.250.50: Connection refused
Apr  5 23:10:16 xen-nfs02 corosync[19099]: [KNET ] rx: host: 1 link: 0 received pong: 1
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]: Received vote info
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]:   seq = 6
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]:   vote = NACK
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]:   ring id = (2.814)
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]: Algorithm result vote is NACK
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]: Cast vote timer remains scheduled every 500ms voting NACK.
Apr 05 23:10:17 [19099] xen-nfs02 corosync debug   [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: Yes QdeviceAlive: Yes QdeviceCastVote: No QdeviceMasterWins: No
Apr 05 23:10:17 [19099] xen-nfs02 corosync debug   [VOTEQ ] got nodeinfo message from cluster node 2
Apr 05 23:10:17 [19099] xen-nfs02 corosync debug   [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 3 flags: 49
Apr 05 23:10:17 [19099] xen-nfs02 corosync debug   [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: Yes QdeviceAlive: Yes QdeviceCastVote: No QdeviceMasterWins: No
Apr 05 23:10:17 [19099] xen-nfs02 corosync debug   [VOTEQ ] total_votes=2, expected_votes=3
Apr 05 23:10:17 [19099] xen-nfs02 corosync debug   [VOTEQ ] node 1 state=2, votes=1, expected=3
Apr 05 23:10:17 [19099] xen-nfs02 corosync debug   [VOTEQ ] node 2 state=1, votes=1, expected=3
Apr 05 23:10:17 [19099] xen-nfs02 corosync debug   [VOTEQ ] quorum lost, blocking activity
Apr 05 23:10:17 [19099] xen-nfs02 corosync notice  [QUORUM] This node is within the non-primary component and will NOT provide any services.
Apr 05 23:10:17 [19099] xen-nfs02 corosync notice  [QUORUM] Members[1]: 2
Apr 05 23:10:17 [19099] xen-nfs02 corosync debug   [QUORUM] sending quorum notification to (nil), length = 52
Apr 05 23:10:17 [19099] xen-nfs02 corosync debug   [VOTEQ ] Sending quorum callback, quorate = 0
...
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]: Votequorum quorum notify callback:
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]:   Quorate = 0
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]:   Node list (size = 3):
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]:     0 nodeid = 1, state = 2
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]:     1 nodeid = 2, state = 1
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]:     2 nodeid = 0, state = 0
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]: Algorithm decided to send list and result vote is No change
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]: Sending quorum node list seq = 13, quorate = 0
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]:   Node list:
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]:     0 node_id = 1, data_center_id = 0, node_state = dead
Apr  5 23:10:17 xen-nfs02 corosync-qdevice[19108]:     1 node_id = 2, data_center_id = 0, node_state = member
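For reference, the threshold check behind the "quorum lost" line above can be sketched roughly as follows. This is a simplified model, assuming votequorum's default rule that a partition needs a strict majority of expected_votes (expected_votes / 2 + 1) and ignoring options such as two_node, last_man_standing and auto_tie_breaker. The function names are hypothetical, not part of corosync, and the sketch models only the threshold, not how votequorum arrives at the total_votes figure it prints.

```python
# Simplified model of the corosync votequorum threshold check.
# Assumption: no two_node, last_man_standing or auto_tie_breaker adjustments.
# Function names are hypothetical and not part of any corosync API.


def quorum_threshold(expected_votes):
    """Votes needed to be quorate: a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(node_votes, members, qdevice_votes, qdevice_acks, expected_votes):
    """Sum the votes of member nodes; add the qdevice votes only on an ACK."""
    total = sum(votes for node, votes in node_votes.items() if node in members)
    if qdevice_acks:
        total += qdevice_votes
    return total >= quorum_threshold(expected_votes)


# Node 1 (xen-nfs01) and node 2 (xen-nfs02) hold one vote each; the qdevice
# adds one more, giving expected_votes = 3 and a threshold of 2.
node_votes = {1: 1, 2: 1}

# Node 2 alone with the qdevice voting ACK: 1 + 1 = 2 >= 2.
print(is_quorate(node_votes, {2}, 1, True, 3))   # True
# Node 2 alone with the qdevice voting NACK (as at 23:10:17): 1 < 2.
print(is_quorate(node_votes, {2}, 1, False, 3))  # False
```

Under this model, staying connected to qnetd is not enough on its own: while the qdevice votes NACK, xen-nfs02 counts only its own single vote, which is below the threshold of 2.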



From the quorum node (xen-quorum):
Apr 05 23:10:17 debug   New client connected
Apr 05 23:10:17 debug     cluster name = xen-nfs01_xen-nfs02
Apr 05 23:10:17 debug     tls started = 1
Apr 05 23:10:17 debug     tls peer certificate verified = 1
Apr 05 23:10:17 debug     node_id = 1
Apr 05 23:10:17 debug     pointer = 0x55b37c2d74f0
Apr 05 23:10:17 debug     addr_str = ::ffff:192.168.250.50:54462
Apr 05 23:10:17 debug     ring id = (1.814)
Apr 05 23:10:17 debug     cluster dump:
Apr 05 23:10:17 debug       client = ::ffff:192.168.250.51:54876, node_id = 2
Apr 05 23:10:17 debug       client = ::ffff:192.168.250.50:54462, node_id = 1
Apr 05 23:10:17 debug   Client ::ffff:192.168.250.50:54462 (cluster xen-nfs01_xen-nfs02, node_id 1) sent initial node list.
Apr 05 23:10:17 debug     msg seq num = 4
Apr 05 23:10:17 debug     node list:
Apr 05 23:10:17 debug       node_id = 1, data_center_id = 0, node_state = not set
Apr 05 23:10:17 debug       node_id = 2, data_center_id = 0, node_state = not set
Apr 05 23:10:17 debug   Algorithm result vote is Ask later
Apr 05 23:10:17 debug   Client ::ffff:192.168.250.50:54462 (cluster xen-nfs01_xen-nfs02, node_id 1) sent membership node list.
Apr 05 23:10:17 debug     msg seq num = 5
Apr 05 23:10:17 debug     ring id = (1.814)
Apr 05 23:10:17 debug     heuristics = Undefined
Apr 05 23:10:17 debug     node list:
Apr 05 23:10:17 debug       node_id = 1, data_center_id = 0, node_state = not set
Apr 05 23:10:17 debug   ffsplit: Membership for cluster xen-nfs01_xen-nfs02 is now stable
Apr 05 23:10:17 debug   ffsplit: Quorate partition selected
Apr 05 23:10:17 debug     node list:
Apr 05 23:10:17 debug       node_id = 1, data_center_id = 0, node_state = not set
Apr 05 23:10:17 debug   Sending vote info to client ::ffff:192.168.250.51:54876 (cluster xen-nfs01_xen-nfs02, node_id 2)
Apr 05 23:10:17 debug     msg seq num = 6
Apr 05 23:10:17 debug     vote = NACK
Apr 05 23:10:17 debug   Algorithm result vote is No change
Apr 05 23:10:17 debug   Client ::ffff:192.168.250.50:54462 (cluster xen-nfs01_xen-nfs02, node_id 1) sent quorum node list.
Apr 05 23:10:17 debug     msg seq num = 6
Apr 05 23:10:17 debug     quorate = 0
Apr 05 23:10:17 debug     node list:
Apr 05 23:10:17 debug       node_id = 1, data_center_id = 0, node_state = member
Apr 05 23:10:17 debug   Algorithm result vote is No change
Apr 05 23:10:17 debug   Client ::ffff:192.168.250.51:54876 (cluster xen-nfs01_xen-nfs02, node_id 2) replied back to vote info message
Apr 05 23:10:17 debug     msg seq num = 6
Apr 05 23:10:17 debug   ffsplit: All NACK votes sent for cluster xen-nfs01_xen-nfs02
Apr 05 23:10:17 debug   Sending vote info to client ::ffff:192.168.250.50:54462 (cluster xen-nfs01_xen-nfs02, node_id 1)
Apr 05 23:10:17 debug     msg seq num = 1
Apr 05 23:10:17 debug     vote = ACK
Apr 05 23:10:17 debug   Client ::ffff:192.168.250.50:54462 (cluster xen-nfs01_xen-nfs02, node_id 1) replied back to vote info message
Apr 05 23:10:17 debug     msg seq num = 1
Apr 05 23:10:17 debug   ffsplit: All ACK votes sent for cluster xen-nfs01_xen-nfs02
Apr 05 23:10:17 debug   Client ::ffff:192.168.250.51:54876 (cluster xen-nfs01_xen-nfs02, node_id 2) sent quorum node list.
Apr 05 23:10:17 debug     msg seq num = 13
Apr 05 23:10:17 debug     quorate = 0
Apr 05 23:10:17 debug     node list:
Apr 05 23:10:17 debug       node_id = 1, data_center_id = 0, node_state = dead
Apr 05 23:10:17 debug       node_id = 2, data_center_id = 0, node_state = member
Apr 05 23:10:17 debug   Algorithm result vote is No change

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
