Re: [ClusterLabs] [EXT] Fast-failover on 2 nodes + qnetd: qdevice connenction disrupted.

Windl, Ulrich Mon, 06 May 2024 23:36:58 -0700

Hi!

Just some personal comment: If an application isn't cluster-aware (has no 
provisions to run in a HA environment), you may improve its uptime using a 
cluster, but you cannot really make it "HA". Just consider the app needs manual 
intervention after it crashed...


Kind regards,
Ulrich

From: Users <users-boun...@clusterlabs.org> On Behalf Of ale...@pavlyuts.ru
Sent: Thursday, May 2, 2024 1:56 AM
To: users@clusterlabs.org
Subject: [EXT] [ClusterLabs] Fast-failover on 2 nodes + qnetd: qdevice 
connenction disrupted.

Hi All,

I am trying to build application-specific 2-node failover cluster using ubuntu 
22, pacemaker 2.1.2 + corosync 3.1.6 and DRBD 9.2.9, knet transport.

For some reason I can't use 3-node then I have to use qnetd+qdevice 3.0.1.

The main goal Is to protect custom app which is not cluster-aware by itself. It 
is quite stateful, can't store the state outside memory and take some time to 
get converged with other parts of the system, then the best scenario is 
"failover is a restart with same config", but each unnecessary restart is 
painful. So, if failover done, app must retain on the backup node until it fail 
or admin push it back, this work well with stickiness param.

So, the goal is to detect serving node fail ASAP and restart it ASAP on other 
node, using DRBD-synced config/data. ASAP means within 5-7 sec, not 30 or more.

I was tried different combinations of timing, and finally got acceptable result 
within 5 sec for the best case. But! The case is very unstable.

My setup is a simple: two nodes on VM, and one more VM as arbiter (qnetd), VMs 
under Proxmox and connected by net via external ethernet switch to get closer 
to reality where "nodes VM" should locate as VM on different PHY hosts in one 
rack.

Then, it was adjusted for faster detect and failover.

1.       In Corosync, left the token default 1000ms, but add 
"heartbeat_failures_allowed: 3", this made corosync catch node failure for 
about 200ms (4x50ms heartbeat).

2.       Both qnet and qdevice was run with  net_heartbeat_interval_min=200 to 
allow play with faster hearbeats and detects

3.       Also, quorum.device.net has timeout: 500, sync_timeout: 3000, algo: 
LMS.

The testing is to issue "ate +%M:%S.%N && qm stop 201", and then check the logs 
on timestamp when the app started on the "backup" host. And, when backup host 
boot again, the test is to check the logs for the app was not restarted.

Sometimes switchover work like a charm but sometimes it may delay for dozens of 
seconds.
Sometimes when the primary host boot up again, secondary hold quorum well and 
keep app running, sometimes quorum is lost first (and pacemaker downs the app) 
and then found and pacemaker get app up again, so unwanted restart happen.

My investigation shows that the difference between "good" and "bad" cases:

Good case - all the logs clear and reasonable.

Bad case: qnetd losing connection to second node just after the connection to 
"failure" node detected and then it may take dozens of seconds to restore it. 
All this time qdevice trying to connect qnetd and fails:

Example, host 192.168.100.1 send to failure, 100.2 is failover to:

>From qnetd:
May 01 23:30:39 arbiter corosync-qnetd[6338]: Client ::ffff:192.168.100.1:60686 
doesn't sent any message during 600ms. Disconnecting
May 01 23:30:39 arbiter corosync-qnetd[6338]: Client ::ffff:192.168.100.1:60686 
(init_received 1, cluster bsc-test-cluster, node_id 1) disconnect
May 01 23:30:39 arbiter corosync-qnetd[6338]: algo-lms: Client 0x55a6fc6785b0 
(cluster bsc-test-cluster, node_id 1) disconnect
May 01 23:30:39 arbiter corosync-qnetd[6338]: algo-lms:   server going down 0
>>> This is unexpected down, at normal scenario connection persist
May 01 23:30:40 arbiter corosync-qnetd[6338]: Client ::ffff:192.168.100.2:32790 
doesn't sent any message during 600ms. Disconnecting
May 01 23:30:40 arbiter corosync-qnetd[6338]: Client ::ffff:192.168.100.2:32790 
(init_received 1, cluster bsc-test-cluster, node_id 2) disconnect
May 01 23:30:40 arbiter corosync-qnetd[6338]: algo-lms: Client 0x55a6fc6363d0 
(cluster bsc-test-cluster, node_id 2) disconnect
May 01 23:30:40 arbiter corosync-qnetd[6338]: algo-lms:   server going down 0
May 01 23:30:56 arbiter corosync-qnetd[6338]: New client connected
May 01 23:30:56 arbiter corosync-qnetd[6338]:   cluster name = bsc-test-cluster
May 01 23:30:56 arbiter corosync-qnetd[6338]:   tls started = 0
May 01 23:30:56 arbiter corosync-qnetd[6338]:   tls peer certificate verified = 0
May 01 23:30:56 arbiter corosync-qnetd[6338]:   node_id = 2
May 01 23:30:56 arbiter corosync-qnetd[6338]:   pointer = 0x55a6fc6363d0
May 01 23:30:56 arbiter corosync-qnetd[6338]:   addr_str = 
::ffff:192.168.100.2:57736
May 01 23:30:56 arbiter corosync-qnetd[6338]:   ring id = (2.801)
May 01 23:30:56 arbiter corosync-qnetd[6338]:   cluster dump:
May 01 23:30:56 arbiter corosync-qnetd[6338]:     client = 
::ffff:192.168.100.2:57736, node_id = 2
May 01 23:30:56 arbiter corosync-qnetd[6338]: Client ::ffff:192.168.100.2:57736 
(cluster bsc-test-cluster, node_id 2) sent initial node list.
May 01 23:30:56 arbiter corosync-qnetd[6338]:   msg seq num = 99
May 01 23:30:56 arbiter corosync-qnetd[6338]:   Node list:
May 01 23:30:56 arbiter corosync-qnetd[6338]:     0 node_id = 1, data_center_id 
= 0, node_state = not set
May 01 23:30:56 arbiter corosync-qnetd[6338]:     1 node_id = 2, data_center_id 
= 0, node_state = not set
May 01 23:30:56 arbiter corosync-qnetd[6338]: algo-lms: cluster 
bsc-test-cluster config_list has 2 nodes
May 01 23:30:56 arbiter corosync-qnetd[6338]: Algorithm result vote is No change
May 01 23:30:56 arbiter corosync-qnetd[6338]: Client ::ffff:192.168.100.2:57736 
(cluster bsc-test-cluster, node_id 2) sent membership node list.
May 01 23:30:56 arbiter corosync-qnetd[6338]:   msg seq num = 100
May 01 23:30:56 arbiter corosync-qnetd[6338]:   ring id = (2.801)
May 01 23:30:56 arbiter corosync-qnetd[6338]:   heuristics = Undefined
May 01 23:30:56 arbiter corosync-qnetd[6338]:   Node list:
May 01 23:30:56 arbiter corosync-qnetd[6338]:     0 node_id = 2, data_center_id 
= 0, node_state = not set
May 01 23:30:56 arbiter corosync-qnetd[6338]:
May 01 23:30:56 arbiter corosync-qnetd[6338]: algo-lms: membership list from 
node 2 partition (2.801)
May 01 23:30:56 arbiter corosync-qnetd[6338]: algo-util: partition (2.801) 
(0x55a6fc67e110) has 1 nodes
May 01 23:30:56 arbiter corosync-qnetd[6338]: algo-lms: Only 1 partition. This 
is votequorum's problem, not ours
May 01 23:30:56 arbiter corosync-qnetd[6338]: Algorithm result vote is ACK
May 01 23:30:56 arbiter corosync-qnetd[6338]: Client ::ffff:192.168.100.2:57736 
(cluster bsc-test-cluster, node_id 2) sent quorum node list.
May 01 23:30:56 arbiter corosync-qnetd[6338]:   msg seq num = 101
May 01 23:30:56 arbiter corosync-qnetd[6338]:   quorate = 0
May 01 23:30:56 arbiter corosync-qnetd[6338]:   Node list:
May 01 23:30:56 arbiter corosync-qnetd[6338]:     0 node_id = 1, data_center_id 
= 0, node_state = dead
May 01 23:30:56 arbiter corosync-qnetd[6338]:     1 node_id = 2, data_center_id 
= 0, node_state = member
May 01 23:30:56 arbiter corosync-qnetd[6338]:
May 01 23:30:56 arbiter corosync-qnetd[6338]: algo-lms: quorum node list from 
node 2 partition (2.801)
May 01 23:30:56 arbiter corosync-qnetd[6338]: algo-util: partition (2.801) 
(0x55a6fc697e70) has 1 nodes
May 01 23:30:56 arbiter corosync-qnetd[6338]: algo-lms: Only 1 partition. This 
is votequorum's problem, not ours
May 01 23:30:56 arbiter corosync-qnetd[6338]: Algorithm result vote is ACK
May 01 23:30:56 arbiter corosync-qnetd[6338]: Client ::ffff:192.168.100.2:57736 
(cluster bsc-test-cluster, node_id 2) sent quorum node list.
May 01 23:30:56 arbiter corosync-qnetd[6338]:   msg seq num = 102
May 01 23:30:56 arbiter corosync-qnetd[6338]:   quorate = 1
May 01 23:30:56 arbiter corosync-qnetd[6338]:   Node list:
May 01 23:30:56 arbiter corosync-qnetd[6338]:     0 node_id = 1, data_center_id 
= 0, node_state = dead
May 01 23:30:56 arbiter corosync-qnetd[6338]:     1 node_id = 2, data_center_id 
= 0, node_state = member
May 01 23:30:56 arbiter corosync-qnetd[6338]:
May 01 23:30:56 arbiter corosync-qnetd[6338]: algo-lms: quorum node list from 
node 2 partition (2.801)
May 01 23:30:56 arbiter corosync-qnetd[6338]: algo-util: partition (2.801) 
(0x55a6fc669dc0) has 1 nodes
May 01 23:30:56 arbiter corosync-qnetd[6338]: algo-lms: Only 1 partition. This 
is votequorum's problem, not ours
May 01 23:30:56 arbiter corosync-qnetd[6338]: Algorithm result vote is ACK
May 01 23:30:58 arbiter corosync-qnetd[6338]: Client closed connection
May 01 23:30:58 arbiter corosync-qnetd[6338]: Client ::ffff:192.168.100.2:57674 
(init_received 0, cluster bsc-test-cluster, node_id 0) disconnect
>> At this point resource start on backup host

>From qdevice:

May 01 23:30:40 node2 corosync-qdevice[781]: Server closed connection
May 01 23:30:40 node2 corosync-qdevice[781]: algo-lms: disconnected. quorate = 
1, WFA = 0
May 01 23:30:40 node2 corosync-qdevice[781]: algo-lms: disconnected. reason = 
22, WFA = 0
May 01 23:30:40 node2 corosync-qdevice[781]: Algorithm result vote is NACK
May 01 23:30:40 node2 corosync-qdevice[781]: Cast vote timer remains scheduled 
every 250ms voting NACK.
May 01 23:30:40 node2 corosync-qdevice[781]: Sleeping for 161 ms before 
reconnect
May 01 23:30:40 node2 corosync-qdevice[781]: Trying connect to qnetd server 
arbiter:5403 (timeout = 400ms)
May 01 23:30:41 node2 corosync-qdevice[781]: Connect timeout
May 01 23:30:41 node2 corosync-qdevice[781]: algo-lms: disconnected. quorate = 
1, WFA = 0
May 01 23:30:41 node2 corosync-qdevice[781]: algo-lms: disconnected. reason = 
27, WFA = 0
May 01 23:30:41 node2 corosync-qdevice[781]: Algorithm result vote is NACK
May 01 23:30:41 node2 corosync-qdevice[781]: Cast vote timer remains scheduled 
every 250ms voting NACK.
May 01 23:30:41 node2 corosync-qdevice[781]: Trying connect to qnetd server 
arbiter:5403 (timeout = 400ms)
May 01 23:30:41 node2 corosync-qdevice[781]: Votequorum nodelist notify 
callback:
May 01 23:30:41 node2 corosync-qdevice[781]:   Ring_id = (2.801)
May 01 23:30:41 node2 corosync-qdevice[781]:   Node list (size = 1):
May 01 23:30:41 node2 corosync-qdevice[781]:     0 nodeid = 2
May 01 23:30:41 node2 corosync-qdevice[781]: Algorithm decided to pause cast 
vote timer and result vote is No change
May 01 23:30:41 node2 corosync-qdevice[781]: Cast vote timer is now paused.
May 01 23:30:41 node2 corosync-qdevice[781]: worker: 
qdevice_heuristics_worker_cmd_process_exec: Received exec command with seq_no 
"24" and timeout "1500"
May 01 23:30:41 node2 corosync-qdevice[781]: Received heuristics exec result 
command with seq_no "24" and result "Disabled"
May 01 23:30:41 node2 corosync-qdevice[781]: Votequorum heuristics exec result 
callback:
May 01 23:30:41 node2 corosync-qdevice[781]:   seq_number = 24, exec_result = 
Disabled
May 01 23:30:41 node2 corosync-qdevice[781]: Algorithm decided to not send 
list, result vote is No change and heuristics is Undefined
May 01 23:30:41 node2 corosync-qdevice[781]: Cast vote timer is no longer 
paused.
May 01 23:30:41 node2 corosync-qdevice[781]: Not scheduling heuristics timer 
because mode is not enabled
May 01 23:30:41 node2 corosync-qdevice[781]: Votequorum quorum notify callback:
May 01 23:30:41 node2 corosync-qdevice[781]:   Quorate = 0
May 01 23:30:41 node2 corosync-qdevice[781]:   Node list (size = 3):
May 01 23:30:41 node2 corosync-qdevice[781]:     0 nodeid = 1, state = 2
May 01 23:30:41 node2 corosync-qdevice[781]:     1 nodeid = 2, state = 1
May 01 23:30:41 node2 corosync-qdevice[781]:     2 nodeid = 0, state = 0
May 01 23:30:41 node2 corosync-qdevice[781]: algo-lms: quorum_notify. quorate = 0
May 01 23:30:41 node2 corosync-qdevice[781]: Algorithm decided to not send list 
and result vote is No change
May 01 23:30:41 node2 corosync-qdevice[781]: Connect timeout
May 01 23:30:41 node2 corosync-qdevice[781]: algo-lms: disconnected. quorate = 
0, WFA = 0
May 01 23:30:41 node2 corosync-qdevice[781]: algo-lms: disconnected. reason = 
27, WFA = 0
May 01 23:30:41 node2 corosync-qdevice[781]: Algorithm result vote is NACK
May 01 23:30:41 node2 corosync-qdevice[781]: Cast vote timer remains scheduled 
every 250ms voting NACK.
May 01 23:30:41 node2 corosync-qdevice[781]: Trying connect to qnetd server 
arbiter:5403 (timeout = 400ms)
>>> At this point quorum reported lost
May 01 23:30:41 node2 corosync-qdevice[781]: Connect timeout
May 01 23:30:41 node2 corosync-qdevice[781]: algo-lms: disconnected. quorate = 
0, WFA = 0
May 01 23:30:41 node2 corosync-qdevice[781]: algo-lms: disconnected. reason = 
27, WFA = 0
May 01 23:30:41 node2 corosync-qdevice[781]: Algorithm result vote is NACK
May 01 23:30:41 node2 corosync-qdevice[781]: Cast vote timer remains scheduled 
every 250ms voting NACK.

>>> This failure pattern repeats 31 times
May 01 23:30:41 node2 corosync-qdevice[781]: Trying connect to qnetd server 
arbiter:5403 (timeout = 400ms)
May 01 23:30:42 node2 corosync-qdevice[781]: Connect timeout
May 01 23:30:42 node2 corosync-qdevice[781]: algo-lms: disconnected. quorate = 
0, WFA = 0
May 01 23:30:42 node2 corosync-qdevice[781]: algo-lms: disconnected. reason = 
27, WFA = 0
May 01 23:30:42 node2 corosync-qdevice[781]: Algorithm result vote is NACK
May 01 23:30:42 node2 corosync-qdevice[781]: Cast vote timer remains scheduled 
every 250ms voting NACK.
>>> End of pattern repeat, continue

May 01 23:30:56 node2 corosync-qdevice[781]: Trying connect to qnetd server 
arbiter:5403 (timeout = 400ms)
May 01 23:30:56 node2 corosync-qdevice[781]: Sending preinit msg to qnetd
May 01 23:30:56 node2 corosync-qdevice[781]: Received preinit reply msg
May 01 23:30:56 node2 corosync-qdevice[781]: Received init reply msg
May 01 23:30:56 node2 corosync-qdevice[781]: Scheduling send of heartbeat every 
400ms
May 01 23:30:56 node2 corosync-qdevice[781]: Executing after-connect heuristics.
May 01 23:30:56 node2 corosync-qdevice[781]: worker: 
qdevice_heuristics_worker_cmd_process_exec: Received exec command with seq_no 
"25" and timeout "250"
May 01 23:30:56 node2 corosync-qdevice[781]: Received heuristics exec result 
command with seq_no "25" and result "Disabled"
May 01 23:30:56 node2 corosync-qdevice[781]: Algorithm decided to send config 
node list, send membership node list, send quorum node list, heuristics is 
Undefined and result vote is Wait for reply
May 01 23:30:56 node2 corosync-qdevice[781]: Sending set option seq = 98, HB(0) 
= 0ms, KAP Tie-breaker(1) = Enabled
May 01 23:30:56 node2 corosync-qdevice[781]: Sending config node list seq = 99
May 01 23:30:56 node2 corosync-qdevice[781]:   Node list:
May 01 23:30:56 node2 corosync-qdevice[781]:     0 node_id = 1, data_center_id 
= 0, node_state = not set
May 01 23:30:56 node2 corosync-qdevice[781]:     1 node_id = 2, data_center_id 
= 0, node_state = not set
May 01 23:30:56 node2 corosync-qdevice[781]: Sending membership node list seq = 
100, ringid = (2.801), heuristics = Undefined.
May 01 23:30:56 node2 corosync-qdevice[781]:   Node list:
May 01 23:30:56 node2 corosync-qdevice[781]:     0 node_id = 2, data_center_id 
= 0, node_state = not set
May 01 23:30:56 node2 corosync-qdevice[781]: Sending quorum node list seq = 
101, quorate = 0
May 01 23:30:56 node2 corosync-qdevice[781]:   Node list:
May 01 23:30:56 node2 corosync-qdevice[781]:     0 node_id = 1, data_center_id 
= 0, node_state = dead
May 01 23:30:56 node2 corosync-qdevice[781]:     1 node_id = 2, data_center_id 
= 0, node_state = member
May 01 23:30:56 node2 corosync-qdevice[781]: Cast vote timer is now stopped.
May 01 23:30:56 node2 corosync-qdevice[781]: Received set option reply seq(1) = 
98, HB(0) = 0ms, KAP Tie-breaker(1) = Enabled
May 01 23:30:56 node2 corosync-qdevice[781]: Received initial config node list 
reply
May 01 23:30:56 node2 corosync-qdevice[781]:   seq = 99
May 01 23:30:56 node2 corosync-qdevice[781]:   vote = No change
May 01 23:30:56 node2 corosync-qdevice[781]:   ring id = (2.801)
May 01 23:30:56 node2 corosync-qdevice[781]: Algorithm result vote is No change
May 01 23:30:56 node2 corosync-qdevice[781]: Received membership node list reply
May 01 23:30:56 node2 corosync-qdevice[781]:   seq = 100
May 01 23:30:56 node2 corosync-qdevice[781]:   vote = ACK
May 01 23:30:56 node2 corosync-qdevice[781]:   ring id = (2.801)
May 01 23:30:56 node2 corosync-qdevice[781]: Algorithm result vote is ACK
May 01 23:30:56 node2 corosync-qdevice[781]: Cast vote timer is now scheduled 
every 250ms voting ACK.
May 01 23:30:56 node2 corosync-qdevice[781]: Received quorum node list reply
May 01 23:30:56 node2 corosync-qdevice[781]:   seq = 101
May 01 23:30:56 node2 corosync-qdevice[781]:   vote = ACK
May 01 23:30:56 node2 corosync-qdevice[781]:   ring id = (2.801)
May 01 23:30:56 node2 corosync-qdevice[781]: Algorithm result vote is ACK
May 01 23:30:56 node2 corosync-qdevice[781]: Cast vote timer remains scheduled 
every 250ms voting ACK.
May 01 23:30:56 node2 corosync-qdevice[781]: Votequorum quorum notify callback:
May 01 23:30:56 node2 corosync-qdevice[781]:   Quorate = 1
May 01 23:30:56 node2 corosync-qdevice[781]:   Node list (size = 3):
May 01 23:30:56 node2 corosync-qdevice[781]:     0 nodeid = 1, state = 2
May 01 23:30:56 node2 corosync-qdevice[781]:     1 nodeid = 2, state = 1
May 01 23:30:56 node2 corosync-qdevice[781]:     2 nodeid = 0, state = 0
May 01 23:30:56 node2 corosync-qdevice[781]: algo-lms: quorum_notify. quorate = 
1
May 01 23:30:56 node2 corosync-qdevice[781]: Algorithm decided to send list and 
result vote is No change
May 01 23:30:56 node2 corosync-qdevice[781]: Sending quorum node list seq = 
102, quorate = 1
May 01 23:30:56 node2 corosync-qdevice[781]:   Node list:
May 01 23:30:56 node2 corosync-qdevice[781]:     0 node_id = 1, data_center_id 
= 0, node_state = dead
May 01 23:30:56 node2 corosync-qdevice[781]:     1 node_id = 2, data_center_id 
= 0, node_state = member
May 01 23:30:56 node2 corosync-qdevice[781]: Received quorum node list reply
May 01 23:30:56 node2 corosync-qdevice[781]:   seq = 102
May 01 23:30:56 node2 corosync-qdevice[781]:   vote = ACK
May 01 23:30:56 node2 corosync-qdevice[781]:   ring id = (2.801)
May 01 23:30:56 node2 corosync-qdevice[781]: Algorithm result vote is ACK
May 01 23:30:56 node2 corosync-qdevice[781]: Cast vote timer remains scheduled 
every 250ms voting ACK.
>>> Here everything become OK and resource started on Node2

Also, I've done wireshark capture and found great mess in TCP, it seems like 
connection between qdevice and qnetd really stops for some time and packets 
won't deliver.

For my guess, it match corosync syncing activities, and I suspect that corosync 
prevent any other traffic on the interface it use for rings.

As I switch qnetd and qdevice to use different interface it seems to work fine.

So, the question is: does corosync really temporary blocks any other traffic on 
the interface it uses? Or it is just a coincidence? If it blocks, is there a 
way to manage it?

Thank you for any suggest on that!

Sincerely,

Alex

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] [EXT] Fast-failover on 2 nodes + qnetd: qdevice connenction disrupted.

Reply via email to