Sherrard,

I could not determine which of these sub-threads to include this in, so I am going to (reluctantly) top-post it.

I switched the transport to UDP, and in limited testing I seem not to be hitting the race condition. Of course, I have no idea whether this will behave consistently, or which part of the knet vs. UDP setup makes the most difference.

I.e., is it the overhead of the crypto handshakes/setup? Is there some other knet layer that imparts additional delay in establishing connections to other nodes? Is the delay on the rebooted node, the standing node, or both?


At a very high level, this is what happens in corosync when using udpu:
- Corosync starts in the gather state and sends a "multicast" message (emulated by unicast to all expected members) saying "I'm here and this is my view of live nodes".
- In this state, corosync waits for answers.
- When a node receives this message, it "multicasts" the same message with its updated view of live nodes.
- After all nodes agree, they move to the next states (commit/recovery and finally operational).

With udpu this happens almost instantly, so most of the time corosync does not even create a single-node membership; that would only be created if no other nodes existed and/or the replies were not delivered in time.
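For reference, a rough corosync.conf sketch of the udpu setup might look like the following (the addresses and the second node are assumptions based on the log excerpt further down, so adjust to your cluster):

    totem {
        version: 2
        cluster_name: xen-nfs01_xen-nfs02
        transport: udpu          # plain UDP unicast instead of the corosync 3 default, knet
    }

    nodelist {
        node {
            nodeid: 1
            ring0_addr: 192.168.250.50   # assumed address, taken from the qnetd log below
        }
        node {
            nodeid: 2
            ring0_addr: 192.168.250.51   # assumed address for the second node
        }
    }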


Knet adds a layer that monitors the links between each pair of nodes, and it only marks a link active after it has received a configured number of "pong" packets. The idea behind this is to have evidence of a reasonably stable link. As long as a link is not active, no data packets go through it (corosync traffic is just "data"). This basically means that the initial corosync multicast is not delivered to the other nodes, so corosync creates a single-node membership. Once the link becomes active, the "multicast" is delivered to the other nodes and they move to the gather state.
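If you stay on knet, the link heartbeat is controlled by a few totem options (the same ones mentioned further down in the thread). A sketch with illustrative values only; if I remember correctly, the defaults are derived from the token timeout, so treat these numbers as placeholders:

    totem {
        transport: knet
        knet_ping_interval: 200   # ms between ping packets on each link (illustrative value)
        knet_ping_timeout: 5000   # ms without a pong before a link is marked down (illustrative value)
        knet_pong_count: 2        # pongs required before a link is marked active and data can flow
    }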

So to answer your question: the "delay" is on both nodes' side, because the link is not yet established between the nodes.

Honza

Ultimately I have to remind myself that "a race condition is a race condition", and that you can't chase microsecond improvements that may only lessen the chance of triggering it; you have to solve the underlying problem.


Thanks again, folks, for your help and the great work you are doing.


On 4/7/20 4:09 AM, Jan Friesse wrote:
Sherrard and Andrei



On 4/6/20 4:10 PM, Andrei Borzenkov wrote:
06.04.2020 20:57, Sherrard Burton wrote:


On 4/6/20 1:20 PM, Sherrard Burton wrote:


On 4/6/20 12:35 PM, Andrei Borzenkov wrote:
06.04.2020 17:05, Sherrard Burton wrote:

From the quorum node:
...
Apr 05 23:10:17 debug   Client ::ffff:192.168.250.50:54462 (cluster xen-nfs01_xen-nfs02, node_id 1) sent quorum node list.
Apr 05 23:10:17 debug     msg seq num = 6
Apr 05 23:10:17 debug     quorate = 0
Apr 05 23:10:17 debug     node list:
Apr 05 23:10:17 debug       node_id = 1, data_center_id = 0, node_state = member

Oops. How come the node that was rebooted formed a cluster all by itself, without seeing the second node? Do you have two_node and/or wait_for_all configured?
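(For anyone following along: those options live in the quorum section of corosync.conf. A sketch for reference only, not necessarily what is configured here; note that two_node is generally not combined with a qdevice:)

    quorum {
        provider: corosync_votequorum
        two_node: 1       # for reference only; usually not set when a qdevice supplies the extra vote
        wait_for_all: 1   # do not become quorate until all nodes have been seen at least once
    }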


I never thought to check the logs on the rebooted server. Hopefully someone can extract some further useful information here:


https://pastebin.com/imnYKBMN


It looks like some timing issue or race condition. After the reboot, the node manages to contact qnetd first, before the connection to the other node is established. Qnetd behaves as documented: it sees two equal-size partitions and favors the partition that includes the tie breaker (lowest node id). So the existing node goes out of quorum. A second later both nodes see each other, and quorum is regained.

Nice catch



Thank you for taking the time to trawl through my debugging output. Your explanation seems to accurately describe what I am experiencing. Of course, I have no idea how to remedy it. :-)

It is really quite a problem. Honestly, I don't think there is a way to remedy this behavior other than implementing an option to prefer the active partition as a tie-breaker (https://github.com/corosync/corosync-qdevice/issues/7).
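For completeness, the tie-breaker qnetd applies today is configurable, but only to a fixed choice (lowest node id, highest node id, or a specific node id), not to "whichever partition was already active". A sketch of the existing knob; the qnetd host name is a placeholder:

    quorum {
        provider: corosync_votequorum
        device {
            model: net
            net {
                host: qnetd.example.org   # placeholder for the qnetd host
                algorithm: ffsplit        # fifty-fifty split algorithm
                tie_breaker: lowest       # lowest | highest | <node id>; lowest is the default
            }
        }
    }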




I cannot reproduce it, but I also do not use knet. From the documentation I have the impression that knet has an artificial delay before it considers links operational, so maybe that is the reason.

I will do some reading on how knet factors into all of this and respond with any questions or discoveries.

Tuning knet_pong_count/knet_ping_interval may help, but I don't think there is really a way to prevent the creation of a single-node membership in all possible cases.




BTW, great eyes. I had not picked up on that little nuance. I had pored through this particular log a number of times, but it was very hard for me to discern the starting and stopping points for each logical group of messages. The indentation made some of it clear, but when you have a series of lines beginning in the left-most column, it is not clear whether they belong to the previous group or the next group, or whether they are their own group.

Just wanted to note my confusion in case the relevant maintainer happens across this thread.

Here :)

The output (especially the debug output) is really a bit cryptic, but I'm not entirely sure how to make it better. Qnetd events have no strict ordering, so I don't see a way to group related events without some kind of reordering and best-guessing, which I'm not too keen to do. Also, some of the messages relate to specific nodes and some relate to the whole cluster (or part of the cluster).

Of course I'm open to ideas on how to structure it in a better way.

Regards,
   Honza



Thanks again


_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
