On 4/7/20 4:09 AM, Jan Friesse wrote:
Sherrard and Andrei


On 4/6/20 4:10 PM, Andrei Borzenkov wrote:
06.04.2020 20:57, Sherrard Burton wrote:

It looks like a timing issue or race condition. After reboot, the node
manages to contact qnetd first, before the connection to the other node is
established. Qnetd behaves as documented: it sees two equal-size
partitions and favors the partition that includes the tie-breaker (lowest
node id). So the existing node goes out of quorum. A second later both nodes
see each other, and quorum is regained.
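
To make the race concrete, here is a minimal Python sketch of the decision described above. It is a simplified model and not qnetd's actual ffsplit code: the function name pick_winner and the representation of partitions as sets of node ids are assumptions made for illustration only.

```python
def pick_winner(partitions, tie_breaker="lowest"):
    """Return the partition that gets the vote in this simplified model.

    partitions: list of sets of node ids, as seen by the arbiter.
    tie_breaker: "lowest" or "highest" (illustrative mode names).
    """
    largest = max(len(p) for p in partitions)
    candidates = [p for p in partitions if len(p) == largest]
    if len(candidates) == 1:
        # One partition is strictly larger, no tie to break.
        return candidates[0]
    # Equal-size split: favor the partition that contains the tie-breaker node.
    all_nodes = set().union(*partitions)
    tb_node = min(all_nodes) if tie_breaker == "lowest" else max(all_nodes)
    return next(p for p in candidates if tb_node in p)


# Node 1 has just rebooted and reaches the arbiter before it rejoins node 2,
# so the arbiter briefly sees two single-node partitions.
winner_during_race = pick_winner([{1}, {2}])

# A moment later both nodes see each other and report one membership.
winner_after_join = pick_winner([{1, 2}])
```

In this model the freshly rebooted node 1 wins the tie during the race, which is why the node that was already running (node 2) briefly loses quorum.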

Nice catch



thank you for taking the time to troll through my debugging output. your explanation seems to accurately describe what i am experiencing. of course i have no idea how to remedy it. :-)

It is really quite a problem. Honestly, I don't think there is a way to remedy this behavior other than implementing an option to prefer the active partition as a tie-breaker (https://github.com/corosync/corosync-qdevice/issues/7).
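
A rough Python sketch of what a "prefer active partition" rule could look like follows. This is purely hypothetical: the option is not implemented, and the names pick_winner_prefer_active and active_nodes are invented here. In particular, the sketch simply assumes the arbiter remembers which nodes last held its vote (active_nodes); how qnetd would actually track that is an open question.

```python
def pick_winner_prefer_active(partitions, active_nodes, tie_breaker_node):
    """Hypothetical tie-break that prefers the currently active partition.

    partitions: list of sets of node ids.
    active_nodes: set of node ids that last held the vote (assumed to be known).
    tie_breaker_node: node id used as the fallback tie-breaker.
    """
    largest = max(len(p) for p in partitions)
    candidates = [p for p in partitions if len(p) == largest]
    if len(candidates) == 1:
        return candidates[0]
    # Hypothetical rule: among equal-size partitions, prefer the single one
    # that contains a node which already held the vote.
    active = [p for p in candidates if p & active_nodes]
    if len(active) == 1:
        return active[0]
    # No unique active partition: fall back to the configured tie-breaker.
    return next(p for p in candidates if tie_breaker_node in p)


# Node 2 kept running (and held the vote) while node 1 rebooted.
winner_with_history = pick_winner_prefer_active([{1}, {2}], {2}, 1)

# With no history available, the plain tie-breaker decides as before.
winner_without_history = pick_winner_prefer_active([{1}, {2}], set(), 1)
```

With the history available, the running node 2 keeps the vote even though node 1 is the lowest node id.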

Jan,
my curiosity got the best of me, so i spent some time trying to orient myself to the inner workings of ffsplit.

a) how would one identify the current active partition? i might be starting too far in (or missing something), but it seems that by the time we are in qnetd_algo_ffsplit_partition_cmp(), we are comparing two sets of clients and node lists without the kind of context that would allow us to identify the current active partition. i could not easily identify the object that we would interrogate to answer that question.

b) is it possible to manage client->tie_breaker.mode and client->tie_breaker.node_id dynamically to achieve the desired goal? ie, if we are in a two-node cluster and one node leaves, can we "push" values to the remaining client such that client->tie_breaker.mode == TLV_TIE_BREAKER_MODE_NODE_ID and client->tie_breaker.node_id == client->node_id?
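
To illustrate idea (b), here is a small Python sketch, under the assumption that such a "push" were possible. The Client class, the string mode names, and pin_tie_breaker are hypothetical stand-ins for client->tie_breaker.mode and client->tie_breaker.node_id; nothing here reflects an existing qnetd mechanism, and the sketch ignores how a conflicting tie-breaker reported by the returning node would be reconciled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Client:
    node_id: int
    tie_breaker_mode: str = "lowest"           # "lowest", "highest" or "node_id"
    tie_breaker_node_id: Optional[int] = None  # only used in "node_id" mode


def pin_tie_breaker(survivor):
    """Hypothetical 'push': make the surviving node its own tie-breaker."""
    survivor.tie_breaker_mode = "node_id"
    survivor.tie_breaker_node_id = survivor.node_id


def tie_breaker_node(client, all_node_ids):
    """Resolve the tie-breaker node id from the client's stored settings."""
    if client.tie_breaker_mode == "node_id":
        return client.tie_breaker_node_id
    if client.tie_breaker_mode == "highest":
        return max(all_node_ids)
    return min(all_node_ids)


def pick_winner(partitions, tb_node):
    """Equal-size split: the partition containing the tie-breaker node wins."""
    return next(p for p in partitions if tb_node in p)


nodes = {1, 2}
survivor = Client(node_id=2)

# Default (lowest node id): the rebooted node 1 wins the tie.
default_winner = pick_winner([{1}, {2}], tie_breaker_node(survivor, nodes))

# After node 1 leaves, pin the tie-breaker to the surviving node 2.
pin_tie_breaker(survivor)
pinned_winner = pick_winner([{1}, {2}], tie_breaker_node(survivor, nodes))
```

In this model, pinning the tie-breaker to the survivor means the running node keeps the vote when the other node comes back and reaches the arbiter first.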

of course i may be way off base with all of this. just wanted to ask before i extracted myself from the rabbit hole.