On 12/10/17 11:54, Jan Friesse wrote:
>> I'm on corosync-2.3.4 plus my patch

> Finally noticed ^^^ 2.3.4 is really old, and unless it is some patched version I wouldn't recommend using it. Can you give current needle a try?

I was mistaken to think I was on 2.3.4. Actually I am on the version from CentOS 7.4, which is 2.4.0+patches.

I will try to reproduce it with needle.

>>>> But often at this point, cluster1's disappearance is not reflected in
>>>> the votequorum info on cluster2:

>>> ... Is this permanent (i.e. until a new node joins/leaves), or will it
>>> fix itself over a (short) time? If it is permanent, it's a bug. If it
>>> fixes itself, it's a result of votequorum not being virtually synchronous.

>> Yes, it's permanent. After several minutes of waiting, votequorum still
>> reports "total votes: 2" even though there's only one member.


> That's bad. I've tried the following setup:
> 
> - Both nodes with current needle
> - Your config
> - Second node is just running corosync
> - First node is running the following command:
>   while true;do corosync -f; ssh node2 'corosync-quorumtool | grep Total | grep 1' || exit 1;done
> 
> I ran it for quite a while and I'm unable to reproduce the bug. Sadly, I'm unable to reproduce it even with 2.3.4. Do you think that reproducer is correct?

Yes, that's similar enough to what I'm doing. The bug happens about 50% of the time for me, so I trigger it manually rather than needing a loop. I'm not sure why you can't reproduce it.

I've done a bit of digging and am getting closer to the root cause of the race.

We rely on votequorum_sync_init being called twice -- once when node 1 joins (with member_list_entries=2) and once when node 1 leaves (with member_list_entries=1). This matters because votequorum_sync_init marks nodes as NODESTATE_DEAD if they are not in quorum_members[] -- so it needs to have seen the node appear and then disappear -- and because get_total_votes only counts votes from nodes in state NODESTATE_MEMBER.
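
To make that concrete, here is a minimal, self-contained sketch of the logic as I understand it (paraphrased, not the verbatim corosync source; the fixed-size arrays and helpers like lookup_node/in_list are stand-ins):

#include <stddef.h>
#include <string.h>

enum nodestate { NODESTATE_MEMBER = 1, NODESTATE_DEAD = 2 };

struct cluster_node {
	unsigned int node_id;
	int votes;
	enum nodestate state;
};

/* Stand-in for the node database: every node we have ever heard about. */
static struct cluster_node cluster_nodes[16];
static size_t cluster_node_count;

/* quorum_members[] as recorded by the *previous* votequorum_sync_init. */
static unsigned int quorum_members[16];
static size_t quorum_members_entries;

static struct cluster_node *lookup_node(unsigned int node_id)
{
	for (size_t i = 0; i < cluster_node_count; i++) {
		if (cluster_nodes[i].node_id == node_id)
			return &cluster_nodes[i];
	}
	return NULL;
}

static int in_list(unsigned int id, const unsigned int *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] == id)
			return 1;
	return 0;
}

/* A node is marked DEAD only if it was in quorum_members[] from the
 * previous sync but is absent from the new member_list -- i.e.
 * votequorum must see the node appear in one call and disappear in
 * the next. */
static void votequorum_sync_init_sketch(const unsigned int *member_list,
					size_t member_list_entries)
{
	for (size_t i = 0; i < quorum_members_entries; i++) {
		struct cluster_node *node = lookup_node(quorum_members[i]);

		if (node && !in_list(node->node_id, member_list,
				     member_list_entries))
			node->state = NODESTATE_DEAD;
	}
	memcpy(quorum_members, member_list,
	       member_list_entries * sizeof(unsigned int));
	quorum_members_entries = member_list_entries;
}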

When it goes wrong, I see that votequorum_sync_init is only called *once* (with member_list_entries=1) -- after node 1 has joined and left. So it never sees node 1 in member_list, and hence never marks it as NODESTATE_DEAD. But message_handler_req_exec_votequorum_nodeinfo has independently marked the node as NODESTATE_MEMBER, so get_total_votes counts it and quorate is set to 1.
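
Continuing the sketch above (same caveats: paraphrased stand-ins, not the actual source), the two paths that disagree look roughly like this:

/* Sketch of message_handler_req_exec_votequorum_nodeinfo: receiving a
 * nodeinfo message from a node is enough to flip it to MEMBER. There
 * is no path here to flip it back if the "leave" sync never reaches
 * votequorum. */
static void nodeinfo_handler_sketch(unsigned int sender_node_id, int votes)
{
	struct cluster_node *node = lookup_node(sender_node_id);

	if (node) {
		node->votes = votes;
		node->state = NODESTATE_MEMBER;
	}
}

/* Sketch of get_total_votes: only MEMBER nodes contribute, so the
 * stale NODESTATE_MEMBER left behind above yields "total votes: 2"
 * even when only one node remains. */
static int get_total_votes_sketch(void)
{
	int total_votes = 0;

	for (size_t i = 0; i < cluster_node_count; i++) {
		if (cluster_nodes[i].state == NODESTATE_MEMBER)
			total_votes += cluster_nodes[i].votes;
	}
	return total_votes;
}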

So why is votequorum_sync_init sometimes only called once? It looks like it's all down to whether we manage to iterate through all the calls to schedwrk_processor before entering the OPERATIONAL state. I haven't yet looked into exactly what controls the timing of these two things.
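
To show the shape of the race I suspect, here is a purely hypothetical illustration (the real control flow in exec/sync.c is callback-driven rather than a simple loop, and new_membership_formed / call_sync_init are invented stand-ins):

#include <stddef.h>

#define SERVICE_COUNT 3  /* cmap (0), cpg (1), votequorum (2) in the logs below */

static int new_membership_formed;  /* set when TOTEM forms a new membership */

/* Stand-in for my_service_list[idx].sync_init(member_list, ...). */
static void call_sync_init(int idx, const unsigned int *member_list,
			   size_t member_list_entries)
{
	(void)idx;
	(void)member_list;
	(void)member_list_entries;
}

/* If a new membership arrives before the walk reaches index 2, the
 * sync is abandoned and restarted against the NEW member list, so
 * votequorum_sync_init never runs for the old membership -- which is
 * exactly what the failing log below shows. */
static void run_sync_sketch(const unsigned int *member_list,
			    size_t member_list_entries)
{
	for (int my_processing_idx = 0;
	     my_processing_idx < SERVICE_COUNT;
	     my_processing_idx++) {
		if (new_membership_formed)
			return;  /* restart sync with the new member list */

		call_sync_init(my_processing_idx, member_list,
			       member_list_entries);
	}
}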

Adding the following patch helps me to demonstrate the problem more clearly:

diff --git a/exec/sync.c b/exec/sync.c
index e7b71bd..a2fb06d 100644
--- a/exec/sync.c
+++ b/exec/sync.c
@@ -544,6 +545,7 @@ static int schedwrk_processor (const void *context)
 		}
 
 	if (my_sync_callbacks_retrieve(my_service_list[my_processing_idx].service_id, NULL) != -1) {
+		log_printf(LOGSYS_LEVEL_NOTICE, "calling sync_init on service '%s' (%d) with my_member_list_entries = %d", my_service_list[my_processing_idx].name, my_processing_idx, my_member_list_entries);
 		my_service_list[my_processing_idx].sync_init (my_trans_list,
 				my_trans_list_entries, my_member_list,
 				my_member_list_entries,
diff --git a/exec/votequorum.c b/exec/votequorum.c
index d5f06c1..aab6c15 100644
--- a/exec/votequorum.c
+++ b/exec/votequorum.c
@@ -2336,6 +2353,8 @@ static void votequorum_sync_init (
 	int left_nodes;
 	struct cluster_node *node;
 
+	log_printf(LOGSYS_LEVEL_NOTICE, "votequorum_sync_init has %d member_list_entries", member_list_entries);
+
 	ENTER();
 
 	sync_in_progress = 1;

When it works correctly I see the following (selected log lines):

notice [TOTEM ] A new membership (10.71.218.17:2016) was formed. Members joined: 1
notice [SYNC ] calling sync_init on service 'corosync configuration map access' (0) with my_member_list_entries = 2
notice [SYNC ] calling sync_init on service 'corosync cluster closed process group service v1.01' (1) with my_member_list_entries = 2
notice [SYNC ] calling sync_init on service 'corosync vote quorum service v1.0' (2) with my_member_list_entries = 2
notice  [VOTEQ ] votequorum_sync_init has 2 member_list_entries
notice [TOTEM ] A new membership (10.71.218.18:2020) was formed. Members left: 1
notice [SYNC ] calling sync_init on service 'corosync configuration map access' (0) with my_member_list_entries = 1
notice [SYNC ] calling sync_init on service 'corosync cluster closed process group service v1.01' (1) with my_member_list_entries = 1
notice [SYNC ] calling sync_init on service 'corosync vote quorum service v1.0' (2) with my_member_list_entries = 1
notice  [VOTEQ ] votequorum_sync_init has 1 member_list_entries

-- Notice that votequorum_sync_init is called once with 2 members and once with 1 member.

When it goes wrong I see the following (selected log lines):

notice [TOTEM ] A new membership (10.71.218.17:2004) was formed. Members joined: 1
notice [SYNC ] calling sync_init on service 'corosync configuration map access' (0) with my_member_list_entries = 2
notice [SYNC ] calling sync_init on service 'corosync cluster closed process group service v1.01' (1) with my_member_list_entries = 2
notice [TOTEM ] A new membership (10.71.218.18:2008) was formed. Members left: 1
notice [SYNC ] calling sync_init on service 'corosync configuration map access' (0) with my_member_list_entries = 1
notice [SYNC ] calling sync_init on service 'corosync cluster closed process group service v1.01' (1) with my_member_list_entries = 1
notice [SYNC ] calling sync_init on service 'corosync vote quorum service v1.0' (2) with my_member_list_entries = 1
notice  [VOTEQ ] votequorum_sync_init has 1 member_list_entries

-- Notice the value of my_member_list_entries in the different sync_init calls, and that votequorum_sync_init is only called once.

Does this help explain the issue?

Thanks,
Jonathan
