On 12/10/17 11:54, Jan Friesse wrote:
>> I'm on corosync-2.3.4 plus my patch

> Finally noticed ^^^ 2.3.4 is really old, and unless it is some patched version I wouldn't recommend using it. Can you give current needle a try?

I was mistaken to think I was on 2.3.4. Actually I am on the version from CentOS 7.4, which is 2.4.0+patches.

I will try to reproduce it with needle.

>>>> But often at this point, cluster1's disappearance is not reflected in
>>>> the votequorum info on cluster2:

>>> ... Is this permanent (i.e. until a new node joins/leaves), or will it
>>> fix itself over a (short) time? If it is permanent, it's a bug. If it
>>> fixes itself, it's a result of votequorum not being virtually synchronous.

>> Yes, it's permanent. After several minutes of waiting, votequorum still
>> reports "total votes: 2" even though there's only one member.


> That's bad. I've tried the following setup:
> 
> - Both nodes with current needle
> - Your config
> - Second node is just running corosync
> - First node is running the following command:
>   while true;do corosync -f; ssh node2 'corosync-quorumtool | grep Total | grep 1' || exit 1;done
> 
> I ran it for quite a while and I'm unable to reproduce the bug. Sadly, I'm unable to reproduce it even with 2.3.4. Do you think that reproducer is correct?

Yes, that's similar enough to what I'm doing. The bug happens about 50% of the time for me, so I trigger it manually rather than needing a loop. I'm not sure why you can't reproduce it.

I've done a bit of digging and am getting closer to the root cause of the race.

We rely on votequorum_sync_init being called twice -- once when node 1 joins (with member_list_entries=2) and once when node 1 leaves (with member_list_entries=1). This matters because votequorum_sync_init marks nodes as NODESTATE_DEAD if they are not in quorum_members[] -- so it needs to have seen the node appear and then disappear -- and because get_total_votes only counts votes from nodes in state NODESTATE_MEMBER.
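
To make that concrete, here is a minimal, self-contained sketch of the logic as I understand it (paraphrased, not the verbatim corosync source; the fixed-size arrays and helpers like lookup_node/in_list are stand-ins):

#include <stddef.h>
#include <string.h>

enum nodestate { NODESTATE_MEMBER = 1, NODESTATE_DEAD = 2 };

struct cluster_node {
	unsigned int node_id;
	int votes;
	enum nodestate state;
};

/* Stand-in for the node database: every node we have ever heard about. */
static struct cluster_node cluster_nodes[16];
static size_t cluster_node_count;

/* quorum_members[] as recorded by the *previous* votequorum_sync_init. */
static unsigned int quorum_members[16];
static size_t quorum_members_entries;

static struct cluster_node *lookup_node(unsigned int node_id)
{
	for (size_t i = 0; i < cluster_node_count; i++) {
		if (cluster_nodes[i].node_id == node_id)
			return &cluster_nodes[i];
	}
	return NULL;
}

static int in_list(unsigned int id, const unsigned int *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] == id)
			return 1;
	return 0;
}

/* A node is marked DEAD only if it was in quorum_members[] from the
 * previous sync but is absent from the new member_list -- i.e.
 * votequorum must see the node appear in one call and disappear in
 * the next. */
static void votequorum_sync_init_sketch(const unsigned int *member_list,
					size_t member_list_entries)
{
	for (size_t i = 0; i < quorum_members_entries; i++) {
		struct cluster_node *node = lookup_node(quorum_members[i]);

		if (node && !in_list(node->node_id, member_list,
				     member_list_entries))
			node->state = NODESTATE_DEAD;
	}
	memcpy(quorum_members, member_list,
	       member_list_entries * sizeof(unsigned int));
	quorum_members_entries = member_list_entries;
}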

When it goes wrong, I see that votequorum_sync_init is only called *once* (with member_list_entries=1) -- after node 1 has joined and left. So it never sees node 1 in member_list, and hence never marks it as NODESTATE_DEAD. But message_handler_req_exec_votequorum_nodeinfo has independently marked the node as NODESTATE_MEMBER, so get_total_votes counts it and quorate is set to 1.
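
Continuing the sketch above (same caveats: paraphrased stand-ins, not the actual source), the two paths that disagree look roughly like this:

/* Sketch of message_handler_req_exec_votequorum_nodeinfo: receiving a
 * nodeinfo message from a node is enough to flip it to MEMBER. There
 * is no path here to flip it back if the "leave" sync never reaches
 * votequorum. */
static void nodeinfo_handler_sketch(unsigned int sender_node_id, int votes)
{
	struct cluster_node *node = lookup_node(sender_node_id);

	if (node) {
		node->votes = votes;
		node->state = NODESTATE_MEMBER;
	}
}

/* Sketch of get_total_votes: only MEMBER nodes contribute, so the
 * stale NODESTATE_MEMBER left behind above yields "total votes: 2"
 * even when only one node remains. */
static int get_total_votes_sketch(void)
{
	int total_votes = 0;

	for (size_t i = 0; i < cluster_node_count; i++) {
		if (cluster_nodes[i].state == NODESTATE_MEMBER)
			total_votes += cluster_nodes[i].votes;
	}
	return total_votes;
}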

So why is votequorum_sync_init sometimes only called once? It looks like it's all down to whether we manage to iterate through all the calls to schedwrk_processor before entering the OPERATIONAL state. I haven't yet looked into exactly what controls the timing of these two things.
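
To show the shape of the race I suspect, here is a purely hypothetical illustration (the real control flow in exec/sync.c is callback-driven rather than a simple loop, and new_membership_formed / call_sync_init are invented stand-ins):

#include <stddef.h>

#define SERVICE_COUNT 3  /* cmap (0), cpg (1), votequorum (2) in the logs below */

static int new_membership_formed;  /* set when TOTEM forms a new membership */

/* Stand-in for my_service_list[idx].sync_init(member_list, ...). */
static void call_sync_init(int idx, const unsigned int *member_list,
			   size_t member_list_entries)
{
	(void)idx;
	(void)member_list;
	(void)member_list_entries;
}

/* If a new membership arrives before the walk reaches index 2, the
 * sync is abandoned and restarted against the NEW member list, so
 * votequorum_sync_init never runs for the old membership -- which is
 * exactly what the failing log below shows. */
static void run_sync_sketch(const unsigned int *member_list,
			    size_t member_list_entries)
{
	for (int my_processing_idx = 0;
	     my_processing_idx < SERVICE_COUNT;
	     my_processing_idx++) {
		if (new_membership_formed)
			return;  /* restart sync with the new member list */

		call_sync_init(my_processing_idx, member_list,
			       member_list_entries);
	}
}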

Adding the following patch helps me to demonstrate the problem more clearly:

diff --git a/exec/sync.c b/exec/sync.c
index e7b71bd..a2fb06d 100644
--- a/exec/sync.c
+++ b/exec/sync.c
@@ -544,6 +545,7 @@ static int schedwrk_processor (const void *context)
 		}
 
 	if (my_sync_callbacks_retrieve(my_service_list[my_processing_idx].service_id, NULL) != -1) {
+		log_printf(LOGSYS_LEVEL_NOTICE, "calling sync_init on service '%s' (%d) with my_member_list_entries = %d", my_service_list[my_processing_idx].name, my_processing_idx, my_member_list_entries);
 		my_service_list[my_processing_idx].sync_init (my_trans_list,
 				my_trans_list_entries, my_member_list,
 				my_member_list_entries,
diff --git a/exec/votequorum.c b/exec/votequorum.c
index d5f06c1..aab6c15 100644
--- a/exec/votequorum.c
+++ b/exec/votequorum.c
@@ -2336,6 +2353,8 @@ static void votequorum_sync_init (
 	int left_nodes;
 	struct cluster_node *node;
 
+	log_printf(LOGSYS_LEVEL_NOTICE, "votequorum_sync_init has %d member_list_entries", member_list_entries);
+
 	ENTER();
 
 	sync_in_progress = 1;

When it works correctly I see the following (selected log lines):

notice [TOTEM ] A new membership (10.71.218.17:2016) was formed. Members joined: 1
notice [SYNC ] calling sync_init on service 'corosync configuration map access' (0) with my_member_list_entries = 2
notice [SYNC ] calling sync_init on service 'corosync cluster closed process group service v1.01' (1) with my_member_list_entries = 2
notice [SYNC ] calling sync_init on service 'corosync vote quorum service v1.0' (2) with my_member_list_entries = 2
notice  [VOTEQ ] votequorum_sync_init has 2 member_list_entries
notice [TOTEM ] A new membership (10.71.218.18:2020) was formed. Members left: 1
notice [SYNC ] calling sync_init on service 'corosync configuration map access' (0) with my_member_list_entries = 1
notice [SYNC ] calling sync_init on service 'corosync cluster closed process group service v1.01' (1) with my_member_list_entries = 1
notice [SYNC ] calling sync_init on service 'corosync vote quorum service v1.0' (2) with my_member_list_entries = 1
notice  [VOTEQ ] votequorum_sync_init has 1 member_list_entries

-- Notice that votequorum_sync_init is called once with 2 members and once with 1 member.

When it goes wrong I see the following (selected log lines):

notice [TOTEM ] A new membership (10.71.218.17:2004) was formed. Members joined: 1
notice [SYNC ] calling sync_init on service 'corosync configuration map access' (0) with my_member_list_entries = 2
notice [SYNC ] calling sync_init on service 'corosync cluster closed process group service v1.01' (1) with my_member_list_entries = 2
notice [TOTEM ] A new membership (10.71.218.18:2008) was formed. Members left: 1
notice [SYNC ] calling sync_init on service 'corosync configuration map access' (0) with my_member_list_entries = 1
notice [SYNC ] calling sync_init on service 'corosync cluster closed process group service v1.01' (1) with my_member_list_entries = 1
notice [SYNC ] calling sync_init on service 'corosync vote quorum service v1.0' (2) with my_member_list_entries = 1
notice  [VOTEQ ] votequorum_sync_init has 1 member_list_entries

-- Notice the value of my_member_list_entries in the different sync_init calls, and that votequorum_sync_init is only called once.

Does this help explain the issue?

Thanks,
Jonathan
