- **status**: review --> fixed
- **Comment**:
develop:
commit f921600fa2affd69e898a8beb0848c75924cfae1
Author: Gary Lee <[email protected]>
Date: Tue Aug 29 13:42:50 2017 +1000
amfd: postpone deletion of node from node_id_db [#2547]
CLM and MDS callbacks are delivered to the main thread via different paths.
If a node is restarted quickly, sometimes CLM JOIN is processed before the
prior MDS down. This means the node will not be able to join the cluster
as it is not in node_id_db (deleted in MDS down processing).
This patch ensures that additions to, and removals from, node_id_db are only
done from CLM callbacks, to avoid race conditions such as the one above.
---
**[tickets:#2547] amfd: payload cannot join cluster**
**Status:** fixed
**Milestone:** 5.17.10
**Created:** Wed Aug 09, 2017 12:41 AM UTC by Gary Lee
**Last Updated:** Mon Aug 14, 2017 03:49 AM UTC
**Owner:** Gary Lee
If a payload is stopped and restarted quickly, it is sometimes unable to
rejoin the cluster.
CLM and MDS events reach the main thread via separate paths. In the trace
below, the MDS DOWN event for node 132111 (0x2040f) arrives out of order,
after the CLM JOIN, and the subsequent node up is rejected as an invalid node ID.
~~~
Jul 27 11:45:15.259963 osafamfd [264:264:src/clm/agent/clma_api.c:0829] >> saClmDispatch
Jul 27 11:45:15.260082 osafamfd [264:264:src/amf/amfd/clm.cc:0222] >> clm_track_cb: '0' '4' '1'
Jul 27 11:45:15.260103 osafamfd [264:264:src/amf/amfd/clm.cc:0238] TR numberOfMembers:'4', numberOfItems:'1'
Jul 27 11:45:15.260121 osafamfd [264:264:src/amf/amfd/clm.cc:0244] TR i = 0, node:'safNode=PL-4,safCluster=myClmCluster', clusterChange:3
Jul 27 11:45:15.260133 osafamfd [264:264:src/amf/amfd/clm.cc:0299] TR Node Left: rootCauseEntity safNode=PL-4,safCluster=myClmCluster for node 132111
Jul 27 11:45:15.279492 osafamfd [264:264:src/clm/agent/clma_api.c:0829] >> saClmDispatch
Jul 27 11:45:15.279574 osafamfd [264:264:src/amf/amfd/clm.cc:0222] >> clm_track_cb: '0' '4' '1'
Jul 27 11:45:15.279581 osafamfd [264:264:src/amf/amfd/clm.cc:0238] TR numberOfMembers:'5', numberOfItems:'1'
Jul 27 11:45:15.279589 osafamfd [264:264:src/amf/amfd/clm.cc:0244] TR i = 0, node:'safNode=PL-4,safCluster=myClmCluster', clusterChange:2
Jul 27 11:45:15.279609 osafamfd [264:264:src/amf/amfd/node.cc:0052] TR added node 132111
Jul 27 11:45:15.279620 osafamfd [264:264:src/amf/amfd/clm.cc:0380] TR Node Joined 'safNode=PL-4,safCluster=myClmCluster' '36'
Jul 27 11:45:15.287973 osafamfd [264:264:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 21
Jul 27 11:45:15.287979 osafamfd [264:264:src/amf/amfd/ndfsm.cc:0771] >> avd_mds_avnd_down_evh: 2040f, 0x55c93b1dfda0
Jul 27 11:45:15.287986 osafamfd [264:264:src/amf/amfd/ndproc.cc:1219] >> avd_node_failover: 'safAmfNode=PL-4,safAmfCluster=myAmfCluster'
Jul 27 11:45:15.287991 osafamfd [264:264:src/amf/amfd/ndfsm.cc:1110] >> avd_node_mark_absent
Jul 27 11:45:15.785245 osafamfd [264:264:src/amf/amfd/ndfsm.cc:0296] >> avd_node_up_evh: from 2040f, safAmfNode=PL-4,safAmfCluster=myAmfCluster
Jul 27 11:45:15.785261 osafamfd [264:264:src/amf/amfd/ndfsm.cc:0363] TR invalid node ID (2040f)
~~~
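
The ordering problem in the trace can be illustrated with a small model. This is only a sketch, not the amfd code: `NodeIdDb`, `Event`, `apply` and `node_up_accepted` are hypothetical names, and the pre-patch behaviour is assumed to be "only MDS down erases the entry", which is how the commit message describes it. The flag switches between that assumed old behaviour and the CLM-only add/remove described in the fix.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Event kinds seen by the main thread, named after the trace above.
enum class Event { kClmLeft, kClmJoined, kMdsDown };

// Hypothetical stand-in for amfd's node_id_db: the set of known node IDs.
using NodeIdDb = std::set<uint32_t>;

// Applies one event to the database.
// erase_on_mds_down == true  : assumed pre-patch behaviour (MDS down erases).
// erase_on_mds_down == false : behaviour described in the commit message
//                              (only CLM callbacks add or remove entries).
void apply(NodeIdDb& db, Event ev, uint32_t node_id, bool erase_on_mds_down) {
  switch (ev) {
    case Event::kClmJoined:
      db.insert(node_id);
      break;
    case Event::kClmLeft:
      if (!erase_on_mds_down) db.erase(node_id);
      break;
    case Event::kMdsDown:
      if (erase_on_mds_down) db.erase(node_id);
      break;
  }
}

// Replays the event order from the trace and reports whether a later
// node up would still find the node ID in the database.
bool node_up_accepted(bool erase_on_mds_down) {
  const uint32_t kNodeId = 0x2040f;  // 132111 in the trace
  NodeIdDb db = {kNodeId};           // node was a member before the restart
  const std::vector<Event> order = {Event::kClmLeft, Event::kClmJoined,
                                    Event::kMdsDown};
  for (Event ev : order) apply(db, ev, kNodeId, erase_on_mds_down);
  return db.count(kNodeId) != 0;
}
```

With the assumed old behaviour, `node_up_accepted(true)` returns false because the late MDS down removes the entry that the CLM join had just added. With CLM-only ownership, `node_up_accepted(false)` returns true because the late MDS down no longer touches the database.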