- **status**: review --> fixed
- **Comment**:

develop:

commit f921600fa2affd69e898a8beb0848c75924cfae1
Author: Gary Lee <[email protected]>
Date:   Tue Aug 29 13:42:50 2017 +1000

    amfd: postpone deletion of node from node_id_db [#2547]
    
    CLM and MDS callbacks are delivered to the main thread via different paths.
    If a node is restarted quickly, sometimes CLM JOIN is processed before the
    prior MDS down. This means the node will not be able to join the cluster
    as it is not in node_id_db (deleted in MDS down processing).
    
    This patch ensures addition to, and removal from node_id_db is only done
    from CLM callbacks to avoid race conditions such as above.
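
The policy described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual amfd code: `NodeIdDb`, `on_clm_join`, `on_clm_leave`, `on_mds_down` and `late_mds_down_keeps_node` are hypothetical names, and the real `node_id_db` stores node objects rather than bare IDs. The node ID `0x2040f` is the one that appears in the ticket's trace.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical stand-in for amfd's node_id_db: just a set of node IDs.
class NodeIdDb {
 public:
  // CLM callbacks are the only paths that modify the set.
  void on_clm_join(std::uint32_t node_id) { ids_.insert(node_id); }
  void on_clm_leave(std::uint32_t node_id) { ids_.erase(node_id); }

  // MDS down may still drive failover handling elsewhere, but in this
  // sketch it deliberately leaves the set untouched.
  void on_mds_down(std::uint32_t /*node_id*/) {}

  // Node-up processing only needs a lookup.
  bool contains(std::uint32_t node_id) const {
    return ids_.count(node_id) != 0;
  }

 private:
  std::unordered_set<std::uint32_t> ids_;
};

// Replays a quick restart in which MDS down arrives after CLM join and
// reports whether the node is still known afterwards.
inline bool late_mds_down_keeps_node() {
  constexpr std::uint32_t kNode = 0x2040f;
  NodeIdDb db;
  db.on_clm_join(kNode);   // initial membership
  db.on_clm_leave(kNode);  // node stopped
  assert(!db.contains(kNode));
  db.on_clm_join(kNode);   // node restarted quickly
  db.on_mds_down(kNode);   // late MDS down for the previous instance
  return db.contains(kNode);
}
```

Because only the CLM path mutates the set, the relative order of CLM and MDS events no longer affects whether a later node-up lookup succeeds.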



---

**[tickets:#2547] amfd: payload cannot join cluster**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Wed Aug 09, 2017 12:41 AM UTC by Gary Lee
**Last Updated:** Mon Aug 14, 2017 03:49 AM UTC
**Owner:** Gary Lee


If a payload is stopped and restarted quickly, it is sometimes unable to
rejoin the cluster.

CLM and MDS events are delivered to the main thread via separate paths. In the
trace below, an MDS DOWN event arrives out of order, after CLM JOIN, and the
subsequent node-up message is rejected with "invalid node ID".

~~~
Jul 27 11:45:15.259963 osafamfd [264:264:src/clm/agent/clma_api.c:0829] >> saClmDispatch
Jul 27 11:45:15.260082 osafamfd [264:264:src/amf/amfd/clm.cc:0222] >> clm_track_cb: '0' '4' '1'
Jul 27 11:45:15.260103 osafamfd [264:264:src/amf/amfd/clm.cc:0238] TR numberOfMembers:'4', numberOfItems:'1'
Jul 27 11:45:15.260121 osafamfd [264:264:src/amf/amfd/clm.cc:0244] TR i = 0, node:'safNode=PL-4,safCluster=myClmCluster', clusterChange:3
Jul 27 11:45:15.260133 osafamfd [264:264:src/amf/amfd/clm.cc:0299] TR  Node Left: rootCauseEntity safNode=PL-4,safCluster=myClmCluster for node 132111

Jul 27 11:45:15.279492 osafamfd [264:264:src/clm/agent/clma_api.c:0829] >> saClmDispatch
Jul 27 11:45:15.279574 osafamfd [264:264:src/amf/amfd/clm.cc:0222] >> clm_track_cb: '0' '4' '1'
Jul 27 11:45:15.279581 osafamfd [264:264:src/amf/amfd/clm.cc:0238] TR numberOfMembers:'5', numberOfItems:'1'
Jul 27 11:45:15.279589 osafamfd [264:264:src/amf/amfd/clm.cc:0244] TR i = 0, node:'safNode=PL-4,safCluster=myClmCluster', clusterChange:2
Jul 27 11:45:15.279609 osafamfd [264:264:src/amf/amfd/node.cc:0052] TR added node 132111
Jul 27 11:45:15.279620 osafamfd [264:264:src/amf/amfd/clm.cc:0380] TR Node Joined 'safNode=PL-4,safCluster=myClmCluster' '36'

Jul 27 11:45:15.287973 osafamfd [264:264:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 21
Jul 27 11:45:15.287979 osafamfd [264:264:src/amf/amfd/ndfsm.cc:0771] >> avd_mds_avnd_down_evh: 2040f, 0x55c93b1dfda0
Jul 27 11:45:15.287986 osafamfd [264:264:src/amf/amfd/ndproc.cc:1219] >> avd_node_failover: 'safAmfNode=PL-4,safAmfCluster=myAmfCluster'
Jul 27 11:45:15.287991 osafamfd [264:264:src/amf/amfd/ndfsm.cc:1110] >> avd_node_mark_absent

Jul 27 11:45:15.785245 osafamfd [264:264:src/amf/amfd/ndfsm.cc:0296] >> avd_node_up_evh: from 2040f, safAmfNode=PL-4,safAmfCluster=myAmfCluster
Jul 27 11:45:15.785261 osafamfd [264:264:src/amf/amfd/ndfsm.cc:0363] TR invalid node ID (2040f)
~~~
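
To make the ordering problem in the trace concrete, here is a small replay of the four events against a simplified model of the pre-patch behaviour. It is only a sketch under stated assumptions: the `Event` enum and `node_up_accepted` are hypothetical names, the model assumes CLM join adds the ID, MDS down removes it, and CLM leave does not touch the database, and it assumes the node was already known before the restart.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical event kinds, loosely named after the handlers in the trace.
enum class Event { kClmLeft, kClmJoined, kMdsDown, kNodeUp };

// Replays events for a single node against a simplified pre-patch model and
// returns whether the last kNodeUp found the node ID in the database.
// Assumptions: CLM join inserts the ID, MDS down erases it, CLM leave is a
// no-op for the database, and the node was present before the restart.
inline bool node_up_accepted(const std::vector<Event>& events,
                             std::uint32_t node_id) {
  std::set<std::uint32_t> node_id_db{node_id};
  bool accepted = false;
  for (Event e : events) {
    switch (e) {
      case Event::kClmLeft:
        break;
      case Event::kClmJoined:
        node_id_db.insert(node_id);
        break;
      case Event::kMdsDown:
        node_id_db.erase(node_id);
        break;
      case Event::kNodeUp:
        accepted = node_id_db.count(node_id) != 0;
        break;
    }
  }
  return accepted;
}

// Compares the order seen in the trace with the expected order.
inline void check_orderings() {
  constexpr std::uint32_t kNode = 0x2040f;  // 132111 in decimal
  // Order from the trace: MDS down is processed after CLM join.
  assert(!node_up_accepted({Event::kClmLeft, Event::kClmJoined,
                            Event::kMdsDown, Event::kNodeUp}, kNode));
  // Expected order: MDS down is processed before CLM join.
  assert(node_up_accepted({Event::kClmLeft, Event::kMdsDown,
                           Event::kClmJoined, Event::kNodeUp}, kNode));
}
```

In this model the outcome depends only on whether MDS down is processed before or after CLM join, which matches the "invalid node ID" rejection at the end of the trace.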


