Hi Minh,

I will go through it today.

Thanks
Praveen


---

**[tickets:#2466] AMF: NodeGroup Admin UNLOCK timeout during cluster start up**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Tue May 23, 2017 01:19 AM UTC by Minh Hon Chau
**Last Updated:** Tue May 23, 2017 05:13 AM UTC
**Owner:** nobody


When the cluster is coming up, if a nodegroup admin UNLOCK operation is issued (by SMF in this case), the operation can time out because the su_cnt_admin_oper counter of one of the PLs remains at 1 forever.

Detailed sequence:
- The cluster has 4 nodes; start the cluster.
- When 3 nodes (SC-1, SC-2, PL-4) have joined the cluster, an admin UNLOCK is issued on the nodegroup.
~~~
May 22 14:33:46.665539 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0526] NO Node 'SC-1' joined the cluster
May 22 14:33:48.115919 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0526] NO Node 'SC-2' joined the cluster
May 22 14:34:00.442633 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0526] NO Node 'PL-4' joined the cluster
~~~

  The NoRed OpenSAF SU on PL-4 gets assigned:

~~~
May 22 14:34:00.637324 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1171] >> avd_su_si_assign_evh: id:30, node:2040f, act:2, 'safSu=19781416d5,safSg=NoRed,safApp=OpenSAF', 'safSi=NoRed3,safApp=OpenSAF', ha:1, err:1, single:0
~~~

   The admin UNLOCK is issued on the nodegroup:

~~~
May 22 14:34:02.989761 osafamfd [11068:11068:../../opensaf/src/amf/amfd/nodegroup.cc:1100] >> ng_admin_op_cb: 'safAmfNodeGroup=smfLockAdmNg2,safAmfCluster=myAmfCluster', inv:'115964117001', op:'1'
~~~
 
- When the NoRed OpenSAF SU of PL-3 becomes ENABLED, its SI assignment starts:

~~~
May 22 14:34:10.096324 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0725] >> avd_su_oper_state_evh: id:29, node:2030f, 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' state:1
May 22 14:34:10.097537 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sg_nored_fsm.cc:0305] >> su_insvc: 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF', 0
May 22 14:34:10.097549 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0111] >> avd_new_assgn_susi: 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' 'safSi=a6b0d555f4,safApp=OpenSAF' state=1
May 22 14:34:10.097552 osafamfd [11068:11068:../../opensaf/src/amf/amfd/siass.cc:0440] >> avd_susi_create: safSu=PL-3,safSg=NoRed,safApp=OpenSAF safSi=a6b0d555f4,safApp=OpenSAF state=1
~~~

 The su_cnt_admin_oper counter of node PL-3 is incremented for the NoRed OpenSAF SU assignment:
 
~~~
May 22 14:34:10.098839 osafamfd [11068:11068:../../opensaf/src/amf/amfd/util.cc:0978] << avd_snd_susi_msg
May 22 14:34:10.098841 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0268] TR node:'safAmfNode=PL-3,safAmfCluster=myAmfCluster', su_cnt_admin_oper:1
~~~

- When the NoRed OpenSAF SU gets assigned:

~~~
May 22 14:34:10.105283 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1171] >> avd_su_si_assign_evh: id:30, node:2030f, act:2, 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF', 'safSi=a6b0d555f4,safApp=OpenSAF', ha:1, err:1, single:0
~~~

  but su_cnt_admin_oper is not decremented:

~~~
May 22 14:34:10.108143 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sg_nored_fsm.cc:0000] << susi_success
May 22 14:34:10.108148 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1579] TR Node_state: 2 adest: 2010f203defc2 node not ready for assignments
May 22 14:34:10.108153 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1579] TR Node_state: 2 adest: 2020fc2b319b5 node not ready for assignments
May 22 14:34:10.108157 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0621] >> avd_nd_ncs_su_assigned
May 22 14:34:10.108162 osafamfd [11068:11068:../../opensaf/src/amf/amfd/node.cc:0461] >> avd_node_state_set: 'safAmfNode=PL-3,safAmfCluster=myAmfCluster' NCS_INIT => PRESENT
~~~

  In the end, su_cnt_admin_oper still remains at 1.
  
  By contrast, when an application SU gets assigned, the counter is always decremented:
~~~
May 22 14:34:10.444624 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sg_2n_fsm.cc:2648] << susi_success: rc:1
May 22 14:34:10.444629 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1681] TR node:'safAmfNode=PL-3,safAmfCluster=myAmfCluster', su_cnt_admin_oper:2
May 22 14:34:10.444632 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0358] >> process_su_si_response_for_ng: 'safSu=PL-3,safSg=2N,safApp=ERIC-sv.SVScsvStreamer'
May 22 14:34:10.444640 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0457] << process_su_si_response_for_ng
~~~
There is a check in avd_su_si_assign_evh() that appears to exclude OpenSAF SUs (sg_ncs_spec == true) when decrementing the counter:
~~~cpp
...
      /* else admin oper still not complete */
    } else if ((su->sg_of_su->sg_ncs_spec == false) &&
               ((su->su_on_node->admin_ng != nullptr) ||
                (su->sg_of_su->ng_using_saAmfSGAdminState == true))) {
      AVD_AMF_NG *ng = su->su_on_node->admin_ng;
      // Got response from AMFND for assignments decrement su_cnt_admin_oper.
...
~~~
 
In avd_new_assgn_susi(), however, the counter is incremented based only on whether an admin operation is pending on the node or its nodegroup (admin_ng), with no check for whether the SU is an OpenSAF SU:
~~~cpp
...
    if (avd_snd_susi_msg(cb, su, susi, AVSV_SUSI_ACT_ASGN, false, nullptr) ==
        NCSCC_RC_SUCCESS) {
      AVD_AVND *node = su->su_on_node;
      if ((node->admin_node_pend_cbk.invocation != 0) ||
          ((node->admin_ng != nullptr) &&
           (node->admin_ng->admin_ng_pend_cbk.invocation != 0))) {
        node->su_cnt_admin_oper++;
        TRACE("node:'%s', su_cnt_admin_oper:%u", node->name.c_str(),
              node->su_cnt_admin_oper);
        if (node->admin_ng != nullptr) {
          node->admin_ng->node_oper_list.insert(node->name);
          TRACE("node_oper_list size:%u", node->admin_ng->oper_list_size());
        }
...
~~~
 
In this scenario, the upgrade fails at the nodegroup UNLOCK step.


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

_______________________________________________
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
