Hi Minh,

Earlier, we rejected admin operations on AMF entities if the cluster was not in
AVD_APP_STATE (that is, the cluster startup timer had not yet expired). No such
check is present in check_ng_stability(), so the admin operation above was
allowed.
Later we enhanced AMF to allow admin operations before cluster timer expiry (I
need to check whether this was done before or after admin support for Nodegroup
was added). In that case the entity is immediately marked in its final state,
the admin operation is answered with success right away, and no counters need
to be incremented or decremented in any entity. We need to evaluate this from
the SMF upgrade perspective: accepting admin operations on an NG before cluster
timer expiry should not break any existing campaigns.
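
To make the two behaviours easier to compare, here is a rough sketch. It is not
amfd code: ClusterInitState, AdminOpOutcome and handle_admin_op() are
hypothetical names I made up for illustration, and kAppState only stands in for
AVD_APP_STATE.

```cpp
// Hypothetical model, not the amfd implementation.
enum class ClusterInitState { kStarting, kAppState };  // kAppState ~ AVD_APP_STATE

enum class AdminOpOutcome {
  kRejected,              // old behaviour before cluster timer expiry
  kCompletedImmediately,  // enhanced behaviour before cluster timer expiry
  kProcessedNormally      // cluster timer already expired
};

// 'counter' models a per-node counter such as su_cnt_admin_oper.
// 'pending_assignments' is an assumed input: the number of assignments the
// admin operation would have to wait for in the normal flow.
inline AdminOpOutcome handle_admin_op(ClusterInitState state,
                                      bool allow_before_expiry,
                                      unsigned pending_assignments,
                                      unsigned& counter) {
  if (state != ClusterInitState::kAppState) {
    if (!allow_before_expiry) {
      return AdminOpOutcome::kRejected;
    }
    // Entity goes straight to its final state; no counter is touched.
    return AdminOpOutcome::kCompletedImmediately;
  }
  // Normal flow: the operation waits for the assignment responses.
  counter += pending_assignments;
  return AdminOpOutcome::kProcessedNormally;
}
```

The point of the sketch is only that in the "completed immediately" branch
nothing should be left waiting on a counter.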

Admin operations on Node and Nodegroup do not affect OpenSAF SUs, so the check
in avd_new_assgn_susi() can be consolidated by excluding OpenSAF SUs, as you
have already pointed out in the description.

Thanks,
Praveen



---

** [tickets:#2466] AMF: NodeGroup Admin UNLOCK timeout during cluster start up**

**Status:** unassigned
**Milestone:** 5.17.06
**Created:** Tue May 23, 2017 01:19 AM UTC by Minh Hon Chau
**Last Updated:** Tue May 23, 2017 05:15 AM UTC
**Owner:** nobody


When the cluster is coming up, if a nodegroup admin UNLOCK is issued (by SMF in
this case), the admin operation can time out because the su_cnt_admin_oper of
one of the PLs remains 1 forever.

Sequence in detail:
- The cluster has 4 nodes; start the cluster.
- When 3 nodes (SC-1, SC-2, PL-4) have joined the cluster, an admin UNLOCK is
  issued on the nodegroup.
~~~
May 22 14:33:46.665539 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0526] NO Node 'SC-1' joined the cluster
May 22 14:33:48.115919 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0526] NO Node 'SC-2' joined the cluster
May 22 14:34:00.442633 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0526] NO Node 'PL-4' joined the cluster
~~~

  The NoRed OpenSAF SU of PL-4 gets assigned:

~~~
May 22 14:34:00.637324 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1171] >> avd_su_si_assign_evh: id:30, node:2040f, act:2, 'safSu=19781416d5,safSg=NoRed,safApp=OpenSAF', 'safSi=NoRed3,safApp=OpenSAF', ha:1, err:1, single:0
~~~

  The admin UNLOCK is issued on the nodegroup:

~~~
May 22 14:34:02.989761 osafamfd [11068:11068:../../opensaf/src/amf/amfd/nodegroup.cc:1100] >> ng_admin_op_cb: 'safAmfNodeGroup=smfLockAdmNg2,safAmfCluster=myAmfCluster', inv:'115964117001', op:'1'
 ~~~
 
- When the NoRed OpenSAF SU of PL-3 becomes ENABLED, its assignment starts:

~~~
May 22 14:34:10.096324 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0725] >> avd_su_oper_state_evh: id:29, node:2030f, 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' state:1
May 22 14:34:10.097537 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sg_nored_fsm.cc:0305] >> su_insvc: 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF', 0
May 22 14:34:10.097549 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0111] >> avd_new_assgn_susi: 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' 'safSi=a6b0d555f4,safApp=OpenSAF' state=1
May 22 14:34:10.097552 osafamfd [11068:11068:../../opensaf/src/amf/amfd/siass.cc:0440] >> avd_susi_create: safSu=PL-3,safSg=NoRed,safApp=OpenSAF safSi=a6b0d555f4,safApp=OpenSAF state=1
~~~

 The su_cnt_admin_oper of node PL-3 is increased for the NoRed OpenSAF SU:
 
~~~
May 22 14:34:10.098839 osafamfd [11068:11068:../../opensaf/src/amf/amfd/util.cc:0978] << avd_snd_susi_msg
May 22 14:34:10.098841 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0268] TR node:'safAmfNode=PL-3,safAmfCluster=myAmfCluster', su_cnt_admin_oper:1
~~~

- When the NoRed OpenSAF SU gets assigned:

~~~
May 22 14:34:10.105283 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1171] >> avd_su_si_assign_evh: id:30, node:2030f, act:2, 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF', 'safSi=a6b0d555f4,safApp=OpenSAF', ha:1, err:1, single:0
~~~

  but su_cnt_admin_oper is not decreased:

~~~
May 22 14:34:10.108143 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sg_nored_fsm.cc:0000] << susi_success
May 22 14:34:10.108148 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1579] TR Node_state: 2 adest: 2010f203defc2 node not ready for assignments
May 22 14:34:10.108153 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1579] TR Node_state: 2 adest: 2020fc2b319b5 node not ready for assignments
May 22 14:34:10.108157 osafamfd [11068:11068:../../opensaf/src/amf/amfd/ndfsm.cc:0621] >> avd_nd_ncs_su_assigned
May 22 14:34:10.108162 osafamfd [11068:11068:../../opensaf/src/amf/amfd/node.cc:0461] >> avd_node_state_set: 'safAmfNode=PL-3,safAmfCluster=myAmfCluster' NCS_INIT => PRESENT
~~~

  At the end, su_cnt_admin_oper still remains 1.
  
  When an application SU gets assigned, the counter is always decreased:
~~~
May 22 14:34:10.444624 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sg_2n_fsm.cc:2648] << susi_success: rc:1
May 22 14:34:10.444629 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:1681] TR node:'safAmfNode=PL-3,safAmfCluster=myAmfCluster', su_cnt_admin_oper:2
May 22 14:34:10.444632 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0358] >> process_su_si_response_for_ng: 'safSu=PL-3,safSg=2N,safApp=ERIC-sv.SVScsvStreamer'
May 22 14:34:10.444640 osafamfd [11068:11068:../../opensaf/src/amf/amfd/sgproc.cc:0457] << process_su_si_response_for_ng
~~~
There is a check in avd_su_si_assign_evh() that appears to exclude OpenSAF SUs
when decreasing the counter:
...
      /* else admin oper still not complete */
    } else if ((su->sg_of_su->sg_ncs_spec == false) &&
               ((su->su_on_node->admin_ng != nullptr) ||
                (su->sg_of_su->ng_using_saAmfSGAdminState == true))) {
      AVD_AMF_NG *ng = su->su_on_node->admin_ng;
      // Got response from AMFND for assignments decrement su_cnt_admin_oper.
 ...
 
 In avd_new_assgn_susi(), the counter is increased based only on @admin_ng
(meaning a nodegroup admin operation is in progress), with no check for
OpenSAF SUs:
 ...
     if (avd_snd_susi_msg(cb, su, susi, AVSV_SUSI_ACT_ASGN, false, nullptr) ==
        NCSCC_RC_SUCCESS) {
      AVD_AVND *node = su->su_on_node;
      if ((node->admin_node_pend_cbk.invocation != 0) ||
          ((node->admin_ng != nullptr) &&
           (node->admin_ng->admin_ng_pend_cbk.invocation != 0))) {
        node->su_cnt_admin_oper++;
        TRACE("node:'%s', su_cnt_admin_oper:%u", node->name.c_str(),
              node->su_cnt_admin_oper);
        if (node->admin_ng != nullptr) {
          node->admin_ng->node_oper_list.insert(node->name);
          TRACE("node_oper_list size:%u", node->admin_ng->oper_list_size());
        }
 ...
 
This scenario makes the upgrade fail at the UNLOCK nodegroup step.
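
The imbalance can be reproduced with a small standalone model. This is only a
sketch under assumptions: Node, Su, counts_for_admin_op(), on_assignment_sent(),
on_assignment_response() and leftover_after_startup() are simplified,
hypothetical stand-ins for the amfd types and code paths quoted above, and the
shared predicate is one possible way to make the two sides agree, not a
committed fix.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins; only the fields relevant to the counter.
struct Node {
  std::string name;
  bool ng_admin_op_pending = false;  // models admin_ng set with a pending invocation
  unsigned su_cnt_admin_oper = 0;
};

struct Su {
  bool sg_ncs_spec = false;  // true for OpenSAF (middleware) SUs
  Node* node = nullptr;
};

// Predicate used on the decrement side: OpenSAF SUs are not counted.
inline bool counts_for_admin_op(const Su& su) {
  return !su.sg_ncs_spec && su.node != nullptr && su.node->ng_admin_op_pending;
}

// Increment side (models avd_new_assgn_susi() after the message is sent).
// exclude_ncs == false mimics the reported behaviour: only the pending
// nodegroup operation is checked.
inline void on_assignment_sent(const Su& su, bool exclude_ncs) {
  const bool count =
      exclude_ncs ? counts_for_admin_op(su) : su.node->ng_admin_op_pending;
  if (count) ++su.node->su_cnt_admin_oper;
}

// Decrement side (models the quoted branch in avd_su_si_assign_evh()).
inline void on_assignment_response(const Su& su) {
  if (counts_for_admin_op(su)) {
    assert(su.node->su_cnt_admin_oper > 0);
    --su.node->su_cnt_admin_oper;
  }
}

// One OpenSAF SU and one application SU get assigned on the same node while
// a nodegroup admin operation is pending. Returns the leftover counter.
inline unsigned leftover_after_startup(bool exclude_ncs) {
  Node pl3;
  pl3.name = "PL-3";
  pl3.ng_admin_op_pending = true;

  Su ncs_su;
  ncs_su.sg_ncs_spec = true;
  ncs_su.node = &pl3;

  Su app_su;
  app_su.node = &pl3;

  on_assignment_sent(ncs_su, exclude_ncs);
  on_assignment_sent(app_su, exclude_ncs);
  on_assignment_response(ncs_su);
  on_assignment_response(app_su);
  return pl3.su_cnt_admin_oper;
}
```

In this model, leftover_after_startup(false) leaves the counter at 1, which
matches the stuck su_cnt_admin_oper seen in the traces, while
leftover_after_startup(true) brings it back to 0 because both sides use the
same condition.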

