Analysis:

1) AMF issues node lock
Dec 24 18:03:58.850448 osafamfd [761:node.cc:1052] >> node_admin_op_cb: 622770257921, 'safAmfNode=PL-4,safAmfCluster=myAmfCluster', 2
Dec 24 18:03:58.850487 osafamfd [761:node.cc:0783] >> avd_node_admin_lock_unlock_shutdown: safAmfNode=PL-4,safAmfCluster=myAmfCluster
Dec 24 18:03:58.850513 osafamfd [761:node.cc:0674] >> node_admin_state_set: safAmfNode=PL-4,safAmfCluster=myAmfCluster AdmState UNLOCKED => LOCKED

Dec 24 18:03:58.858531 osafamfd [761:sg_2n_fsm.cc:4121] TR act_found'0', quisced_found'0', quiscing_found'0'
Dec 24 18:03:58.858537 osafamfd [761:sg_2n_fsm.cc:4138] << avd_su_state_determine: state '2'
Dec 24 18:03:58.858542 osafamfd [761:sgproc.cc:1961] >> avd_sg_su_si_del_snd: 'safSu=SU2,safSg=SGONE,safApp=TWONAPP'
Dec 24 18:03:5

After this, faults occurred on SU2 through SU5 while they were taking standby
assignments, and each of these SUs moved to the disabled state.
AMF did not repair any of these SUs since auto repair is disabled. While the
standby assignments were shifting from SU2 to SU5, the unlock operation on PL-4
kept returning TRY_AGAIN. The node unlock operation succeeded only once all of
these SUs had been exhausted as candidates for the standby assignments and had
gone into the disabled state.


3) After this, admin repair of SU3 was successful and SU3 was given the standby
assignments, so the SG became stable.

Dec 24 18:05:57.029393 osafamfd [761:sg_2n_fsm.cc:4138] << avd_su_state_determine: state '1'
Dec 24 18:05:57.029400 osafamfd [761:sg_2n_fsm.cc:0523] << avd_sg_2n_act_susi: act: 'safSu=SU1,safSg=SGONE,safApp=TWONAPP', stdby: 'safSu=SU3,safSg=SGONE,safApp=TWONAPP'
Dec 24 18:05:57.029576 osafamfd [761:sg_2n_fsm.cc:0682] << avd_sg_2n_su_chose_asgn: '(null)'
Dec 24 18:05:57.029584 osafamfd [761:sg_2n_fsm.cc:1824] TR sg_fsm_state 1 => 0
Dec 24 18:05:57.029590 osafamfd [761:mbcsv_api.c:0773] >> mbcsv_process_snd_ckpt_request: Sendin

4) Since node->admin_node_pend_cbk.admin_oper was not cleared and SU2 is
hosted on this node, repair of SU2 returns TRY_AGAIN.

Dec 24 18:06:03.168937 osafamfd [761:lga_api.c:0903] << saLogWriteLogAsync
Dec 24 18:06:03.168944 osafamfd [761:su.cc:0875] >> su_admin_op_cb: 858993459201, 'safSu=SU2,safSg=SGONE,safApp=TWONAPP', 9
Dec 24 18:06:03.168957 osafamfd [761:imm.cc:1773] >> report_admin_op_error: inv:858993459201, res:6, Error String: 'Node'safAmfNode=PL-4,safAmfCluster=myAmfCluster' hosting SU'safSu=SU2,safSg=SGONE,safApp=TWONAPP', undergoing admin operation'1''
Dec 24 18:06:03.168964 osafamfd [761:lga_api.c:0738] >> saLogWriteLogAsync

corresponding check in su.cc:

        m_AVD_GET_SU_NODE_PTR(cb, su, node);
        if (node->admin_node_pend_cbk.admin_oper != 0) {
                report_admin_op_error(immoi_handle, invocation, SA_AIS_ERR_TRY_AGAIN, NULL,
                                "Node'%s' hosting SU'%s', undergoing admin operation'%u'", node->name.value,
                                su->name.value, node->admin_node_pend_cbk.admin_oper);
                goto done;
        }
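
The numeric codes in the traces can be read off with a small lookup. The
sketch below is only a reading aid: the function names are made up for this
note, and the values are the ones I understand the SA Forum AIS headers
(saAis.h, saAmf.h) to define. Only the codes that actually appear in the
traces are listed.

```cpp
#include <string>

// Hypothetical helper: maps an SaAmfAdminOperationIdT value seen in the
// trace to its symbolic name. Values assumed from saAmf.h.
inline std::string admin_op_name(unsigned op) {
  switch (op) {
    case 1: return "SA_AMF_ADMIN_UNLOCK";
    case 2: return "SA_AMF_ADMIN_LOCK";
    case 9: return "SA_AMF_ADMIN_REPAIRED";
    default: return "unknown(" + std::to_string(op) + ")";
  }
}

// Hypothetical helper: maps an SaAisErrorT value ("res:" in the trace)
// to its symbolic name. Values assumed from saAis.h.
inline std::string ais_error_name(unsigned rc) {
  switch (rc) {
    case 1: return "SA_AIS_OK";
    case 6: return "SA_AIS_ERR_TRY_AGAIN";
    default: return "unknown(" + std::to_string(rc) + ")";
  }
}
```

Read this way, the su_admin_op_cb entry is a REPAIRED request (9) on SU2 that
is rejected with TRY_AGAIN (res:6) because PL-4 still records a pending
UNLOCK (admin operation '1').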


The same problem is reported in #663:
Dec 17 16:26:03.388987 osafamfd [7623:node.cc:1052] >> node_admin_op_cb: 704374636604, 'safAmfNode=PL-3,safAmfCluster=myAmfCluster', 1
Dec 17 16:26:03.389016 osafamfd [7623:imm.cc:1773] >> report_admin_op_error: inv:704374636604, res:6, Error String: 'Node undergoing admin operation'



        if (node->admin_node_pend_cbk.admin_oper != 0) {
                /* Donot pass node->admin_node_pend_cbk here as previous counters will get reset in
                   report_admin_op_error. */
                report_admin_op_error(immOiHandle, invocation, SA_AIS_ERR_TRY_AGAIN, NULL,
                                "Node undergoing admin operation");
                goto done;
        }


In both cases, since node->admin_node_pend_cbk.admin_oper is not reset,
further operations on the node, or on an SU hosted on the node, were not
successful. Since both tickets (#663 and #693) represent the same problem,
marking #693 as a duplicate of #663.
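
The failure mode can be reproduced in a small stand-alone model. This is a
sketch under assumptions, not the amfd code: Node, PendingCbk, Result,
start_admin_op and complete_admin_op are hypothetical stand-ins, and only the
field name admin_node_pend_cbk.admin_oper and the "!= 0" guard are taken from
the snippets above.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins for illustration only.
enum class Result { Ok, TryAgain };

struct PendingCbk {
  std::uint64_t invocation = 0;
  unsigned admin_oper = 0;  // 0 means no admin operation is pending
};

struct Node {
  std::string name;
  PendingCbk admin_node_pend_cbk;
};

// Same shape as the guard in the snippets: reject any new operation
// while an earlier one is still recorded as pending on the node.
Result start_admin_op(Node& node, std::uint64_t invocation, unsigned oper) {
  if (node.admin_node_pend_cbk.admin_oper != 0) {
    return Result::TryAgain;
  }
  node.admin_node_pend_cbk.invocation = invocation;
  node.admin_node_pend_cbk.admin_oper = oper;
  return Result::Ok;
}

// Resets the pending record. If some completion path skips this step,
// every later call to start_admin_op on the node keeps returning TryAgain.
void complete_admin_op(Node& node) {
  node.admin_node_pend_cbk = PendingCbk{};
}
```

In this model, an unlock that finishes without reaching complete_admin_op
leaves admin_oper at 1, and a later repaired request for an SU on the same
node is answered with TryAgain until the record is cleared.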




---

** [tickets:#693] su_admin_repaired times out after faults**

**Status:** duplicate
**Created:** Tue Dec 24, 2013 12:39 PM UTC by surender khetavath
**Last Updated:** Thu Dec 26, 2013 10:48 AM UTC
**Owner:** Praveen

changeset : 4733
model : 2n
configuration : 1App,1SG,5SUs with 3comps each, 5SIs with 3CSIs each
si-si deps configured as SI1 sponsor for SI2,3,4 resp
SU1 mapped to PL-3,SU2 to PL-4
saAmfSGAutoRepair=0(False)
SuFailover=1(True)

test:
Node lock of PL-4
continuous faults in components getting standby cbk

Admin Repair of SU2 times out
amf-adm repaired safSu=SU2,safSg=SGONE,safApp=TWONAPP
error - command timed out (alarm)

safSu=SU3,safSg=SGONE,safApp=TWONAPP
        saAmfSUAdminState=UNLOCKED(1)
        saAmfSUOperState=DISABLED(2)
        saAmfSUPresenceState=UNINSTANTIATED(1)
        saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU2,safSg=SGONE,safApp=TWONAPP
        saAmfSUAdminState=UNLOCKED(1)
        saAmfSUOperState=DISABLED(2)
        saAmfSUPresenceState=UNINSTANTIATED(1)
        saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU1,safSg=SGONE,safApp=TWONAPP
        saAmfSUAdminState=UNLOCKED(1)
        saAmfSUOperState=ENABLED(1)
        saAmfSUPresenceState=INSTANTIATED(3)
        saAmfSUReadinessState=IN-SERVICE(2)
safSu=SU4,safSg=SGONE,safApp=TWONAPP
        saAmfSUAdminState=UNLOCKED(1)
        saAmfSUOperState=DISABLED(2)
        saAmfSUPresenceState=UNINSTANTIATED(1)
        saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU5,safSg=SGONE,safApp=TWONAPP
        saAmfSUAdminState=UNLOCKED(1)
        saAmfSUOperState=DISABLED(2)
        saAmfSUPresenceState=UNINSTANTIATED(1)
        saAmfSUReadinessState=OUT-OF-SERVICE(1)

safSi=TWONSI1,safApp=TWONAPP
        saAmfSIAdminState=UNLOCKED(1)
        saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=TWONSI2,safApp=TWONAPP
        saAmfSIAdminState=UNLOCKED(1)
        saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=TWONSI3,safApp=TWONAPP
        saAmfSIAdminState=UNLOCKED(1)
        saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=TWONSI4,safApp=TWONAPP
        saAmfSIAdminState=UNLOCKED(1)
        saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=TWONSI5,safApp=TWONAPP
        saAmfSIAdminState=UNLOCKED(1)
        saAmfSIAssignmentState=FULLY_ASSIGNED(2)



