Analysis:
1) AMF issues node lock
Dec 24 18:03:58.850448 osafamfd [761:node.cc:1052] >> node_admin_op_cb:
622770257921, 'safAmfNode=PL-4,safAmfCluster=myAmfCluster', 2
Dec 24 18:03:58.850487 osafamfd [761:node.cc:0783] >>
avd_node_admin_lock_unlock_shutdown: safAmfNode=PL-4,safAmfCluster=myAmfCluster
Dec 24 18:03:58.850513 osafamfd [761:node.cc:0674] >> node_admin_state_set:
safAmfNode=PL-4,safAmfCluster=myAmfCluster AdmState UNLOCKED => LOCKED
Dec 24 18:03:58.858531 osafamfd [761:sg_2n_fsm.cc:4121] TR act_found'0',
quisced_found'0', quiscing_found'0'
Dec 24 18:03:58.858537 osafamfd [761:sg_2n_fsm.cc:4138] <<
avd_su_state_determine: state '2'
Dec 24 18:03:58.858542 osafamfd [761:sgproc.cc:1961] >> avd_sg_su_si_del_snd:
'safSu=SU2,safSg=SGONE,safApp=TWONAPP'
Dec 24 18:03:5
2) After this, faults occurred on SU2 through SU5 for standby assignments, and
each of these SUs moved to the disabled state.
AMF did not repair any of these SUs because auto repair is disabled. While the
standby assignments were shifting from SU2 to SU5, the unlock operation on PL-4
kept returning TRY_AGAIN. The node unlock operation succeeded only once every
SU had been exhausted as a candidate for the standby assignments and had gone
into the disabled state.
3) After this, the admin repair of SU3 succeeded and SU3 was assigned the
standby assignments, so the SG became stable:
Dec 24 18:05:57.029393 osafamfd [761:sg_2n_fsm.cc:4138] <<
avd_su_state_determine: state '1'
Dec 24 18:05:57.029400 osafamfd [761:sg_2n_fsm.cc:0523] << avd_sg_2n_act_susi:
act: 'safSu=SU1,safSg=SGONE,safApp=TWONAPP', stdby:
'safSu=SU3,safSg=SGONE,safApp=TWONAPP'
Dec 24 18:05:57.029576 osafamfd [761:sg_2n_fsm.cc:0682] <<
avd_sg_2n_su_chose_asgn: '(null)'
Dec 24 18:05:57.029584 osafamfd [761:sg_2n_fsm.cc:1824] TR sg_fsm_state 1 => 0
Dec 24 18:05:57.029590 osafamfd [761:mbcsv_api.c:0773] >>
mbcsv_process_snd_ckpt_request: Sendin
4) Since node->admin_node_pend_cbk.admin_oper was not cleared and SU2 is
hosted on this node, the repair of SU2 returns TRY_AGAIN:
Dec 24 18:06:03.168937 osafamfd [761:lga_api.c:0903] << saLogWriteLogAsync
Dec 24 18:06:03.168944 osafamfd [761:su.cc:0875] >> su_admin_op_cb:
858993459201, 'safSu=SU2,safSg=SGONE,safApp=TWONAPP', 9
Dec 24 18:06:03.168957 osafamfd [761:imm.cc:1773] >> report_admin_op_error:
inv:858993459201, res:6, Error String:
'Node'safAmfNode=PL-4,safAmfCluster=myAmfCluster' hosting
SU'safSu=SU2,safSg=SGONE,safApp=TWONAPP', undergoing admin operation'1''
Dec 24 18:06:03.168964 osafamfd [761:lga_api.c:0738] >> saLogWriteLogAsync
The corresponding check in su.cc:
    m_AVD_GET_SU_NODE_PTR(cb, su, node);
    if (node->admin_node_pend_cbk.admin_oper != 0) {
            report_admin_op_error(immoi_handle, invocation,
                            SA_AIS_ERR_TRY_AGAIN, NULL,
                            "Node'%s' hosting SU'%s', undergoing admin operation'%u'",
                            node->name.value, su->name.value,
                            node->admin_node_pend_cbk.admin_oper);
            goto done;
    }
The same problem is reported in #663:
Dec 17 16:26:03.388987 osafamfd [7623:node.cc:1052] >> node_admin_op_cb:
704374636604, 'safAmfNode=PL-3,safAmfCluster=myAmfCluster', 1
Dec 17 16:26:03.389016 osafamfd [7623:imm.cc:1773] >> report_admin_op_error:
inv:704374636604, res:6, Error String: 'Node undergoing admin operation'
    if (node->admin_node_pend_cbk.admin_oper != 0) {
            /* Donot pass node->admin_node_pend_cbk here as previous
               counters will get reset in report_admin_op_error. */
            report_admin_op_error(immOiHandle, invocation,
                            SA_AIS_ERR_TRY_AGAIN, NULL,
                            "Node undergoing admin operation");
            goto done;
    }
In both cases, since node->admin_node_pend_cbk.admin_oper is not reset,
further operations on the node, or on an SU hosted on the node, do not
succeed. Since both tickets (#663 and #693) represent the same problem,
marking #693 as a duplicate of #663.
---
** [tickets:#693] su_admin_repaired times out after faults**
**Status:** duplicate
**Created:** Tue Dec 24, 2013 12:39 PM UTC by surender khetavath
**Last Updated:** Thu Dec 26, 2013 10:48 AM UTC
**Owner:** Praveen
changeset : 4733
model : 2n
configuration : 1App,1SG,5SUs with 3comps each, 5SIs with 3CSIs each
si-si deps configured as SI1 sponsor for SI2,3,4 resp
SU1 mapped to PL-3,SU2 to PL-4
saAmfSGAutoRepair=0(False)
SuFailover=1(True)
test:
Node lock of PL-4
continuous faults in components getting standby cbk
Admin Repair of SU2 times out
amf-adm repaired safSu=SU2,safSg=SGONE,safApp=TWONAPP
error - command timed out (alarm)
safSu=SU3,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU2,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU1,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)
safSu=SU4,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU5,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSi=TWONSI1,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=TWONSI2,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=TWONSI3,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=TWONSI4,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=TWONSI5,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=FULLY_ASSIGNED(2)