The attached 1770.xml is a simple configuration that reproduces the problem on
a single controller node, without configuring SI deps: lock the active SU and
do not respond to the remove callback on the first component.
Analysis: In the reported problem, the comp responds with an error to the remove
callback. AMFD reports an error on the component with sufailover recovery and
launches cleanup of the components. Since the response from the comp has
arrived, AMFND continues to complete the remove_done process. As part of
remove_done, AMFND recursively calls the assignment-removal logic for the other
components. The remove callback cannot be given to those components because
they are in the terminating state, so AMFND marks all their CSIs as removed. In
the last recursive call, AMFND sees that all CSIs are removed and that
assignments are pending, so it calls the oper_done logic and deletes the SUSIs.
Since sufailover escalation is in progress and not all comps are terminated yet,
AMFND does not respond to AMFD for any pending assignments. After returning
from the last recursive call, the same conditions (all CSI assignments removed
and assignments pending) are met again, and the oper_done logic is called a
second time. Since the SUSIs were already deleted, this time AMFND asserts
because it does not find any SUSI. The same issue also applies to nodefailover
recovery and to nodeswitchover with sufailover recovery. The root cause of the
problem is the call to the oper_done logic. This logic must be called only when
AMFND has to respond to AMFD for pending assignments. In these recovery
policies, recovery is done on the basis of the escalation request, so calling
this logic must be avoided.
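
The double call described in the analysis can be illustrated with a small toy
model. This is only a sketch in Python, not OpenSAF code: the class `Su`, the
functions `remove_done` and `oper_done`, the exception `SusiNotFound` and the
`guard` flag are all hypothetical names chosen for illustration, and the guard
shown is just one possible way to avoid the call during escalation; the actual
fix in AMFND may look different.

```python
class SusiNotFound(RuntimeError):
    """Stands in for the assert that AMFND hits (hypothetical name)."""


class Su:
    """Toy SU: a set of CSI states plus a flag for the SUSI records."""

    def __init__(self, csi_names, escalation_in_progress):
        self.csis = {name: "ASSIGNED" for name in csi_names}
        self.susi_present = True
        self.escalation_in_progress = escalation_in_progress


def all_csis_removed(su):
    return all(state == "REMOVED" for state in su.csis.values())


def oper_done(su):
    # Toy oper_done: it expects the SUSI records to exist and deletes them.
    if not su.susi_present:
        raise SusiNotFound("oper_done called but no SUSI is left")
    su.susi_present = False


def remove_done(su, csi, guard):
    su.csis[csi] = "REMOVED"
    # The other comps are terminating, so no callback is issued; the next
    # CSI is marked removed through a recursive call.
    for name, state in su.csis.items():
        if state != "REMOVED":
            remove_done(su, name, guard)
            break
    # This check is true in the innermost call AND again in every caller
    # once the recursion unwinds.
    if all_csis_removed(su):
        if guard and su.escalation_in_progress:
            return  # assumption: recovery is driven by the escalation request
        oper_done(su)


def reproduces_double_call():
    su = Su(["csi1", "csi2"], escalation_in_progress=True)
    try:
        remove_done(su, "csi1", guard=False)
    except SusiNotFound:
        return True
    return False
```

Without the guard, `reproduces_double_call()` returns `True` because the
second `oper_done` finds no SUSI. With `guard=True`, `remove_done` returns
without touching the SUSI records and leaves them to the escalation handling.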
Attachments:
-
[1770.xml](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/fa42a0c9/0177/attachment/1770.xml)
(9.6 kB; text/xml)
---
**[tickets:#1770] AMF: amfnd segfaulted during su failover escalation**
**Status:** accepted
**Milestone:** 4.6.2
**Created:** Tue Apr 19, 2016 06:53 AM UTC by Srikanth R
**Last Updated:** Mon Apr 25, 2016 12:24 PM UTC
**Owner:** Praveen
Setup :
5 node cluster with 3 payloads
changeset : 7438 ( opensaf 5.0.FC)
Application : 2N with 5 SUs ( si-si deps enabled & su failover flag enabled)
Issue :
AMFND hosting the faulty SU segfaulted during su Failover escalation as part
of SU lock operation
Steps performed :
-> Initially bring up the application and ensure that application is fully
assigned.
-> Perform one fault operation on the SU hosting the active assignment, in such
a way that the next fault is escalated to su failover.
-> Perform lock operation of the SU hosting the active assignment.
-> Do not respond to the CSI removal callback, so that this fault is escalated
to su failover.
-> AMFND segfaulted with the following bt file
signal: 11 pid: 320 uid: 0
/usr/lib64/libopensaf_core.so.0(+0x1fd9d)[0x7f1d79294d9d]
/lib64/libpthread.so.0(+0xf7c0)[0x7f1d782b67c0]
/usr/lib64/opensaf/osafamfnd[0x43b1ff]
/usr/lib64/opensaf/osafamfnd[0x417f89]
/usr/lib64/opensaf/osafamfnd[0x408469]
/usr/lib64/opensaf/osafamfnd[0x42c65a]
/usr/lib64/opensaf/osafamfnd[0x42c4a0]
/usr/lib64/opensaf/osafamfnd[0x42b979]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f1d77ac1c36]
/usr/lib64/opensaf/osafamfnd[0x405f29]
-> Below is the entry in osafamfnd trace :
Apr 19 11:23:44.684918 osafamfnd [29522:clc.cc:0870] T1
'safComp=COMP2SU5TWONAPP,safSu=SU5,safSg=SGONE,safApp=TWONAPP':FSM Enter
presence state: 'SA_AMF_PRESENCE_TERMINATING(4)':FSM Exit presence
state:SA_AMF_PRESENCE_TERMINATING(4)
Apr 19 11:23:44.684924 osafamfnd [29522:clc.cc:0889] << avnd_comp_clc_fsm_run: 1
Apr 19 11:23:44.684930 osafamfnd [29522:err.cc:1120] << avnd_err_su_repair:
retval=1
Apr 19 11:23:44.684936 osafamfnd [29522:susm.cc:0255] >> avnd_su_siq_prc: SU
'safSu=SU5,safSg=SGONE,safApp=TWONAPP'
Apr 19 11:23:44.684942 osafamfnd [29522:susm.cc:0260] << avnd_su_siq_prc
Apr 19 11:23:44.684947 osafamfnd [29522:susm.cc:1176] << avnd_su_si_oper_done: 1
Apr 19 11:23:44.684953 osafamfnd [29522:comp.cc:1822] <<
avnd_comp_csi_remove_done: 1
Apr 19 11:23:44.684959 osafamfnd [29522:comp.cc:1321] << avnd_comp_csi_remove: 1
Apr 19 11:23:44.685055 osafamfnd [29522:comp.cc:1678] >>
all_csis_in_removed_state: 'safSu=SU5,safSg=SGONE,safApp=TWONAPP'
Apr 19 11:23:44.685064 osafamfnd [29522:comp.cc:1691] <<
all_csis_in_removed_state: 1
Apr 19 11:23:44.685070 osafamfnd [29522:susm.cc:1021] >> avnd_su_si_oper_done:
'safSu=SU5,safSg=SGONE,safApp=TWONAPP' '(null)'
Apr 19 11:23:44.685076 osafamfnd [29522:susm.cc:0845] >>
susi_operation_in_progress: 'safSu=SU5,safSg=SGONE,safApp=TWONAPP' '(null)'
Apr 19 11:23:44.685082 osafamfnd [29522:susm.cc:0890] <<
susi_operation_in_progress: 1
Apr 19 11:23:44.685096 osafamfnd [29522:err.cc:1586] >>
is_no_assignment_due_to_escalations
Apr 19 11:23:44.685102 osafamfnd [29522:err.cc:1591] <<
is_no_assignment_due_to_escalations: true
Apr 19 11:24:51.153931 osafamfnd [2500:ncs_main_pub.c:0223] TR
NCS:PROCESS_ID=2500
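
When reading a trace like the one above, it can help to list the functions
that were entered (`>>`) but never left (`<<`). The following is a minimal
Python sketch under stated assumptions: the helper `unfinished_calls` and the
regular expression are hypothetical, not part of OpenSAF, and the sample lines
are the trace entries above with the mail line wrapping undone.

```python
import re

# Matches only function enter/exit lines; other lines (T1, TR) are skipped.
TRACE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:.]+) osafamfnd "
    r"\[(?P<pid>\d+):(?P<file>[\w.]+):(?P<line>\d+)\] "
    r"(?P<kind>>>|<<) (?P<func>\w+)"
)

SU = "'safSu=SU5,safSg=SGONE,safApp=TWONAPP'"
SAMPLE = [
    "Apr 19 11:23:44.684947 osafamfnd [29522:susm.cc:1176] << avnd_su_si_oper_done: 1",
    "Apr 19 11:23:44.685055 osafamfnd [29522:comp.cc:1678] >> all_csis_in_removed_state: " + SU,
    "Apr 19 11:23:44.685064 osafamfnd [29522:comp.cc:1691] << all_csis_in_removed_state: 1",
    "Apr 19 11:23:44.685070 osafamfnd [29522:susm.cc:1021] >> avnd_su_si_oper_done: " + SU + " '(null)'",
    "Apr 19 11:23:44.685076 osafamfnd [29522:susm.cc:0845] >> susi_operation_in_progress: " + SU + " '(null)'",
    "Apr 19 11:23:44.685082 osafamfnd [29522:susm.cc:0890] << susi_operation_in_progress: 1",
    "Apr 19 11:23:44.685096 osafamfnd [29522:err.cc:1586] >> is_no_assignment_due_to_escalations",
    "Apr 19 11:23:44.685102 osafamfnd [29522:err.cc:1591] << is_no_assignment_due_to_escalations: true",
]


def unfinished_calls(lines):
    """Return functions entered (>>) with no matching exit (<<), in order."""
    open_calls = []
    for line in lines:
        m = TRACE_RE.match(line)
        if not m:
            continue
        func = m.group("func")
        if m.group("kind") == ">>":
            open_calls.append(func)
        elif func in open_calls:
            # Close the most recent matching entry.
            idx = len(open_calls) - 1 - open_calls[::-1].index(func)
            del open_calls[idx]
        # An exit whose entry lies outside the excerpt is ignored.
    return open_calls
```

For the sample lines, `unfinished_calls(SAMPLE)` returns
`['avnd_su_si_oper_done']`: the first call to it has already returned, and the
second one is entered but never logs an exit.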