The attached 1770.xml is a simple configuration that reproduces the problem on 
a single controller node without configuring SI dependencies: lock the active 
SU and do not respond to the remove callback on the first component.

Analysis: In the reported problem, the component responds with an error to the 
remove callback. AMFD reports an error on the component with sufailover 
recovery and launches cleanup of the components. Since the response of the 
component has arrived, AMFND continues to complete the remove_done process. As 
part of remove_done, AMFND recursively calls the assignment removal logic for 
the other components. For these components the remove callback cannot be 
issued, as they are in the terminating state, so AMFND marks all their CSIs as 
removed. In the last recursive call, AMFND sees that all CSIs are removed and 
that assignments are pending, so it calls the oper_done logic and deletes the 
SUSIs. Since sufailover escalation is in progress and not all components have 
terminated yet, AMFND does not respond to AMFD for any pending assignments. 
After returning from the last recursive call, the same conditions (all CSI 
assignments removed and assignments pending) are met again, and the oper_done 
logic is called a second time. Since the SUSIs were already removed, this time 
AMFND asserts because it does not find any SUSI. The same issue also applies 
to nodefailover recovery and to nodeswitchover with sufailover recovery.

The root cause of the problem is the call to the oper_done logic. This logic 
must be called only when AMFND has to respond to AMFD for pending assignments. 
Under these recovery policies, recovery is done on the basis of the escalation 
request, so calling this logic must be avoided.
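
The double call described in the analysis can be illustrated with a small, 
self-contained C++ model. This is only a sketch under stated assumptions: the 
struct, the function names (remove_done, oper_done, all_csis_removed) and the 
skip_during_escalation flag are hypothetical and are not taken from the AMFND 
sources. The guard only shows the idea of skipping oper_done while an 
escalation is in progress; it is not the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical model of the sequence in the analysis.
// None of these names are the real AMFND symbols.
enum class CsiState { Assigned, Removed };

struct Su {
  std::vector<CsiState> csis;       // one CSI per component in the SU
  bool susi_present = true;         // SUSI record still exists
  bool escalation_active = false;   // sufailover/nodefailover in progress
  int oper_done_calls = 0;          // how often oper_done() ran
  bool susi_lookup_failed = false;  // stands in for the assert in the daemon
};

bool all_csis_removed(const Su& su) {
  for (CsiState s : su.csis) {
    if (s != CsiState::Removed) return false;
  }
  return true;
}

// Models the "oper done" step: it expects the SUSI and then deletes it.
void oper_done(Su& su) {
  ++su.oper_done_calls;
  if (!su.susi_present) {
    su.susi_lookup_failed = true;  // later call: SUSI is already gone
    return;
  }
  su.susi_present = false;
}

// Models remove-done for component `idx`. The other components are
// terminating, so no callback is issued and the next CSI is marked removed
// by recursing directly.
void remove_done(Su& su, std::size_t idx, bool skip_during_escalation) {
  assert(idx < su.csis.size());
  su.csis[idx] = CsiState::Removed;
  if (idx + 1 < su.csis.size()) {
    remove_done(su, idx + 1, skip_during_escalation);
  }
  // This check runs in every frame while the recursion unwinds.
  if (all_csis_removed(su)) {
    if (skip_during_escalation && su.escalation_active) {
      return;  // assumed guard: recovery is driven by the escalation request
    }
    oper_done(su);
  }
}
```

With two CSIs and the guard disabled, oper_done() runs once in the innermost 
frame and again in the outer frame, where the SUSI lookup fails. With the 
guard enabled during an escalation, oper_done() is not called at all.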


Attachments:

- 
[1770.xml](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/fa42a0c9/0177/attachment/1770.xml)
 (9.6 kB; text/xml)


---

** [tickets:#1770] AMF : amfnd segfaulted during su failover escalation**

**Status:** accepted
**Milestone:** 4.6.2
**Created:** Tue Apr 19, 2016 06:53 AM UTC by Srikanth R
**Last Updated:** Mon Apr 25, 2016 12:24 PM UTC
**Owner:** Praveen


Setup :
5 node cluster with 3 payloads
changeset : 7438 ( opensaf 5.0.FC)
Application : 2N with 5 SUs ( si-si deps enabled & su failover flag enabled)

Issue :

 AMFND hosting the faulty SU segfaulted during su Failover escalation as part 
of SU lock operation
 
 Steps performed :
 
 -> Initially bring up the application and ensure that application is fully 
assigned.
 
 -> Perform one fault operation on the SU hosting the active assignment, in 
such a way that the next fault is escalated to su failover.
 
 -> Perform lock operation of SU hosting the active assignment.
 
 -> Do not respond to the CSI removal callback, so that this fault is 
escalated to su failover.
 
 -> AMFND seg faulted with the following bt file
 
 signal: 11 pid: 320 uid: 0
/usr/lib64/libopensaf_core.so.0(+0x1fd9d)[0x7f1d79294d9d]
/lib64/libpthread.so.0(+0xf7c0)[0x7f1d782b67c0]
/usr/lib64/opensaf/osafamfnd[0x43b1ff]
/usr/lib64/opensaf/osafamfnd[0x417f89]
/usr/lib64/opensaf/osafamfnd[0x408469]
/usr/lib64/opensaf/osafamfnd[0x42c65a]
/usr/lib64/opensaf/osafamfnd[0x42c4a0]
/usr/lib64/opensaf/osafamfnd[0x42b979]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f1d77ac1c36]
/usr/lib64/opensaf/osafamfnd[0x405f29]

-> Below is the entry in osafamfnd trace :

Apr 19 11:23:44.684918 osafamfnd [29522:clc.cc:0870] T1 
'safComp=COMP2SU5TWONAPP,safSu=SU5,safSg=SGONE,safApp=TWONAPP':FSM Enter 
presence state: 'SA_AMF_PRESENCE_TERMINATING(4)':FSM Exit presence 
state:SA_AMF_PRESENCE_TERMINATING(4)
Apr 19 11:23:44.684924 osafamfnd [29522:clc.cc:0889] << avnd_comp_clc_fsm_run: 1
Apr 19 11:23:44.684930 osafamfnd [29522:err.cc:1120] << avnd_err_su_repair: 
retval=1
Apr 19 11:23:44.684936 osafamfnd [29522:susm.cc:0255] >> avnd_su_siq_prc: SU 
'safSu=SU5,safSg=SGONE,safApp=TWONAPP'
Apr 19 11:23:44.684942 osafamfnd [29522:susm.cc:0260] << avnd_su_siq_prc
Apr 19 11:23:44.684947 osafamfnd [29522:susm.cc:1176] << avnd_su_si_oper_done: 1
Apr 19 11:23:44.684953 osafamfnd [29522:comp.cc:1822] << 
avnd_comp_csi_remove_done: 1
Apr 19 11:23:44.684959 osafamfnd [29522:comp.cc:1321] << avnd_comp_csi_remove: 1
Apr 19 11:23:44.685055 osafamfnd [29522:comp.cc:1678] >> 
all_csis_in_removed_state: 'safSu=SU5,safSg=SGONE,safApp=TWONAPP'
Apr 19 11:23:44.685064 osafamfnd [29522:comp.cc:1691] << 
all_csis_in_removed_state: 1
Apr 19 11:23:44.685070 osafamfnd [29522:susm.cc:1021] >> avnd_su_si_oper_done: 
'safSu=SU5,safSg=SGONE,safApp=TWONAPP' '(null)'
Apr 19 11:23:44.685076 osafamfnd [29522:susm.cc:0845] >> 
susi_operation_in_progress: 'safSu=SU5,safSg=SGONE,safApp=TWONAPP' '(null)'
Apr 19 11:23:44.685082 osafamfnd [29522:susm.cc:0890] << 
susi_operation_in_progress: 1
Apr 19 11:23:44.685096 osafamfnd [29522:err.cc:1586] >> 
is_no_assignment_due_to_escalations
Apr 19 11:23:44.685102 osafamfnd [29522:err.cc:1591] << 
is_no_assignment_due_to_escalations: true
Apr 19 11:24:51.153931 osafamfnd [2500:ncs_main_pub.c:0223] TR
NCS:PROCESS_ID=2500

