This problem can be reproduced with amf_demo using the CLC script: apply 
amf_demo_c.diff, bring up the model, and shut down PL-4.

immcfg -f /srv/osaftest/tickets/9999/app2_nwayact_1si_5su.xml
amf-adm unlock-in safSu=SU1,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock-in safSu=SU2,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock-in safSu=SU3,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock-in safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock-in safSu=SU5,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock safSu=SU1,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock safSu=SU2,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock safSu=SU3,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock safSu=SU5,safSg=AmfDemo2,safApp=AmfDemo2

immcfg -a saAmfNodeSuFailOverProb=100000000 safAmfNode=PL-4,safAmfCluster=myAmfCluster
amf-adm shutdown safAmfNode=PL-4,safAmfCluster=myAmfCluster

syslog:

2015-11-10 16:24:38 PL-4 osafamfnd[417]: NO Assigning 'safSi=AmfDemo1,safApp=AmfDemo2' QUIESCING to 'safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2'
2015-11-10 16:24:38 PL-4 amf_demo[593]: CSI Set - HAState Quiescing for all assigned CSIs
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO component with QUIESCED/QUIESCING assignment failed
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO recovery action 'comp restart' escalated to 'comp failover'
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO SU failover probation timer started (timeout: 100000000 ns)
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO Performing failover of 'safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2' (SU failover count: 1)
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2' recovery action escalated from 'componentRestart' to 'componentFailover'
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2' faulted due to 'csiSetcallbackFailed' : Recovery is 'componentFailover'
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO 'safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2' Presence State INSTANTIATED => TERMINATING
2015-11-10 16:24:39 PL-4 amf_demo_script: CLC-STOP: safComp=AmfDemo,safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO SU failover probation timer expired
2015-11-10 16:24:42 PL-4 osafamfnd[417]: NO 'safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2' Presence State TERMINATING => UNINSTANTIATED




Attachments:

- [1590.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/d4bb70d3/41f4/attachment/1590.tgz) (1.2 MB; application/x-compressed-tar)
- [amf_demo_c.diff](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/d4bb70d3/41f4/attachment/amf_demo_c.diff) (646 Bytes; text/x-patch)
- [amf_demo_script](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/d4bb70d3/41f4/attachment/amf_demo_script) (1.9 kB; application/octet-stream)
- [app2_nwayact_1si_5su.xml](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/d4bb70d3/41f4/attachment/app2_nwayact_1si_5su.xml) (12.4 kB; text/xml)


---

**[tickets:#1590] Shutdown node hang if component calls saAmfFinalize during component failover**

**Status:** accepted
**Milestone:** 4.6.2
**Labels:** hanging shutdown node shutdown nodegroup 
**Created:** Tue Nov 10, 2015 05:03 AM UTC by Minh Hon Chau
**Last Updated:** Tue Nov 10, 2015 05:03 AM UTC
**Owner:** Minh Hon Chau
**Attachments:**

- [osafamfnd](https://sourceforge.net/p/opensaf/tickets/1590/attachment/osafamfnd) (336.3 kB; application/octet-stream)
- [syslog](https://sourceforge.net/p/opensaf/tickets/1590/attachment/syslog) (314.7 kB; application/octet-stream)


The admin command to shut down a node (or nodegroup) hangs if a component 
calls saAmfFinalize during component failover. A trace is attached.

Scenario:
. Issue admin shutdown on the node.
. The component rejects the quiescing assignment with 
saAmfCSIQuiescingComplete(SA_AIS_ERR_FAILED_OPERATION).
. The component calls saAmfFinalize, finalizing its handle.
. Because the quiescing assignment failed, component failover recovery is 
started. As a result, CLC cleanup is called.
. The handle-finalize event arrives before CLC cleanup returns OK.
. avnd_comp_clc_terming_cleansucc_hdler() handles the cleanup-success case. The 
quiescing sequence cannot continue because avnd_comp_cmplete_all_assignment() 
currently seems to handle only the normal case, in which the callback list 
exists. Here, however, the component is unregistered and all handles have been 
deleted by saAmfFinalize. No su_si_oper_done is sent to amfd at the end, so 
the command hangs until it times out.

Another similar test was done on amf_demo, which calls saAmfFinalize when the 
component receives SIGTERM. The assignment is quiesced and then removed 
successfully, since amfnd is "aware of" the unregistered component during the 
quiesced assignment sequence.

The quiescing assignment sequence should be aware of an unregistered component 
in this case, in order to avoid hanging the node shutdown. Alternatively, 
saAmfFinalize should return TRY_AGAIN; this is still being analyzed ... 
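
The scenario and the first proposal above can be illustrated with a toy model. 
This is only a sketch and not amfnd code: the Component class, the 
on_cleanup_success() and run_scenario() functions, and the 
aware_of_unregistered flag are hypothetical names made up for illustration. 
Only the su_si_oper_done message name is taken from the description above. It 
shows that when cleanup success is handled after the callbacks are already 
gone, nothing is reported unless the handler also checks for an unregistered 
component.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical stand-in for a component as tracked by the node director."""
    registered: bool = True
    pending_callbacks: list = field(default_factory=list)


def on_cleanup_success(comp, aware_of_unregistered):
    """Return the messages reported upward after CLC cleanup succeeds."""
    sent = []
    if comp.pending_callbacks:
        # Normal case: outstanding callbacks exist, complete them and report.
        comp.pending_callbacks.clear()
        sent.append("su_si_oper_done")
    elif aware_of_unregistered and not comp.registered:
        # Assumed fix: nothing is left to complete, but still report.
        sent.append("su_si_oper_done")
    return sent


def run_scenario(aware_of_unregistered):
    """Replay the event order from the ticket: finalize first, cleanup second."""
    comp = Component(pending_callbacks=["csi_set_quiescing"])
    # The component rejects quiescing and finalizes its handle, so the
    # callback list is emptied and the component becomes unregistered.
    comp.pending_callbacks.clear()
    comp.registered = False
    # Cleanup success is handled only after the finalize event.
    return on_cleanup_success(comp, aware_of_unregistered)


if __name__ == "__main__":
    print(run_scenario(aware_of_unregistered=False))
    print(run_scenario(aware_of_unregistered=True))
```

With the flag off, the model sends nothing, which corresponds to the shutdown 
command waiting until timeout. With the flag on, the completion message is 
still sent even though the callback list is empty.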






