This problem can be reproduced with amf_demo using the CLC script: apply
amf_demo_c.diff, bring up the model, and shut down PL-4.
immcfg -f /srv/osaftest/tickets/9999/app2_nwayact_1si_5su.xml
amf-adm unlock-in safSu=SU1,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock-in safSu=SU2,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock-in safSu=SU3,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock-in safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock-in safSu=SU5,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock safSu=SU1,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock safSu=SU2,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock safSu=SU3,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2
amf-adm unlock safSu=SU5,safSg=AmfDemo2,safApp=AmfDemo2
immcfg -a saAmfNodeSuFailOverProb=100000000 safAmfNode=PL-4,safAmfCluster=myAmfCluster
amf-adm shutdown safAmfNode=PL-4,safAmfCluster=myAmfCluster
syslog:
2015-11-10 16:24:38 PL-4 osafamfnd[417]: NO Assigning
'safSi=AmfDemo1,safApp=AmfDemo2' QUIESCING to
'safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2'
2015-11-10 16:24:38 PL-4 amf_demo[593]: CSI Set - HAState Quiescing for all
assigned CSIs
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO component with QUIESCED/QUIESCING
assignment failed
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO recovery action 'comp restart'
escalated to 'comp failover'
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO SU failover probation timer started
(timeout: 100000000 ns)
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO Performing failover of
'safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2' (SU failover count: 1)
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO
'safComp=AmfDemo,safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2' recovery action
escalated from 'componentRestart' to 'componentFailover'
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO
'safComp=AmfDemo,safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2' faulted due to
'csiSetcallbackFailed' : Recovery is 'componentFailover'
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO
'safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2' Presence State INSTANTIATED =>
TERMINATING
2015-11-10 16:24:39 PL-4 amf_demo_script: CLC-STOP:
safComp=AmfDemo,safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2
2015-11-10 16:24:39 PL-4 osafamfnd[417]: NO SU failover probation timer expired
2015-11-10 16:24:42 PL-4 osafamfnd[417]: NO
'safSu=SU4,safSg=AmfDemo2,safApp=AmfDemo2' Presence State TERMINATING =>
UNINSTANTIATED
Attachments:
- [1590.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/d4bb70d3/41f4/attachment/1590.tgz)
(1.2 MB; application/x-compressed-tar)
- [amf_demo_c.diff](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/d4bb70d3/41f4/attachment/amf_demo_c.diff)
(646 Bytes; text/x-patch)
- [amf_demo_script](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/d4bb70d3/41f4/attachment/amf_demo_script)
(1.9 kB; application/octet-stream)
- [app2_nwayact_1si_5su.xml](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/d4bb70d3/41f4/attachment/app2_nwayact_1si_5su.xml)
(12.4 kB; text/xml)
---
** [tickets:#1590] Shutdown node hang if component calls saAmfFinalize during
component failover**
**Status:** accepted
**Milestone:** 4.6.2
**Labels:** hanging shutdown node shutdown nodegroup
**Created:** Tue Nov 10, 2015 05:03 AM UTC by Minh Hon Chau
**Last Updated:** Tue Nov 10, 2015 05:03 AM UTC
**Owner:** Minh Hon Chau
**Attachments:**
- [osafamfnd](https://sourceforge.net/p/opensaf/tickets/1590/attachment/osafamfnd)
(336.3 kB; application/octet-stream)
- [syslog](https://sourceforge.net/p/opensaf/tickets/1590/attachment/syslog)
(314.7 kB; application/octet-stream)
The admin shutdown command on a node (or nodegroup) hangs if a component calls
saAmfFinalize during component failover. A trace is attached.
Scenario:
. Issue admin shutdown on the node.
. The component rejects the quiescing assignment:
saAmfCSIQuiescingComplete(SA_AIS_ERR_FAILED_OPERATION)
. The component calls saAmfFinalize, finalizing its handle.
. Because the quiescing assignment failed, component failover recovery is
started. As a result, CLC cleanup is called.
. The finalize-handle event arrives before CLC cleanup returns OK.
. avnd_comp_clc_terming_cleansucc_hdler() handles the cleanup success case. The
quiescing sequence cannot continue because
avnd_comp_cmplete_all_assignment() currently seems to handle only the normal
case, in which the callback list exists. Here, however, the component is
unregistered and all of its handles have been deleted by saAmfFinalize. No
su_si_oper_done is sent to amfd at the end, so the command hangs until it
times out.
Another similar test was done with amf_demo calling saAmfFinalize when the
component receives SIGTERM. In that case the assignment is quiesced and then
removed successfully, since amfnd is "aware of" the unregistered component
during the quiesced assignment sequence.
The quiescing assignment sequence should likewise be aware of an unregistered
component in this case, to avoid hanging the node shutdown. Alternatively,
saAmfFinalize could return TRY_AGAIN; this is still to be analyzed.
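To make the suspected gap easier to discuss, here is a minimal, self-contained C model of the cleanup-success step. It is only a sketch under assumptions: comp_model_t, node_model_t, model_finalize(), cleanup_success_current() and cleanup_success_aware() are hypothetical names invented for illustration and are not the real amfnd structures or functions. The model assumes that finalizing a handle empties the component's callback list, and that the assignment is reported to amfd only while that list is non-empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real amfnd data structures. */
typedef struct {
    bool registered;          /* false once the component finalized its handle */
    size_t pending_callbacks; /* callback records still queued for the component */
    bool quiescing_pending;   /* a QUIESCING assignment is still outstanding */
} comp_model_t;

typedef struct {
    int oper_done_sent;       /* count of su_si_oper_done-style reports to amfd */
} node_model_t;

/* Finalize: the handle is deleted, and every queued callback record with it. */
void model_finalize(comp_model_t *c)
{
    assert(c != NULL);
    c->registered = false;
    c->pending_callbacks = 0;
}

/* Behaviour as described in the ticket: completion is driven only by the
 * callback list, so an empty list means nothing is reported. */
void cleanup_success_current(node_model_t *n, comp_model_t *c)
{
    assert(n != NULL && c != NULL);
    if (c->pending_callbacks > 0) {
        c->pending_callbacks = 0;
        c->quiescing_pending = false;
        n->oper_done_sent++;
    }
    /* Empty list: quiescing_pending stays set and the admin command waits. */
}

/* One possible variant: also complete the assignment when the component is
 * unregistered but a quiescing assignment is still outstanding. */
void cleanup_success_aware(node_model_t *n, comp_model_t *c)
{
    assert(n != NULL && c != NULL);
    if (c->pending_callbacks > 0 ||
        (!c->registered && c->quiescing_pending)) {
        c->pending_callbacks = 0;
        c->quiescing_pending = false;
        n->oper_done_sent++;
    }
}
```

In this model, calling model_finalize() before cleanup_success_current() leaves quiescing_pending set with no report sent, which corresponds to the hang; cleanup_success_aware() reports the assignment in the same situation. Whether the real fix belongs in the cleanup path or in saAmfFinalize itself is left open, as in the ticket.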