I agree that AMF does no recovery if a component decides to finalize its
handle, as in #495.
I have done a similar test, in which amf_demo is killed and the component
finalizes on SIGTERM. There is no recovery, the assignment is removed, and amfnd
sends susi_resp to amfd, so the quiesced sequence does not get stuck.
In this ticket, the component finalizes during the quiescing assignment. I think
it's right that there's no recovery, but at least amfnd has to "report" the
susi_resp that amfd is waiting for. The issue in this ticket is that the
quiescing sequence can't be completed: the state shows there's still an active
assignment for the SU, but the SU is UNINSTANTIATED/OUT_OF_SERVICE, and the
command hangs, which it should not. amfnd could do better (I guess) by either:
- sending susi_resp to amfd to complete the sequence,
- or just returning TRY_AGAIN from saAmfFinalize because the assignment is still
quiescing
I'm trying to make amfnd report susi_resp to see if that works out. Otherwise it
could simply return TRY_AGAIN for saAmfFinalize. Any suggestions?
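
To make the two options concrete, here is a minimal sketch in C. It is only a toy model, not amfnd code: the `demo_*` types, fields, and functions are hypothetical names invented for illustration, and the real return codes would be `SaAisErrorT` values.

```c
#include <stdbool.h>

/* Hypothetical status codes standing in for SaAisErrorT values. */
typedef enum { DEMO_OK, DEMO_ERR_TRY_AGAIN } demo_rc_t;

/* Hypothetical, simplified view of a component as a node director
 * might track it. */
typedef struct {
    bool registered;             /* the component's handle is still valid */
    bool assignment_in_progress; /* e.g. quiescing not yet answered */
    bool susi_resp_sent;         /* the director has been answered */
} demo_comp_t;

/* Option 1: accept the finalize, but still report the response so the
 * waiting side can complete the sequence. */
demo_rc_t demo_finalize_and_report(demo_comp_t *comp)
{
    comp->registered = false;
    if (comp->assignment_in_progress) {
        comp->susi_resp_sent = true;
        comp->assignment_in_progress = false;
    }
    return DEMO_OK;
}

/* Option 2: refuse the finalize while an assignment is in progress and
 * let the caller retry later. */
demo_rc_t demo_finalize_or_try_again(demo_comp_t *comp)
{
    if (comp->assignment_in_progress)
        return DEMO_ERR_TRY_AGAIN;
    comp->registered = false;
    return DEMO_OK;
}
```

With option 1 the component is gone immediately and the sequence still completes; with option 2 the component stays registered until the assignment has been answered, which pushes the retry onto the application.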
---
**[tickets:#1590] Shutdown node hang if component calls saAmfFinalize during
component failover**
**Status:** accepted
**Milestone:** 4.6.2
**Labels:** hanging shutdown node shutdown nodegroup
**Created:** Tue Nov 10, 2015 05:03 AM UTC by Minh Hon Chau
**Last Updated:** Tue Nov 10, 2015 06:14 AM UTC
**Owner:** Minh Hon Chau
**Attachments:**
- [osafamfnd](https://sourceforge.net/p/opensaf/tickets/1590/attachment/osafamfnd) (336.3 kB; application/octet-stream)
- [syslog](https://sourceforge.net/p/opensaf/tickets/1590/attachment/syslog) (314.7 kB; application/octet-stream)
The admin command shutdown node (or nodegroup) will hang if a component calls
saAmfFinalize during component failover. Trace is attached.
Scenario:
. Issue admin shutdown of the node.
. The component rejects the quiescing assignment with
saAmfCSIQuiescingComplete(SA_AIS_ERR_FAILED_OPERATION).
. The component calls saAmfFinalize, finalizing its handle.
. Because the quiescing assignment failed, component failover recovery is
started. As a result, clc cleanup is called.
. The finalize-handle event arrives before clc cleanup returns ok.
. avnd_comp_clc_terming_cleansucc_hdler() handles the cleanup success case. The
quiescing sequence can't continue because avnd_comp_cmplete_all_assignment()
currently seems to handle only the normal case, in which the callback list
exists. Here, however, the component is unregistered and all handles have been
deleted by saAmfFinalize. No su_si_oper_done is sent to amfd at the end, so the
command hangs until it times out.
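
The last step of the scenario can be illustrated with a small C sketch. This is a toy model under stated assumptions, not the actual avnd_comp_cmplete_all_assignment() code: all `demo_*` names are hypothetical, and the flag `oper_done_sent` merely stands in for su_si_oper_done being sent to amfd.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pending-callback record. */
typedef struct demo_cbk {
    struct demo_cbk *next;
} demo_cbk_t;

/* Hypothetical, simplified assignment state for one component. */
typedef struct {
    bool registered;      /* false once the handle has been finalized */
    demo_cbk_t *cbk_list; /* pending callbacks; NULL after finalize */
    bool oper_done_sent;  /* stands in for su_si_oper_done to amfd */
} demo_asg_t;

/* Models the behaviour described above: completion is driven only by
 * the callback list, so an empty list means nothing is reported. */
bool demo_complete_normal_only(demo_asg_t *asg)
{
    if (asg->cbk_list == NULL)
        return false;
    asg->cbk_list = NULL;
    asg->oper_done_sent = true;
    return true;
}

/* Models a variant that also completes when the component has
 * unregistered, even though no callbacks are left. */
bool demo_complete_unregistered_aware(demo_asg_t *asg)
{
    if (asg->cbk_list == NULL && asg->registered)
        return false; /* registered and nothing pending: nothing to do */
    asg->cbk_list = NULL;
    asg->oper_done_sent = true;
    return true;
}
```

In this model, a finalized component (unregistered, empty callback list) never gets `oper_done_sent` set by the first function, which corresponds to the hang, while the second function reports completion for it.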
Another similar test was done on amf_demo, which calls saAmfFinalize when the
component receives SIGTERM. The assignment is quiesced and then removed
successfully, since amfnd is "aware of" the unregistered component during the
quiesced assignment sequence.
The quiescing assignment sequence should be aware of an unregistered component
in this case, in order to avoid hanging the node shutdown. Alternatively,
saAmfFinalize should return TRY_AGAIN; this is still to be analyzed ...
---