On 10-Nov-15 1:11 PM, Minh Hon Chau wrote:
> I agree that amf does nothing for recovery if the component decides to
> finalize the handle, as in #495.
> I have done a similar test, where I kill amf_demo and the component
> finalizes at sigterm. No recovery, the assignment is removed, and amfnd
> sends susi_resp to amfd, so the quiesced sequence does not get stuck.
>
> In this ticket, the component finalizes while the assignment is
> quiescing. I think it's right that there's no recovery, but at least
> amfnd has to "report" susi_resp to amfd, which amfd is waiting for. The
> issue in this ticket is that the quiescing sequence can't be completed:
> the state shows there's still an active assignment for the su, but the
> su is UNINSTANTIATED/OUT_OF_SERVICE, and the command hangs when it should
> not. amfnd can do better (I guess) by either:
> - sending susi_resp to amfd to complete the sequence,
> - or just returning TRY_AGAIN on saAmfFinalize because the assignment is
> still quiescing
>
> I'm trying to make amfnd report susi_resp to see if it works out. Or it
> could simply return TRY_AGAIN for saAmfFinalize. Any suggestions?
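The TRY_AGAIN option proposed above can be sketched roughly as follows. This is only an illustrative model, not amfnd code: `comp_t`, `assign_state_t`, `rc_t` and `comp_finalize()` are hypothetical names, and `RC_TRY_AGAIN` merely stands in for the AIS TRY_AGAIN error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; RC_TRY_AGAIN stands in for the AIS TRY_AGAIN error. */
typedef enum { RC_OK, RC_TRY_AGAIN } rc_t;

/* Hypothetical view of the assignment state held for a component. */
typedef enum { ASSIGN_ACTIVE, ASSIGN_QUIESCING, ASSIGN_QUIESCED } assign_state_t;

typedef struct {
    bool registered;      /* true until the component finalizes its handle */
    assign_state_t state; /* state of the component's assignment */
} comp_t;

/* Model of a finalize request: refuse it while the assignment is still
 * quiescing, so the handle survives until the sequence has completed. */
rc_t comp_finalize(comp_t *comp) {
    assert(comp != NULL);
    if (comp->state == ASSIGN_QUIESCING)
        return RC_TRY_AGAIN; /* caller is expected to retry later */
    comp->registered = false; /* handle is released, component unregistered */
    return RC_OK;
}
```

With this approach the component keeps its handle until the quiescing sequence finishes, at the cost of requiring the component to retry the finalize call.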
Since cleanup is being done in the context of component-failover
recovery, AMFND must request comp-failover recovery from AMFD after
successful cleanup of the failed component.
Why was comp-failover not reported to AMFD in
avnd_comp_clc_terming_cleansucc_hdler()? Perhaps some parameter is being
reset in finalize()?
How many components are present in the configuration? It seems that
there is a single PI comp.
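To make the question concrete, here is a minimal sketch of a cleanup-success path that still reports completion when the component has already finalized and its callback list is empty. It is a toy model under stated assumptions, not the actual avnd_comp_clc_terming_cleansucc_hdler() logic: `comp_model_t`, `send_assign_response()` and `on_cleanup_success()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one component as tracked on the node. */
typedef struct {
    bool registered;         /* false once the component has finalized its handle */
    int  pending_callbacks;  /* outstanding callbacks; 0 after finalize removed them */
    bool assign_in_progress; /* a response for the assignment is still awaited */
    int  responses_sent;     /* stand-in for messages sent to the director */
} comp_model_t;

/* Stand-in for sending the assignment response to the director. */
static void send_assign_response(comp_model_t *comp) {
    comp->responses_sent++;
    comp->assign_in_progress = false;
}

/* Called when cleanup of the failed component succeeds. */
void on_cleanup_success(comp_model_t *comp) {
    assert(comp != NULL);
    if (!comp->assign_in_progress)
        return; /* nothing is waiting on this component */

    /* Normal case: drop callbacks the component will never answer. */
    comp->pending_callbacks = 0;

    /* Respond whether or not the component is still registered, so an
     * empty callback list does not leave the sequence waiting. */
    send_assign_response(comp);
}
```

The point of the sketch is only that the response does not depend on the callback list being non-empty; a repeated call after the response has gone out does nothing.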
Thanks
Praveen
>
> ------------------------------------------------------------------------
>
> *[tickets:#1590] <http://sourceforge.net/p/opensaf/tickets/1590/>
> Shutdown node hang if component calls saAmfFinalize during component
> failover*
>
> *Status:* accepted
> *Milestone:* 4.6.2
> *Labels:* hanging shutdown node shutdown nodegroup
> *Created:* Tue Nov 10, 2015 05:03 AM UTC by Minh Hon Chau
> *Last Updated:* Tue Nov 10, 2015 06:14 AM UTC
> *Owner:* Minh Hon Chau
> *Attachments:*
>
> * osafamfnd
> <https://sourceforge.net/p/opensaf/tickets/1590/attachment/osafamfnd>
> (336.3 kB; application/octet-stream)
> * syslog
> <https://sourceforge.net/p/opensaf/tickets/1590/attachment/syslog>
> (314.7 kB; application/octet-stream)
>
> The admin command shutdown node (or nodegroup) will hang if a component
> calls saAmfFinalize during component failover. Trace is attached.
>
> Scenario:
> . Issue admin shutdown node
> . The component rejects the quiescing assignment with
> saAmfCSIQuiescingComplete(SA_AIS_ERR_FAILED_OPERATION)
> . The component calls saAmfFinalize, finalizing the handle
> . Due to the failure of the quiescing assignment, component failover
> recovery is started. As a result, clc cleanup is called.
> . The finalize-handle event arrives before clc cleanup returns ok.
> . avnd_comp_clc_terming_cleansucc_hdler() handles the cleanup success
> case. The quiescing sequence can't continue because
> avnd_comp_cmplete_all_assignment() currently seems to handle only the
> normal case, in which the callback list exists. But here the component
> is unregistered and all handles have been deleted by saAmfFinalize. No
> su_si_oper_done is sent to amfd at the end, so the command hangs until
> timeout.
>
> Another similar test was done on amf_demo, which calls saAmfFinalize
> when the component receives sigterm. The assignment is quiesced and then
> removed successfully, since amfnd is "aware of" the unregistered
> component during the quiesced assignment sequence.
>
> The quiescing assignment sequence should be aware of the unregistered
> component in this case, in order to avoid hanging the node shutdown. Or
> saAmfFinalize should return TRY_AGAIN; to be analyzed ...
>
> ------------------------------------------------------------------------
>
> Sent from sourceforge.net because [email protected]
> is subscribed to https://sourceforge.net/p/opensaf/tickets/
>
> To unsubscribe from further messages, a project admin can change
> settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or,
> if this is a mailing list, you can unsubscribe from the mailing list.
>
>
>
> ------------------------------------------------------------------------------
>
>
>
> _______________________________________________
> Opensaf-tickets mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
>
---
** [tickets:#1590] Shutdown node hang if component calls saAmfFinalize during
component failover**
**Status:** accepted
**Milestone:** 4.6.2
**Labels:** hanging shutdown node shutdown nodegroup
**Created:** Tue Nov 10, 2015 05:03 AM UTC by Minh Hon Chau
**Last Updated:** Tue Nov 10, 2015 07:41 AM UTC
**Owner:** Minh Hon Chau
**Attachments:**
- [osafamfnd](http://sourceforge.net/p/opensaf/tickets/1590/attachment/osafamfnd)
(336.3 kB; application/octet-stream)
- [syslog](http://sourceforge.net/p/opensaf/tickets/1590/attachment/syslog)
(314.7 kB; application/octet-stream)