On 18-Dec-15 11:12 AM, Minh Hon Chau wrote:
> I agree with you that amfnd should perform comp-failover. The timer
> expires because no other fault happens, but the previous recovery
> (comp-failover) had been determined and hasn't completed.
>
Also note that, by default in imm.xml, the value of
saAmfNodeSuFailOverProb is kept high, around 20 minutes, so the timer
will keep running for a long time, and I don't think any cleanup script
will take that long. At the same time, a high value of
saAmfNodeSuFailOverProb is justified, because a node should not reboot so
easily on a single SU-level fault; if that behaviour is wanted,
node-failover can be configured directly as the recovery policy.
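
To put the "around 20 minutes" figure in concrete terms, here is a small sketch. It assumes the attribute is an SaTimeT value in nanoseconds, and it uses a simplified sliding-window count of SU failovers. The function names and the max_failovers parameter are hypothetical, and this is not how amfnd actually runs the timer.

```python
NS_PER_SECOND = 1_000_000_000  # assumption: SaTimeT values are in nanoseconds


def minutes_to_satime(minutes):
    """Convert minutes to a nanosecond time value."""
    return int(minutes * 60 * NS_PER_SECOND)


def should_escalate(failover_times_ns, now_ns, prob_ns, max_failovers):
    """Simplified model: escalate to node level only when more than
    max_failovers SU failovers fall inside the probation window that
    ends at now_ns. Real amfnd timer handling may differ."""
    recent = [t for t in failover_times_ns if now_ns - t <= prob_ns]
    return len(recent) > max_failovers


# Illustrative value only, taken from "around 20 minutes" above.
PROBATION_NS = minutes_to_satime(20)
```

With a window this long, a single SU failover never escalates by itself in this model, while a second one a few minutes later does (for max_failovers = 1).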
Thanks,
Praveen
> ------------------------------------------------------------------------
>
> *[tickets:#1590] <http://sourceforge.net/p/opensaf/tickets/1590/>
> Shutdown node hang if component calls saAmfFinalize during component
> failover*
>
> *Status:* accepted
> *Milestone:* 4.6.2
> *Labels:* hanging shutdown node shutdown nodegroup
> *Created:* Tue Nov 10, 2015 05:03 AM UTC by Minh Hon Chau
> *Last Updated:* Tue Nov 10, 2015 07:41 AM UTC
> *Owner:* Minh Hon Chau
> *Attachments:*
>
> * osafamfnd
> <https://sourceforge.net/p/opensaf/tickets/1590/attachment/osafamfnd>
> (336.3 kB; application/octet-stream)
> * syslog
> <https://sourceforge.net/p/opensaf/tickets/1590/attachment/syslog>
> (314.7 kB; application/octet-stream)
>
> The admin command shutdown node (or nodegroup) will hang if component
> calls saAmfFinalize during component failover. Trace is attached.
>
> Scenario:
> . Issue admin shutdown node
> . component rejects quiescing assignment
> saAmfCSIQuiescingComplete(SA_AIS_ERR_FAILED_OPERATION)
> . component calls saAmfFinalize, finalizing handle
> . Due to the failure of the quiescing assignment, component failover
> recovery is started. As a result, clc cleanup is called.
> . The finalize-handle event arrives before clc cleanup returns ok.
> . avnd_comp_clc_terming_cleansucc_hdler() handles the cleanup success
> case. The quiescing sequence can't continue because
> avnd_comp_cmplete_all_assignment() currently seems to handle only the
> normal case, in which the callback list exists. Here, however, the
> component is unregistered and all handles have been deleted by
> saAmfFinalize. No su_si_oper_done is sent to amfd at the end, so the
> command hangs until timeout.
>
> Another similar test was done on amf_demo, which calls saAmfFinalize
> when the component receives sigterm. The assignment is quiesced and then
> removed successfully, since amfnd is "aware of" the unregistered
> component during the quiesced assignment sequence.
>
> The quiescing assignment sequence should be aware of the unregistered
> component in this case, in order to avoid hanging the node shutdown.
> Or saAmfFinalize should return TRY_AGAIN; this is still to be
> analyzed ...
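
The scenario above can be sketched as a toy model. This is not amfnd code: the Component class, its fields, and the aware_of_unregistered flag are hypothetical names that only mirror the description in the ticket, and the "aware" branch is one possible reading of the first suggestion, not a reviewed fix.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    registered: bool = True
    # CSIs that still have an outstanding callback record
    pending_callbacks: list = field(default_factory=list)
    # CSIs whose assignment is still in progress
    assigning: list = field(default_factory=list)


def finalize(comp):
    """Model of saAmfFinalize: the component becomes unregistered and
    its callback records are deleted together with the handle."""
    comp.registered = False
    comp.pending_callbacks.clear()


def complete_all_assignments(comp, aware_of_unregistered):
    """Model of the step run after a successful clc cleanup."""
    # Normal case: finish every assignment that still has a callback record.
    for csi in list(comp.pending_callbacks):
        comp.pending_callbacks.remove(csi)
        comp.assigning.remove(csi)
    # Hypothetical handling: an unregistered component has no callback
    # records left, so finish its in-progress assignments directly.
    if aware_of_unregistered and not comp.registered:
        comp.assigning.clear()


def su_si_oper_done_sent(comp, aware_of_unregistered):
    """True if nothing is left in progress, i.e. the node director
    would be able to report su_si_oper_done to amfd."""
    complete_all_assignments(comp, aware_of_unregistered)
    return not comp.assigning
```

In this model, calling finalize() before the cleanup step leaves the assignment in progress forever unless the completion step is aware of the unregistered component, which corresponds to the hang described in the ticket.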