On 17-Dec-15 11:59 AM, Minh Hon Chau wrote:
> Just back from vacation.
> In my test, the comp-failover was not reported to AMFD because the
> escalation timer expired just before the clc cleanup returned. I also
> tried adding more PI components, with only one of them calling
> saAmfFinalize; the issue still happens, since the assignment sequence
> seems to be interrupted by the erroneous component.
>
I think amfnd should perform comp-failover recovery: when cleanup was
launched for the failed component, the recovery was calculated as
comp-failover, and no additional fault after that led to a higher-level
escalation and hence a different recovery.
In the attached amfnd traces:
Oct 22 15:43:16.572821 osafamfnd [1623:main.cc:0657] TR Evt type:38
Oct 22 15:43:16.572998 osafamfnd [1623:err.cc:1474] TEST: >> avnd_evt_tmr_node_err_esc_evh
Oct 22 15:43:16.573136 osafamfnd [1623:err.cc:1476] NO TEST: SU failover probation timer expired
In this timer expiry event, amfnd resets su->su_err_esc_level, which is
used in the if condition that sends the comp-failover request. I added
this additional check on su->su_err_esc_level in #315 to consolidate the
checks that define the fault context (comp-failover, su-failover, and
surestart). I think that, instead of using su->su_err_esc_level, a new
flag can be introduced in su.
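A rough sketch of what I mean by a new flag, not actual amfnd code. Only
su_err_esc_level is a real field name here; the Su struct, the enums, the
pending_recovery field, and all four functions are hypothetical stand-ins
made up for illustration. The idea is that the recovery chosen when cleanup
is launched is stored separately, so the timer expiry can reset the
escalation level without losing the fault context:

```cpp
#include <cstdint>

// Hypothetical stand-ins for the amfnd types; not the real definitions.
enum class EscLevel : std::uint8_t { kLevel0, kLevel1, kLevel2 };
enum class Recovery : std::uint8_t { kNone, kCompRestart, kCompFailover, kSuFailover };

struct Su {
  EscLevel su_err_esc_level = EscLevel::kLevel0;  // existing field, reset on timer expiry
  Recovery pending_recovery = Recovery::kNone;    // hypothetical new flag
};

// Called when cleanup is launched for the failed component:
// remember the recovery that was calculated at that time.
void on_cleanup_launched(Su& su, Recovery calculated) {
  su.pending_recovery = calculated;
}

// Escalation/probation timer expiry: only the escalation level is reset,
// the remembered recovery is left untouched.
void on_esc_timer_expired(Su& su) {
  su.su_err_esc_level = EscLevel::kLevel0;
}

// Called on cleanup success: the decision depends on the remembered
// recovery, not on su_err_esc_level.
bool should_report_comp_failover(const Su& su) {
  return su.pending_recovery == Recovery::kCompFailover;
}

// Clear the flag once the recovery has been reported.
void on_recovery_reported(Su& su) {
  su.pending_recovery = Recovery::kNone;
}
```

With something like this, the sequence "cleanup launched, timer expires,
cleanup returns" would still end in a comp-failover request, assuming the
flag is cleared once the request has been sent.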
Thanks,
Praveen
> ------------------------------------------------------------------------
>
> *[tickets:#1590] <http://sourceforge.net/p/opensaf/tickets/1590/>
> Shutdown node hang if component calls saAmfFinalize during component
> failover*
>
> *Status:* accepted
> *Milestone:* 4.6.2
> *Labels:* hanging shutdown node shutdown nodegroup
> *Created:* Tue Nov 10, 2015 05:03 AM UTC by Minh Hon Chau
> *Last Updated:* Tue Nov 10, 2015 07:41 AM UTC
> *Owner:* Minh Hon Chau
> *Attachments:*
>
> * osafamfnd
> <https://sourceforge.net/p/opensaf/tickets/1590/attachment/osafamfnd>
> (336.3 kB; application/octet-stream)
> * syslog
> <https://sourceforge.net/p/opensaf/tickets/1590/attachment/syslog>
> (314.7 kB; application/octet-stream)
>
> The admin command shutdown node (or nodegroup) hangs if a component
> calls saAmfFinalize during component failover. Trace is attached.
>
> Scenario:
> . Issue admin shutdown node.
> . The component rejects the quiescing assignment:
> saAmfCSIQuiescingComplete(SA_AIS_ERR_FAILED_OPERATION)
> . The component calls saAmfFinalize, finalizing its handle.
> . Due to the failure of the quiescing assignment, component failover
> recovery is started. As a result, clc cleanup is called.
> . The handle finalize event arrives before clc cleanup returns ok.
> . avnd_comp_clc_terming_cleansucc_hdler() handles the cleanup success
> case. The quiescing sequence can't continue because
> avnd_comp_cmplete_all_assignment() currently seems to handle only the
> normal case, in which the callback list exists. But since the component
> is unregistered, all handles have been deleted by saAmfFinalize. No
> su_si_oper_done is sent to amfd at the end, so the command hangs until
> timeout.
>
> Another similar test was done on amf_demo, which calls saAmfFinalize
> when the component receives sigterm. The assignment is quiesced and then
> removed successfully, since amfnd is "aware of" the unregistered
> component during the quiesced assignment sequence.
>
> The quiescing assignment sequence should be aware of an unregistered
> component in this case, in order to avoid hanging shutdown node. Or
> saAmfFinalize should return TRY_AGAIN; to be analyzed ...