On 17-Dec-15 11:59 AM, Minh Hon Chau wrote:
> Just back from vacation.
> In my test, the comp-failover was not reported to AMFD because the
> escalation timer had expired just before the clc cleanup returned. I
> also tried adding more PI components, with only one of them calling
> saAmfFinalize; the issue still happens, since the assignment sequence
> seems to be discontinued because of the erroneous component.
>

I think amfnd should perform comp-failover recovery, because when cleanup 
was launched for the failed component the recovery had been calculated as 
comp-failover, and no additional fault after that led to a higher level 
of escalation and hence a different recovery.

In the attached amfnd traces:
Oct 22 15:43:16.572821 osafamfnd [1623:main.cc:0657] TR Evt type:38
Oct 22 15:43:16.572998 osafamfnd [1623:err.cc:1474] TEST: >> avnd_evt_tmr_node_err_esc_evh
Oct 22 15:43:16.573136 osafamfnd [1623:err.cc:1476] NO TEST: SU failover probation timer expired

In this timer expiry event, amfnd resets su->su_err_esc_level, which is 
used in the if condition that decides whether to send the comp-failover 
request. I added this additional check on su->su_err_esc_level in #315 to 
consolidate the checks that define the fault context (comp-failover, 
su-failover, and surestart). I think a new flag can be introduced in su 
instead of using su->su_err_esc_level.
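
To illustrate the idea, here is a minimal sketch, not the actual amfnd 
code. Only su_err_esc_level comes from the discussion above; the enum 
values, the pending_recovery flag, and all function names are 
hypothetical stand-ins. The point is that the recovery decided when 
cleanup is launched is stored separately, so a later timer expiry that 
resets the escalation level cannot change the decision.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for amfnd types (assumptions).
enum class EscLevel { kLevel0, kLevel1, kLevel2 };
enum class Recovery { kNone, kCompRestart, kCompFailover, kSuFailover };

struct Su {
  // Existing field from the discussion; reset when the timer expires.
  EscLevel su_err_esc_level = EscLevel::kLevel0;
  // Hypothetical new flag: the recovery decided at cleanup launch.
  Recovery pending_recovery = Recovery::kNone;
};

// Record the fault context when cleanup is launched for the failed
// component.
void on_cleanup_launched(Su& su, Recovery decided) {
  assert(decided != Recovery::kNone);
  su.pending_recovery = decided;
}

// The probation timer expiry resets the escalation level only; the
// recorded recovery is deliberately left untouched.
void on_su_failover_prob_timer_expired(Su& su) {
  su.su_err_esc_level = EscLevel::kLevel0;
}

// On cleanup success, the decision depends on the recorded recovery and
// no longer on su_err_esc_level.
bool should_send_comp_failover(const Su& su) {
  return su.pending_recovery == Recovery::kCompFailover;
}
```

With this shape, the sequence in the trace (cleanup launched, timer 
expires, cleanup returns) would still end in a comp-failover request. 
Where the flag should be cleared is left open here.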


Thanks,
Praveen
> ------------------------------------------------------------------------
>
> *[tickets:#1590] <http://sourceforge.net/p/opensaf/tickets/1590/>
> Shutdown node hang if component calls saAmfFinalize during component
> failover*
>
> *Status:* accepted
> *Milestone:* 4.6.2
> *Labels:* hanging shutdown node shutdown nodegroup
> *Created:* Tue Nov 10, 2015 05:03 AM UTC by Minh Hon Chau
> *Last Updated:* Tue Nov 10, 2015 07:41 AM UTC
> *Owner:* Minh Hon Chau
> *Attachments:*
>
>   * osafamfnd
>     <https://sourceforge.net/p/opensaf/tickets/1590/attachment/osafamfnd>
>     (336.3 kB; application/octet-stream)
>   * syslog
>     <https://sourceforge.net/p/opensaf/tickets/1590/attachment/syslog>
>     (314.7 kB; application/octet-stream)
>
> The admin command shutdown node (or nodegroup) will hang if a component
> calls saAmfFinalize during component failover. Trace is attached.
>
> Scenario:
> . Issue admin shutdown node
> . component rejects quiescing assignment
> saAmfCSIQuiescingComplete(SA_AIS_ERR_FAILED_OPERATION)
> . component calls saAmfFinalize, finalizing handle
> . Due to the failure of the quiescing assignment, component failover
> recovery is started. As a result, clc cleanup is called.
> . The finalize-handle event arrives before clc cleanup returns ok.
> . avnd_comp_clc_terming_cleansucc_hdler() handles the cleanup success
> case. The quiescing sequence can't be continued because
> avnd_comp_cmplete_all_assignment() currently seems to handle only the
> normal case, in which the callback list exists. But here the component
> is unregistered and all handles have been deleted by saAmfFinalize. No
> su_si_oper_done is sent to amfd at the end, so the command hangs until
> it times out.
>
> Another similar test was done on amf_demo, which calls saAmfFinalize
> when the component receives sigterm. The assignment is quiesced and then
> removed successfully, since amfnd is "aware of" the unregistered
> component during the quiesced assignment sequence.
>
> The quiescing assignment sequence should be aware of the unregistered
> component in this case, in order to avoid hanging the node shutdown. Or
> saAmfFinalize should return TRY_AGAIN; this is still to be analyzed ...
>
> ------------------------------------------------------------------------

