---

**[tickets:#2237] AMFD: Headless recovery on NWayActive causes director cyclic reboot**

**Status:** unassigned
**Milestone:** 5.1.1
**Labels:** headless recovery 
**Created:** Wed Dec 21, 2016 05:00 AM UTC by Minh Hon Chau
**Last Updated:** Wed Dec 21, 2016 05:00 AM UTC
**Owner:** nobody
**Attachments:**

- [app2_nwayact_1si_5npisu.xml](https://sourceforge.net/p/opensaf/tickets/2237/attachment/app2_nwayact_1si_5npisu.xml) (14.3 kB; text/xml)


**Configuration and steps:**
- Load the attached model file. It is an NWay Active model with 5 NPI SUs hosted across the nodes.
- Unlock-instantiate, then unlock, all SUs
- Stop the SCs
- Restart the SCs

**Observations:**
amfd crashes on SC1 (the active controller):
> 2016-12-21 15:02:04 SC-1 osafamfnd[490]: ER AMFD has unexpectedly crashed. 
> Rebooting node
> 2016-12-21 15:02:04 SC-1 osafamfnd[490]: Rebooting OpenSAF NodeId = 131343 EE 
> Name = , Reason: AMFD has unexpectedly crashed. Rebooting node, OwnNodeId = 
> 131343, SupervisionTime = 60
> 2016-12-21 15:02:04 SC-1 osafimmnd[433]: NO Implementer locally disconnected. 
> Marking it as doomed 18 <25, 2010f> (safAmfService)
> 2016-12-21 15:02:04 SC-1 opensaf_reboot: Rebooting local node; timeout=60
> 2016-12-21 15:02:04 SC-1 osafimmnd[433]: NO Implementer disconnected 18 <25, 
> 2010f> (safAmfService)

**Analysis:**
When SC-1 comes back from the headless state, amfd recreates the absent assignments for SU1 (hosted on SC1) and SU2 (hosted on SC2):
> Dec 21 15:01:47.625965 osafamfd [477:siass.cc:0205] >> 
> avd_susi_read_headless_cached_rta 
> Dec 21 15:01:47.626706 osafamfd [477:siass.cc:0287] TR Absent SUSI, 
> ha_state:'1', fsm_state:'3'
> Dec 21 15:01:47.626709 osafamfd [477:siass.cc:0395] >> avd_susi_create: 
> safSu=SU2,safSg=AmfDemo2,safApp=AmfDemo2 safSi=AmfDemo1,safApp=AmfDemo2 
> state=1
> Dec 21 15:01:47.626848 osafamfd [477:siass.cc:0287] TR Absent SUSI, 
> ha_state:'1', fsm_state:'3'
> Dec 21 15:01:47.626850 osafamfd [477:siass.cc:0395] >> avd_susi_create: 
> safSu=SU1,safSg=AmfDemo2,safApp=AmfDemo2 safSi=AmfDemo1,safApp=AmfDemo2 
> state=1

amfd then starts to create assignments for SU1 and SU2, but it cannot set the readiness state of either SU, because each one still has a SUSI that is under absent failover:
> Dec 21 15:01:50.501773 osafamfd [477:sgproc.cc:1768] >> 
> avd_sg_app_su_inst_func: 'safSg=AmfDemo2,safApp=AmfDemo2'
> Dec 21 15:01:50.501849 osafamfd [477:sgproc.cc:1799] TR Calling su_insvc() 
> for 'safSu=SU1,safSg=AmfDemo2,safApp=AmfDemo2'
> Dec 21 15:01:50.501852 osafamfd [477:su.cc:0829] >> set_readiness_state: 
> 'safSu=SU1,safSg=AmfDemo2,safApp=AmfDemo2' IN_SERVICE
> ...
> Dec 21 15:01:50.501894 osafamfd [477:su.cc:2451] >> any_susi_fsm_in: 
> SU:'safSu=SU1,safSg=AmfDemo2,safApp=AmfDemo2', check_fsm:1
> Dec 21 15:01:50.501951 osafamfd [477:su.cc:2456] TR 
> SUSI:'safSu=SU1,safSg=AmfDemo2,safApp=AmfDemo2,safSi=AmfDemo1,safApp=AmfDemo2',
>  fsm:'1'
> Dec 21 15:01:50.501956 osafamfd [477:su.cc:2459] TR Found
> Dec 21 15:01:50.501958 osafamfd [477:su.cc:2462] << any_susi_fsm_in 
> Dec 21 15:01:50.501961 osafamfd [477:su.cc:0839] TR Can not set readiness 
> state, this SU is under absent failover
> Dec 21 15:01:50.501963 osafamfd [477:su.cc:0868] << set_readiness_state 

amfd then gets stuck in a loop between avd_sg_app_su_inst_func() and su_insvc().
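
The interaction seen in the trace can be modelled with a small toy sketch. This is not OpenSAF code: the classes, the `ABSENT`/`ASSIGNED` state names, and the bounded retry in `instantiate_until_in_service()` are assumptions made for illustration only. The ticket does not show how amfd actually re-enters avd_sg_app_su_inst_func(), so the retry loop here only stands in for "the same call is attempted again with no state change".

```python
from dataclasses import dataclass, field

# Hypothetical FSM state names; the real amfd uses numeric enum values.
ABSENT = "ABSENT"
ASSIGNED = "ASSIGNED"


@dataclass
class Susi:
    si: str
    fsm: str


@dataclass
class Su:
    name: str
    readiness: str = "OUT_OF_SERVICE"
    susis: list = field(default_factory=list)

    def any_susi_fsm_in(self, check_fsm):
        # Mirrors the idea of any_susi_fsm_in in the trace: is any
        # assignment of this SU in the given FSM state?
        return any(s.fsm == check_fsm for s in self.susis)

    def set_readiness_state(self, state):
        # Guard as reported in the trace: "Can not set readiness state,
        # this SU is under absent failover".
        if self.any_susi_fsm_in(ABSENT):
            return False
        self.readiness = state
        return True


def instantiate_until_in_service(sus, max_passes=5):
    """Retry su_insvc-like processing; return the pass on which all SUs
    became IN_SERVICE, or None if no pass ever made progress."""
    for attempt in range(1, max_passes + 1):
        for su in sus:
            su.set_readiness_state("IN_SERVICE")
        if all(su.readiness == "IN_SERVICE" for su in sus):
            return attempt
    return None  # nothing clears the ABSENT SUSIs, so retries never help


# State after headless recovery, as described in the analysis.
si = "safSi=AmfDemo1,safApp=AmfDemo2"
su1 = Su("safSu=SU1,safSg=AmfDemo2,safApp=AmfDemo2", susis=[Susi(si, ABSENT)])
su2 = Su("safSu=SU2,safSg=AmfDemo2,safApp=AmfDemo2", susis=[Susi(si, ABSENT)])

stuck = instantiate_until_in_service([su1, su2])
print(stuck, su1.readiness, su2.readiness)
```

In this model every pass is rejected by the same guard, so the readiness state never changes until something outside the loop moves the recreated assignments out of the absent state. Whether the real fix belongs in the guard or in the caller is not something the ticket settles.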


