2016-11-15 14:31:45 SC-2 osafrded[404]: NO RDE role set to QUIESCED

2016-11-15 14:31:48 SC-2 osafdtmd[383]: NO Lost contact with 'SC-1'


2016-11-15 14:31:48 SC-2 osafrded[404]: NO Requesting ACTIVE role
2016-11-15 14:31:48 SC-2 osafrded[404]: NO RDE role set to Undefined

2016-11-15 14:31:49 SC-2 osafrded[404]: NO RDE role set to QUIESCED
2016-11-15 14:31:49 SC-2 osafrded[404]: NO Giving up election against 0x2010f 
with role Undefined. My role is now QUIESCED

2016-11-15 14:31:49 SC-2 osafdtmd[383]: NO Established contact with 'SC-1'
2016-11-15 14:31:49 SC-2 osafrded[404]: NO Peer up on node 0x2010f


2016-11-15 14:31:54 SC-2 osafdtmd[383]: NO Lost contact with 'SC-1'

2016-11-15 14:31:54 SC-2 osafrded[404]: NO Requesting ACTIVE role
2016-11-15 14:31:54 SC-2 osafrded[404]: NO RDE role set to Undefined


2016-11-15 14:31:56 SC-2 osafrded[404]: NO Running 
'/usr/local/lib/opensaf/opensaf_sc_active' with 0 argument(s)
2016-11-15 14:31:56 SC-2 osafrded[404]: NO Switched to ACTIVE from Undefined

2016-11-15 14:31:57 SC-2 osafimmnd[431]: exiting for shutdown


Once, RDE in QUISCED state detect the active is down then RDE must promote this 
node to active.



---

** [tickets:#2158] AMF: IMMND dies at Opensaf start up phase causes AMFD 
heartbeat timeout**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Wed Nov 02, 2016 05:20 AM UTC by Minh Hon Chau
**Last Updated:** Tue Nov 15, 2016 04:11 AM UTC
**Owner:** nobody
**Attachments:**

- 
[osafamfnd_sc2](https://sourceforge.net/p/opensaf/tickets/2158/attachment/osafamfnd_sc2)
 (264.2 kB; application/octet-stream)


If IMMND dies at Opensaf startup phase, IMMND is not restarted by AMF. The 
issue has been observed in following situation
- Restart cluster
- During active controller starts up, a critical component is death which cause 
a node failfast
Oct 25 12:51:21 SC-1 osafamfnd[7642]: ER 
safComp=ABC,safSu=1,safSg=2N,safApp=ABC Faulted due to:csiSetcallbackTimeout 
Recovery is:nodeFailfast
Oct 25 12:51:21 SC-1 osafamfnd[7642]: Rebooting OpenSAF NodeId = 131343 EE Name 
= , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, 
SupervisionTime = 60
- In the meantime, standby controller is requested to become active
Oct 25 12:51:27 SC-2 tipclog[16221]: Lost link <1.1.2:eth0-1.1.1:eth0> on 
network plane A
Oct 25 12:51:27 SC-2 osafclmna[4336]: NO Starting to promote this node to a 
system controller
Oct 25 12:51:27 SC-2 osafrded[4387]: NO Requesting ACTIVE role
- IMMND is also death a bit later
Oct 25 12:51:29 SC-2 osafimmnd[4536]: ER MESSAGE:44816 OUT OF ORDER my highest 
processed:44814 - exiting
Oct 25 12:51:29 SC-2 osafamfnd[7414]: NO saClmDispatch BAD_HANDLE
- Other services could not initialize other services since IMMND is death
Oct 25 12:51:39 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:51:39 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:51:39 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize( 
returned SA_AIS_ERR_TIMEOUT (5)
Oct 25 12:51:39 SC-2 osafclmd[7386]: WA saImmOiImplementerSet returned 9
Oct 25 12:51:39 SC-2 osafntfd[7372]: WA saLogInitialize returns try again, 
retries...
Oct 25 12:51:39 SC-2 osaflogd[7358]: WA saImmOiImplementerSet returned 
SA_AIS_ERR_BAD_HANDLE (9)
Oct 25 12:51:39 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5

Oct 25 12:51:49 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:51:50 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:51:50 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5

Oct 25 12:52:00 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:52:00 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:52:00 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5

Oct 25 12:52:20 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:52:20 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:52:20 SC-2 osafimmd[4489]: NO Extended intro from node 2210f

- At the end, AMFD heart beat timeout 
Oct 25 12:53:57 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize( 
returned SA_AIS_ERR_TIMEOUT (5)
Oct 25 12:54:01 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:54:01 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:54:01 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:54:07 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize( 
returned SA_AIS_ERR_TIMEOUT (5)
Oct 25 12:54:11 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:54:11 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:54:11 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:54:15 SC-2 osafamfnd[7414]: ER AMF director heart beat timeout, 
generating core for amfd

In AMFND trace in SC2, AMFND did not receive su_pres from AMFD, therefore AMFND 
could not initiate middleware components (including IMMND), so AMFND was not 
aware of IMMND's death so that AMFND can restart IMMND. The problem here is 
slightly different from #1828, which happened in newly promoted SC (with 
roamingSC feature) where AMFND had IMMND registered.



---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to