There is one ticket related to an unmonitored middleware process:
"#1857 amf: spare controller didnot get rebooted when amfd is killed on spare
controller with headless feature enabled."
In that case, nothing restarts the AMFD process or reboots the spare controller
when a user kills the AMFD process.
---
**[tickets:#2158] OSAF: IMMND dies before IMMND becomes AMF component**
**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Wed Nov 02, 2016 05:20 AM UTC by Minh Hon Chau
**Last Updated:** Wed Nov 16, 2016 10:29 PM UTC
**Owner:** nobody
**Attachments:**
- [osafamfnd_sc2](https://sourceforge.net/p/opensaf/tickets/2158/attachment/osafamfnd_sc2) (264.2 kB; application/octet-stream)
If IMMND dies during the OpenSAF startup phase, it is not restarted by AMF. The
issue has been observed in the following situation:
- Restart the cluster
- While the active controller starts up, a critical component faults, which
causes a node failfast
Oct 25 12:51:21 SC-1 osafamfnd[7642]: ER
safComp=ABC,safSu=1,safSg=2N,safApp=ABC Faulted due to:csiSetcallbackTimeout
Recovery is:nodeFailfast
Oct 25 12:51:21 SC-1 osafamfnd[7642]: Rebooting OpenSAF NodeId = 131343 EE Name
= , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343,
SupervisionTime = 60
- In the meantime, the standby controller is requested to become active
Oct 25 12:51:27 SC-2 tipclog[16221]: Lost link <1.1.2:eth0-1.1.1:eth0> on
network plane A
Oct 25 12:51:27 SC-2 osafclmna[4336]: NO Starting to promote this node to a
system controller
Oct 25 12:51:27 SC-2 osafrded[4387]: NO Requesting ACTIVE role
- IMMND also dies a bit later
Oct 25 12:51:29 SC-2 osafimmnd[4536]: ER MESSAGE:44816 OUT OF ORDER my highest
processed:44814 - exiting
Oct 25 12:51:29 SC-2 osafamfnd[7414]: NO saClmDispatch BAD_HANDLE
- Other services could not initialize the services they depend on, since IMMND is dead
Oct 25 12:51:39 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:51:39 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:51:39 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize(
returned SA_AIS_ERR_TIMEOUT (5)
Oct 25 12:51:39 SC-2 osafclmd[7386]: WA saImmOiImplementerSet returned 9
Oct 25 12:51:39 SC-2 osafntfd[7372]: WA saLogInitialize returns try again,
retries...
Oct 25 12:51:39 SC-2 osaflogd[7358]: WA saImmOiImplementerSet returned
SA_AIS_ERR_BAD_HANDLE (9)
Oct 25 12:51:39 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:51:49 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:51:50 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:51:50 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:52:00 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:52:00 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:52:00 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:52:20 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:52:20 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:52:20 SC-2 osafimmd[4489]: NO Extended intro from node 2210f
- In the end, the AMFD heartbeat times out
Oct 25 12:53:57 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize(
returned SA_AIS_ERR_TIMEOUT (5)
Oct 25 12:54:01 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:54:01 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:54:01 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:54:07 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize(
returned SA_AIS_ERR_TIMEOUT (5)
Oct 25 12:54:11 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:54:11 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:54:11 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:54:15 SC-2 osafamfnd[7414]: ER AMF director heart beat timeout,
generating core for amfd
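
The repeated warnings above are easier to read once they are grouped by process, call, and return code. The following is a minimal sketch for doing that, not part of OpenSAF: the function name `summarize` and the regular expression are assumptions made for illustration, and the code-to-name table contains only the two values the log itself spells out (5 and 9). Entries that the mail client wrapped across two lines (such as the `ntfimcn_ntf_init` warnings) are not matched by this pattern.

```python
import re
from collections import Counter

# Only the two codes that the log excerpts themselves name are listed here.
SA_AIS_NAMES = {5: "SA_AIS_ERR_TIMEOUT", 9: "SA_AIS_ERR_BAD_HANDLE"}

# Hypothetical pattern for single-line warnings of the form
# "<date> <host> <proc>[pid]: WA <call> returned <code>"
LINE_RE = re.compile(
    r"(?P<proc>osaf\w+)\[\d+\]: WA (?P<call>sa\w+) returned (?P<code>\d+)$"
)

def summarize(lines):
    """Count warnings per (process, API call, error name)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match is None:
            continue  # not a single-line "returned <code>" warning
        code = int(match.group("code"))
        name = SA_AIS_NAMES.get(code, f"unknown({code})")
        counts[(match.group("proc"), match.group("call"), name)] += 1
    return counts

# A few lines copied from the excerpt above.
sample = [
    "Oct 25 12:51:39 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5",
    "Oct 25 12:51:39 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5",
    "Oct 25 12:51:39 SC-2 osafclmd[7386]: WA saImmOiImplementerSet returned 9",
    "Oct 25 12:51:49 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5",
]

for key, count in sorted(summarize(sample).items()):
    print(count, *key)
```

Run against the full syslog, a summary like this makes it visible that the failures on SC-2 are the same few calls timing out repeatedly rather than many distinct faults.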
According to the AMFND trace on SC-2, AMFND did not receive su_pres from AMFD.
AMFND therefore could not initiate the middleware components (including IMMND),
so it was not aware of IMMND's death and could not restart IMMND. The problem
here is slightly different from #1828, which happened on a newly promoted SC
(with the roamingSC feature) where AMFND had IMMND registered.
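
The ordering problem described above can be illustrated with a toy model. This is only a sketch of the reasoning, not OpenSAF code: the class `ToyNodeDirector` and its methods are hypothetical, and the assumption is simply that a node director restarts only the components it has already been told to supervise.

```python
class ToyNodeDirector:
    """Toy stand-in for AMFND: restarts only components it supervises."""

    def __init__(self):
        self.supervised = set()  # components known after su_pres arrives
        self.restarts = []       # components this model has restarted

    def on_su_pres(self, components):
        # Hypothetical handler: the director learns which components to supervise.
        self.supervised.update(components)

    def on_process_exit(self, name):
        # A death is only acted on if the component is already supervised.
        if name in self.supervised:
            self.restarts.append(name)
            return True
        return False


# Scenario from this ticket: IMMND dies before su_pres is received.
nd = ToyNodeDirector()
print(nd.on_process_exit("IMMND"))  # False: death goes unnoticed

# Scenario where the component is already registered when it dies.
nd.on_su_pres(["IMMND"])
print(nd.on_process_exit("IMMND"))  # True: the model restarts it
```

In this model the first death is lost because nothing had registered IMMND yet, which mirrors the window during startup that the ticket describes.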