- **Milestone**: 4.7.RC1 --> never
---
**[tickets:#1514] Opensaf on payload failed to come up and IMMD on active controller faulted**
**Status:** duplicate
**Milestone:** never
**Created:** Mon Oct 05, 2015 10:03 AM UTC by Ritu Raj
**Last Updated:** Fri Oct 16, 2015 07:21 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**
- [1513.tgz](https://sourceforge.net/p/opensaf/tickets/1514/attachment/1513.tgz) (7.1 MB; application/x-compressed-tar)
Setup:
Changeset: 6901
4 nodes configured with a single PBE and a load of 30K objects
Issue observed:
* A payload failed to join the cluster, and the active controller later rebooted.
Steps performed:
* Started OpenSAF on controller SC-1, and SC-1 took the active role.
Oct 5 12:33:31 SLES-64BIT-SLOT1 osafrded[3129]: NO No peer available =>
Setting Active role for this node
* Later, started OpenSAF on slot-2, where opensafd failed because the disk was
full. After resolving that issue, restarted OpenSAF on slot-2, and both
controllers joined the cluster.
Oct 5 12:45:34 SLES-32BIT-SLOT2 osafrded[15186]: NO Peer rde@2010f has active
state => Assigning Standby role to this node
* After the controllers formed the cluster, started OpenSAF on the remaining
two payloads at the same time.
* PL-3 joined the cluster successfully.
Oct 5 13:03:19 SLES-64BIT-SLOT3 kernel: [495958.582544] TIPC: Own node address
<1.1.3>, network identity 5234
....
Oct 5 13:09:34 SLES-64BIT-SLOT3 osafimmnd[15392]: NO NODE STATE->
IMM_NODE_FULLY_AVAILABLE 17601
Oct 5 13:09:34 SLES-64BIT-SLOT3 osafimmnd[15392]: NO Epoch set to 125 in
ImmModel
Oct 5 13:09:35 SLES-64BIT-SLOT3 osafimmnd[15392]: NO Implementer (applier)
connected: 27 (@OpenSafImmReplicatorB) <0, 2010f>
* PL-4 failed to join the cluster:
Oct 5 13:03:38 SLES-32BIT-SLOT4 kernel: [436326.659526] TIPC: Own node address
<1.1.4>, network identity 5234
Oct 5 13:03:38 SLES-32BIT-SLOT4 osafimmnd[8781]: NO Persistent Back-End
capability configured, Pbe file:imm.db (suffix may get added)
Oct 5 13:03:38 SLES-32BIT-SLOT4 osafimmnd[8781]: NO SERVER STATE:
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Oct 5 13:03:43 SLES-32BIT-SLOT4 osafimmnd[8781]: WA Resending introduce-me -
problems with MDS ? 5
Oct 5 13:03:43 SLES-32BIT-SLOT4 osafimmnd[8781]: WA Resending introduce-me -
problems with MDS ? 5
...
Oct 5 13:04:28 SLES-32BIT-SLOT4 osafimmnd[8781]: WA Resending introduce-me -
problems with MDS ? 50
Oct 5 13:04:29 SLES-32BIT-SLOT4 osafimmnd[8781]: ER Failed to load/sync.
Giving up after 51 seconds, restarting..
Oct 5 13:04:29 SLES-32BIT-SLOT4 opensafd[8736]: ER Failed DESC:IMMND
Oct 5 13:04:29 SLES-32BIT-SLOT4 opensafd[8736]: ER Going for recovery
...
Oct 5 13:06:41 SLES-32BIT-SLOT4 osafimmnd[8856]: ER Failed to load/sync.
Giving up after 51 seconds, restarting..
Oct 5 13:06:41 SLES-32BIT-SLOT4 opensafd[8736]: ER Could Not RESPAWN IMMND
Oct 5 13:06:41 SLES-32BIT-SLOT4 opensafd[8736]: ER Failed DESC:IMMND
Oct 5 13:06:41 SLES-32BIT-SLOT4 opensafd[8736]: ER FAILED TO RESPAWN
Oct 5 13:06:41 SLES-32BIT-SLOT4 osafimmnd[8856]: ER IMMND - Periodic server
job failed
Oct 5 13:06:41 SLES-32BIT-SLOT4 osafimmnd[8856]: ER Failed, exiting...
Oct 5 13:06:41 SLES-32BIT-SLOT4 kernel: [436509.187946] TIPC: Disabling bearer
<eth:eth0>
* After opensafd failed to come up on PL-4, SC-1 rebooted, with IMMD faulting
on a health check callback timeout.
Oct 5 13:08:52 SLES-64BIT-SLOT1 osafimmnd[3163]: NO Coord broadcasting
PBE_PRTO_PURGE_MUTATIONS, epoch:123
Oct 5 13:08:53 SLES-64BIT-SLOT1 osafimmnd[3163]: NO ImmModel::getPbeOi reports
missing PbeOi locally => unsafe
Oct 5 13:08:53 SLES-64BIT-SLOT1 osafimmnd[3163]: NO Coord broadcasting
PBE_PRTO_PURGE_MUTATIONS, epoch:123
Oct 5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO SU failover probation
timer started (timeout: 1200000000000 ns)
Oct 5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO Performing failover of
'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
Oct 5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated
from 'componentFailover' to 'suFailover'
Oct 5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to
'healthCheckcallbackTimeout' : Recovery is 'suFailover'
Oct 5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: ER
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due
to:healthCheckcallbackTimeout Recovery is:suFailover
* PL-4 joined the cluster when opensafd was started on it again some time later.
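
For triaging the attached logs, the signatures quoted above can be counted per host with a small script. This is only a sketch: the function name `summarize` and the result keys are made up for illustration, the patterns are copied from the log lines in this ticket, and it assumes each syslog entry sits on a single line (the excerpts above are wrapped by the mail client).

```python
import re
from collections import defaultdict

# Syslog prefix as seen in this ticket: "Oct 5 13:03:43 HOST proc[pid]: message".
# Kernel lines have no "[pid]" and are skipped by this pattern.
LINE = re.compile(
    r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+(?P<host>\S+)\s+"
    r"(?P<proc>[\w.-]+)\[\d+\]:\s+(?P<msg>.*)$"
)

# AMF fault line in the quoted form:
# '<component DN>' faulted due to '<cause>' : Recovery is '<recovery>'
FAULT = re.compile(
    r"'(?P<comp>[^']+)' faulted due to '(?P<cause>[^']+)'"
    r"\s*:\s*Recovery is '(?P<recovery>[^']+)'"
)


def summarize(lines):
    """Count the failure signatures from this ticket, grouped by host name."""
    summary = defaultdict(lambda: {
        "introduce_retries": 0,   # "Resending introduce-me" warnings
        "sync_giveups": 0,        # "Failed to load/sync. Giving up ..." errors
        "respawn_failed": False,  # "Could Not RESPAWN" seen at least once
        "faults": [],             # (component, cause, recovery) tuples
    })
    for raw in lines:
        match = LINE.match(raw.strip())
        if not match:
            continue
        host = summary[match["host"]]
        msg = match["msg"]
        if "Resending introduce-me" in msg:
            host["introduce_retries"] += 1
        elif "Failed to load/sync" in msg:
            host["sync_giveups"] += 1
        elif "Could Not RESPAWN" in msg:
            host["respawn_failed"] = True
        else:
            fault = FAULT.search(msg)
            if fault:
                host["faults"].append(
                    (fault["comp"], fault["cause"], fault["recovery"])
                )
    return dict(summary)


# Sample input: four entries from this ticket, unwrapped to one line each.
sample = [
    "Oct 5 13:03:43 SLES-32BIT-SLOT4 osafimmnd[8781]: WA Resending introduce-me - problems with MDS ? 5",
    "Oct 5 13:04:29 SLES-32BIT-SLOT4 osafimmnd[8781]: ER Failed to load/sync. Giving up after 51 seconds, restarting..",
    "Oct 5 13:06:41 SLES-32BIT-SLOT4 opensafd[8736]: ER Could Not RESPAWN IMMND",
    "Oct 5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'suFailover'",
]

for host, counts in summarize(sample).items():
    print(host, counts)
```

To run it against a real file, pass the open file object instead of `sample`, for example `summarize(open("messages"))`, where the file name is a placeholder for the syslog file on each node.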