- **Milestone**: 4.7.RC1 --> never


---

**[tickets:#1514] OpenSAF on payload failed to come up and IMMD on active controller faulted**

**Status:** duplicate
**Milestone:** never
**Created:** Mon Oct 05, 2015 10:03 AM UTC by Ritu Raj
**Last Updated:** Fri Oct 16, 2015 07:21 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- [1513.tgz](https://sourceforge.net/p/opensaf/tickets/1514/attachment/1513.tgz) (7.1 MB; application/x-compressed-tar)


Setup:
Changeset: 6901
4 nodes configured with a single PBE and a load of 30K objects

Issue observed:
* A payload failed to join the cluster, and later the active controller rebooted.

Steps performed:
* Started OpenSAF on controller SC-1, and SC-1 took the active role.

Oct  5 12:33:31 SLES-64BIT-SLOT1 osafrded[3129]: NO No peer available => 
Setting Active role for this node
* Later, started OpenSAF on slot-2, where opensafd failed because the disk was 
full. After resolving the issue and restarting OpenSAF on slot-2, both 
controllers joined the cluster.

Oct  5 12:45:34 SLES-32BIT-SLOT2 osafrded[15186]: NO Peer rde@2010f has active 
state => Assigning Standby role to this node


* After the controllers formed the cluster, started OpenSAF on the remaining 
two payloads at the same time.
* PL-3 joined the cluster successfully:

Oct  5 13:03:19 SLES-64BIT-SLOT3 kernel: [495958.582544] TIPC: Own node address 
<1.1.3>, network identity 5234
....
Oct  5 13:09:34 SLES-64BIT-SLOT3 osafimmnd[15392]: NO NODE STATE-> 
IMM_NODE_FULLY_AVAILABLE 17601
Oct  5 13:09:34 SLES-64BIT-SLOT3 osafimmnd[15392]: NO Epoch set to 125 in 
ImmModel
Oct  5 13:09:35 SLES-64BIT-SLOT3 osafimmnd[15392]: NO Implementer (applier) 
connected: 27 (@OpenSafImmReplicatorB) <0, 2010f>


* PL-4 failed to join the cluster:

Oct  5 13:03:38 SLES-32BIT-SLOT4 kernel: [436326.659526] TIPC: Own node address 
<1.1.4>, network identity 5234
Oct  5 13:03:38 SLES-32BIT-SLOT4 osafimmnd[8781]: NO Persistent Back-End 
capability configured, Pbe file:imm.db (suffix may get added)
Oct  5 13:03:38 SLES-32BIT-SLOT4 osafimmnd[8781]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Oct  5 13:03:43 SLES-32BIT-SLOT4 osafimmnd[8781]: WA Resending introduce-me - 
problems with MDS ? 5
Oct  5 13:03:43 SLES-32BIT-SLOT4 osafimmnd[8781]: WA Resending introduce-me - 
problems with MDS ? 5
...
Oct  5 13:04:28 SLES-32BIT-SLOT4 osafimmnd[8781]: WA Resending introduce-me - 
problems with MDS ? 50
Oct  5 13:04:29 SLES-32BIT-SLOT4 osafimmnd[8781]: ER Failed to load/sync. 
Giving up after 51 seconds, restarting..
Oct  5 13:04:29 SLES-32BIT-SLOT4 opensafd[8736]: ER Failed   DESC:IMMND
Oct  5 13:04:29 SLES-32BIT-SLOT4 opensafd[8736]: ER Going for recovery
...
Oct  5 13:06:41 SLES-32BIT-SLOT4 osafimmnd[8856]: ER Failed to load/sync. 
Giving up after 51 seconds, restarting..
Oct  5 13:06:41 SLES-32BIT-SLOT4 opensafd[8736]: ER Could Not RESPAWN IMMND
Oct  5 13:06:41 SLES-32BIT-SLOT4 opensafd[8736]: ER Failed   DESC:IMMND
Oct  5 13:06:41 SLES-32BIT-SLOT4 opensafd[8736]: ER FAILED TO RESPAWN
Oct  5 13:06:41 SLES-32BIT-SLOT4 osafimmnd[8856]: ER IMMND - Periodic server 
job failed
Oct  5 13:06:41 SLES-32BIT-SLOT4 osafimmnd[8856]: ER Failed, exiting...
Oct  5 13:06:41 SLES-32BIT-SLOT4 kernel: [436509.187946] TIPC: Disabling bearer 
<eth:eth0>
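
The log excerpts in this ticket were wrapped by the mail client, which makes them awkward to filter. As a rough triage aid, the sketch below re-joins wrapped entries and pulls out the timestamp, host, process, and severity tag. The regular expressions are assumptions derived only from the lines quoted here (the two-letter tags NO, WA, and ER appear to be notice, warning, and error), and `join_wrapped` and `parse` are hypothetical helper names, not part of OpenSAF.

```python
import re

# Assumed layout, based only on the lines quoted in this ticket:
# "Oct  5 13:03:43 HOST proc[pid]: SEV message"
TS_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d+ \d{2}:\d{2}:\d{2} ")
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: "
    r"(?P<sev>NO|WA|ER) (?P<msg>.*)$"
)


def join_wrapped(lines):
    """Re-join entries that were wrapped onto a following line."""
    entries = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and elision markers such as "..." or "....".
        if not line or set(line) == {"."}:
            continue
        if TS_RE.match(line) or not entries:
            entries.append(line)
        else:
            # No leading timestamp: treat it as a continuation line.
            entries[-1] += " " + line
    return entries


def parse(lines):
    """Return one dict per entry that matches the assumed layout.

    Entries without a "[pid]" part (for example the kernel TIPC lines)
    do not match LINE_RE and are skipped.
    """
    records = []
    for entry in join_wrapped(lines):
        m = LINE_RE.match(entry)
        if m:
            records.append(m.groupdict())
    return records
```

Filtering the result with `[r for r in parse(lines) if r["sev"] == "ER"]` would list only the error entries, such as the repeated "Failed to load/sync" messages from PL-4.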

* After opensafd failed to come up on PL-4, SC-1 rebooted with IMMD 
exiting.

Oct  5 13:08:52 SLES-64BIT-SLOT1 osafimmnd[3163]: NO Coord broadcasting 
PBE_PRTO_PURGE_MUTATIONS, epoch:123
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafimmnd[3163]: NO ImmModel::getPbeOi reports 
missing PbeOi locally => unsafe
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafimmnd[3163]: NO Coord broadcasting 
PBE_PRTO_PURGE_MUTATIONS, epoch:123
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO SU failover probation 
timer started (timeout: 1200000000000 ns)
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO Performing failover of 
'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated 
from 'componentFailover' to 'suFailover'
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 
'healthCheckcallbackTimeout' : Recovery is 'suFailover'
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: ER 
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due 
to:healthCheckcallbackTimeout Recovery is:suFailover
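
For readability, the probation timeout in the osafamfnd line above is reported in nanoseconds. A minimal sketch of converting it, assuming the `timeout: <digits> ns` wording seen in that line (`timeout_seconds` is a hypothetical helper name):

```python
import re


def timeout_seconds(log_line):
    """Extract a 'timeout: N ns' value and return it in whole seconds."""
    m = re.search(r"timeout: (\d+) ns", log_line)
    if m is None:
        return None
    # Integer division avoids floating-point rounding on large values.
    return int(m.group(1)) // 1_000_000_000


line = ("Oct  5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO SU failover "
        "probation timer started (timeout: 1200000000000 ns)")
print(timeout_seconds(line))  # 1200 seconds, i.e. 20 minutes
```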


* PL-4 joined the cluster when opensafd was started on it again some time later.


---
