---

** [tickets:#1729] IMMD crashed on active controller because of health check timeout **

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed Apr 06, 2016 09:01 AM UTC by Ritu Raj
**Last Updated:** Wed Apr 06, 2016 09:01 AM UTC
**Owner:** nobody


Setup:
Changeset: 7436
Version: OpenSAF 5.0
4 nodes configured with a single PBE and a load of 30K objects

Issue Observed:
1) The standby controller did not join the active controller.
2) IMMD on the active controller got a health check timeout.

Steps performed:
* Started OpenSAF on controller SC-1 with the PBE load; SC-1 took the active role.

* Then started OpenSAF on controller SC-2; SC-2 failed to join the cluster.

Apr  6 12:54:04 SLES-32BIT-SLOT2 opensafd: Starting OpenSAF Services(5.0.FC - ) 
(Using TIPC)
Starting OpenSAF Services (Using TIPC):Apr  6 12:54:04 SLES-32BIT-SLOT2 kernel: 
[95783.514531] TIPC: Activated (version 2.0.0)
Apr  6 12:54:04 SLES-32BIT-SLOT2 kernel: [95783.514587] NET: Registered 
protocol family
..........
Apr  6 12:54:04 SLES-32BIT-SLOT2 osafclmna[28264]: Started
Apr  6 12:54:04 SLES-32BIT-SLOT2 osafclmna[28264]: NO 
safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
.........
Apr  6 12:54:04 SLES-32BIT-SLOT2 osafimmnd[28303]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Apr  6 12:54:04 SLES-32BIT-SLOT2 osafimmnd[28303]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Apr  6 12:54:09 SLES-32BIT-SLOT2 osafimmnd[28303]: WA Resending introduce-me - 
problems with MDS ? 5
........

Apr  6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER Failed to load/sync. 
Giving up after 51 seconds, restarting..
Apr  6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Failed   DESC:IMMND
Apr  6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Going for recovery
Apr  6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Trying To RESPAWN 
/usr/lib/opensaf/clc-cli/osaf-immnd attempt #1
Apr  6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Sending SIGKILL to IMMND, 
pid=28297
Apr  6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER IMMND - Periodic server 
job failed
Apr  6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER Failed, exiting...
Apr  6 12:55:10 SLES-32BIT-SLOT2 osafimmnd[28340]: Started
........
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - 
problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - 
problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - 
problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - 
problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - 
problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - 
problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - 
problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - 
problems with MDS ? 50
....
Apr  6 12:57:07 SLES-32BIT-SLOT2 opensafd: Starting OpenSAF failed

* After opensafd failed to come up on SC-2, SC-1 rebooted because of an IMMD health check timeout.

Apr  6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO Performing failover of 
'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
Apr  6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated 
from 'componentFailover' to 'suFailover'
Apr  6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 
'healthCheckcallbackTimeout' : Recovery is 'suFailover'
**Apr  6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: ER 
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due 
to:healthCheckcallbackTimeout  Recovery is:suFailover**
Apr  6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131343, SupervisionTime = 60
Apr  6 12:57:09 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; 
timeout=60

This issue occurs randomly.
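
Since the failure does not show up on every run, a small script can help check whether a given syslog contains the same signature. The following is only a rough sketch: the regular expressions are copied from the log excerpts above, while the function name `scan` and the category labels are hypothetical names chosen for illustration. It assumes one unwrapped syslog entry per line.

```python
import re
from collections import Counter

# Patterns taken from the messages quoted in this ticket.
# The dictionary keys are hypothetical labels, not OpenSAF terms.
PATTERNS = {
    "introduce_me_resend": re.compile(
        r"osafimmnd\[\d+\]: WA Resending introduce-me"
    ),
    "immnd_sync_giveup": re.compile(
        r"osafimmnd\[\d+\]: ER Failed to load/sync"
    ),
    "immd_healthcheck_timeout": re.compile(
        r"osafamfnd\[\d+\]: ER .*safComp=IMMD,.*healthCheckcallbackTimeout"
    ),
}


def scan(lines):
    """Count how many log lines match each symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

It could be used as `scan(open(path, errors="replace"))`, where `path` is the syslog file of the node under test. Non-zero counts for all three labels across the two controllers would suggest the same sequence as reported here, though matching messages alone do not prove the same root cause.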


---
