Syslogs and IMM traces from both controllers.
Attachments:
- [1729.tar.bz2](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/9df5c75c/2faa/attachment/1729.tar.bz2) (5.5 MB; application/x-bzip)
---
**[tickets:#1729] Immd crashed on Active controller because of health check timeout**
**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed Apr 06, 2016 09:01 AM UTC by Ritu Raj
**Last Updated:** Wed Apr 06, 2016 09:01 AM UTC
**Owner:** nobody
Setup:
* Changeset: 7436
* Version: OpenSAF 5.0
* 4 nodes configured with a single PBE and a load of 30K objects
Issues observed:
1) The standby controller (SC-2) did not join the active controller (SC-1).
2) IMMD on the active controller (SC-1) got a health check timeout.
Steps performed:
* Started OpenSAF on controller SC-1 with the PBE load; SC-1 took the active role.
* Then started OpenSAF on controller SC-2; SC-2 failed to join the cluster:
Apr 6 12:54:04 SLES-32BIT-SLOT2 opensafd: Starting OpenSAF Services(5.0.FC - ) (Using TIPC)
Starting OpenSAF Services (Using TIPC):Apr 6 12:54:04 SLES-32BIT-SLOT2 kernel: [95783.514531] TIPC: Activated (version 2.0.0)
Apr 6 12:54:04 SLES-32BIT-SLOT2 kernel: [95783.514587] NET: Registered protocol family
..........
Apr 6 12:54:04 SLES-32BIT-SLOT2 osafclmna[28264]: Started
Apr 6 12:54:04 SLES-32BIT-SLOT2 osafclmna[28264]: NO safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
.........
Apr 6 12:54:04 SLES-32BIT-SLOT2 osafimmnd[28303]: NO IMMD service is UP ... ScAbsenseAllowed?:0 introduced?:0
Apr 6 12:54:04 SLES-32BIT-SLOT2 osafimmnd[28303]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Apr 6 12:54:09 SLES-32BIT-SLOT2 osafimmnd[28303]: WA Resending introduce-me - problems with MDS ? 5
........
Apr 6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER Failed to load/sync. Giving up after 51 seconds, restarting..
Apr 6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Failed DESC:IMMND
Apr 6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Going for recovery
Apr 6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Trying To RESPAWN /usr/lib/opensaf/clc-cli/osaf-immnd attempt #1
Apr 6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Sending SIGKILL to IMMND, pid=28297
Apr 6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER IMMND - Periodic server job failed
Apr 6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER Failed, exiting...
Apr 6 12:55:10 SLES-32BIT-SLOT2 osafimmnd[28340]: Started
........
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
....
Apr 6 12:57:07 SLES-32BIT-SLOT2 opensafd: Starting OpenSAF failed
* After opensafd failed to come up on SC-2, SC-1 rebooted due to an IMMD health check timeout:
Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO Performing failover of 'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated from 'componentFailover' to 'suFailover'
Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'suFailover'
**Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: ER safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:healthCheckcallbackTimeout Recovery is:suFailover**
Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Apr 6 12:57:09 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; timeout=60
This issue is intermittent; it does not reproduce on every run.