Amfd-state of SC-1
Attachment: amfd.state-SC1.3874 (25.6 kB; application/octet-stream)
---
**[tickets:#1301] Middleware no redundancy SI is assigned to the controller
and to the payload.**
**Status:** unassigned
**Milestone:** future
**Created:** Thu Apr 02, 2015 02:49 PM UTC by Srikanth R
**Last Updated:** Fri Apr 03, 2015 06:22 AM UTC
**Owner:** nobody
*Setup*
Version : 4.6 FC
Setup is enabled with single PBE and no AMF application configured.
*Issues*
1) IMMD on the active controller faulted due to healthCheckcallbackTimeout (might
be the issue mentioned in #1291).
2) The middleware no-redundancy SI is assigned to both the controller and the payload.
*Steps performed*
-> Performed cluster stop of all the nodes by running the command
"/etc/init.d/opensafd stop" on all the nodes from payload to controller
-> Started opensaf on all the nodes SC-1, SC-2, PL-3 and PL-4. Opensafd came
up successfully.
Apr 2 15:34:24 SLES-64BIT-SLOT1 opensafd: OpenSAF(4.6.FC - ) services
successfully started
Apr 2 15:34:25 SLES-32BIT-SLOT4 opensafd: OpenSAF(4.6.FC - ) services
successfully started
Apr 2 15:34:28 SLES-64BIT-SLOT3 opensafd: OpenSAF(4.6.FC - ) services
successfully started
-> Ran the command "/etc/init.d/opensafd restart" on both the payloads.
Because of this, PL-3 rejoined the cluster, but PL-4 could not join.
* Opensafd start on PL-3 succeeded with the following syslog.
Apr 2 15:34:38 SLES-64BIT-SLOT1 kernel: [ 6867.812494] TIPC: Resetting link
<1.1.1:eth0-1.1.3:eth0>, requested by peer while probing
Apr 2 15:34:38 SLES-64BIT-SLOT1 kernel: [ 6867.812667] TIPC: Established link
<1.1.1:eth0-1.1.3:eth0> on network plane A
* Opensafd start on PL-4 failed with the following syslog.
Apr 2 15:34:43 SLES-32BIT-SLOT4 kernel: [258083.946896] TIPC: Established link
<1.1.4:eth0-1.1.1:eth0> on network plane A
Apr 2 15:34:43 SLES-32BIT-SLOT4 osafimmnd[6997]: NO SERVER STATE:
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Apr 2 15:34:48 SLES-32BIT-SLOT4 osafimmnd[6997]: WA Resending introduce-me -
problems with MDS ? 5
Apr 2 15:36:40 SLES-32BIT-SLOT4 osafimmnd[7032]: ER Failed to load/sync.
Giving up after 51 seconds, restarting..
Apr 2 15:36:40 SLES-32BIT-SLOT4 opensafd[6965]: ER Could Not RESPAWN IMMND
Apr 2 15:36:40 SLES-32BIT-SLOT4 opensafd[6965]: ER Failed DESC:IMMND
Apr 2 15:36:40 SLES-32BIT-SLOT4 opensafd[6965]: ER Trying To RESPAWN
/usr/lib/opensaf/clc-cli/osaf-immnd attempt #2
Apr 2 15:36:40 SLES-32BIT-SLOT4 opensafd[6965]: ER Sending SIGKILL to IMMND,
pid=7027
Apr 2 15:36:40 SLES-32BIT-SLOT4 osafimmnd[7032]: ER IMMND - Periodic server
job failed
Apr 2 15:36:40 SLES-32BIT-SLOT4 osafimmnd[7032]: ER Failed, exiting...
Apr 2 15:36:55 SLES-32BIT-SLOT4 osafimmnd[7068]: Started
Apr 2 15:37:46 SLES-32BIT-SLOT4 osafimmnd[7068]: ER Failed, exiting...
Apr 2 15:37:46 SLES-32BIT-SLOT4 opensafd: Starting OpenSAF fail
Corresponding logs on SC-1 :
Apr 2 15:34:44 SLES-64BIT-SLOT1 kernel: [ 6873.737376] TIPC: Lost contact with
<1.1.4>
Apr 2 15:34:44 SLES-64BIT-SLOT1 kernel: [ 6873.737983] TIPC: Established link
<1.1.1:eth0-1.1.4:eth0> on network plane A
Apr 2 15:36:23 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN on
saImmOmSearchNext - aborting
Apr 2 15:36:23 SLES-64BIT-SLOT1 osafimmnd[5772]: ER SYNC APPARENTLY FAILED
status:1
Finally, SC-1 went for a reboot because the IMMD health check timed out.
Apr 2 15:41:22 SLES-64BIT-SLOT1 osafimmnd[5772]: NO Coord broadcasting
PBE_PRTO_PURGE_MUTATIONS, epoch:220
Apr 2 15:41:23 SLES-64BIT-SLOT1 osafimmnd[5772]: NO ImmModel::getPbeOi reports
missing PbeOi locally => unsafe
Apr 2 15:41:23 SLES-64BIT-SLOT1 osafimmnd[5772]: NO Coord broadcasting
PBE_PRTO_PURGE_MUTATIONS, epoch:220
Apr 2 15:41:24 SLES-64BIT-SLOT1 osafamfnd[5853]: NO SU failover probation
timer started (timeout: 1200000000000 ns)
Apr 2 15:41:24 SLES-64BIT-SLOT1 osafamfnd[5853]: NO Performing failover of
'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
Apr 2 15:41:24 SLES-64BIT-SLOT1 osafamfnd[5853]: NO
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated
from 'componentFailover' to 'suFailover'
-> Induced a manual failover on SC-2 by killing osaflogd on SC-2.
-> Now SC-1 is the active controller.
-> When opensafd was started on PL-4, the middleware SI NoRed4 was assigned to two
middleware SUs (SC-1 and PL-4).
Below is the opensafd status:
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=STANDBY(2)
Also, SUSI objects were created for the NoRed4 SI for both the SC-1 SU and the PL-4 SU.
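
For anyone trying to reproduce this, the duplicate assignment can be spotted mechanically from the status text. The following is only a minimal sketch, not part of OpenSAF: the function name `find_duplicate_active` and the regular expressions are hypothetical and assume the status output has exactly the two-line `safSISU=...` / `saAmfSISUHAState=...` layout pasted above.

```python
import re
from collections import defaultdict

# Hypothetical patterns, written against the status layout pasted in this ticket.
# Example DN line:
#   safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
SISU_RE = re.compile(
    r"^safSISU=safSu=([^\\,]+)\\,safSg=([^\\,]+)\\,.*?,safSi=([^,]+),"
)
STATE_RE = re.compile(r"saAmfSISUHAState=(\w+)\(\d+\)")


def find_duplicate_active(status_text):
    """Return {(sg, si): [su, ...]} for every SI that is ACTIVE on more than one SU."""
    active = defaultdict(list)
    current = None  # (su, sg, si) from the most recent safSISU line
    for raw in status_text.splitlines():
        line = raw.strip()
        dn = SISU_RE.match(line)
        if dn:
            current = dn.groups()
            continue
        state = STATE_RE.search(line)
        if state and current:
            su, sg, si = current
            if state.group(1) == "ACTIVE":
                active[(sg, si)].append(su)
            current = None
    return {key: sus for key, sus in active.items() if len(sus) > 1}


# Status text copied from this ticket (raw string keeps the backslashes intact).
STATUS = r"""
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=STANDBY(2)
"""

if __name__ == "__main__":
    for (sg, si), sus in find_duplicate_active(STATUS).items():
        print(f"SI {si} in SG {sg} is ACTIVE on: {', '.join(sus)}")
```

Run against the status above, this flags only NoRed4 (ACTIVE on SC-1 and PL-4). SC-2N is not flagged because its second assignment is STANDBY, which is expected for the 2N service group.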