---

** [tickets:#725] imm: sync with payload node resulted in controller reboots**

**Status:** unassigned
**Created:** Thu Jan 16, 2014 11:31 AM UTC by surender khetavath
**Last Updated:** Thu Jan 16, 2014 11:31 AM UTC
**Owner:** nobody

changeset: 4733

setup: 2 controllers

Test:
Brought up 2 controllers and addes 2Lakh objects(200000). Now started pl-3
Opensaf start on pl3 was not successful. After some time both controllers 
rebooted but pl-3 didnot go for reboot though the controllers are not 
available. 

syslog on sc-1:
Jan 16 16:26:43 SLES-SLOT4 osafamfnd[4536]: NO 
'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 
'healthCheckcallbackTimeout' : Recovery is 'componentRestart'
Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
Jan 16 16:26:43 SLES-SLOT4 osafimmpbed: NO PBE received SIG_TERM, closing db 
handle
Jan 16 16:26:43 SLES-SLOT4 osafimmd[4454]: WA IMMND coordinator at 2010f 
apparently crashed => electing new coord
Jan 16 16:26:43 SLES-SLOT4 osafntfimcnd[4496]: ER saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4980]: Started
Jan 16 16:26:43 SLES-SLOT4 osafimmpbed: WA PBE lost contact with parent IMMND - 
Exiting
Jan 16 16:26:43 SLES-SLOT4 osafimmd[4454]: NO New coord elected, resides at 
2020f
Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4980]: NO Persistent Back-End capability 
configured, Pbe file:imm.db (suffix may get added)
Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4980]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Jan 16 16:26:44 SLES-SLOT4 osafimmd[4454]: NO New IMMND process is on ACTIVE 
Controller at 2010f
Jan 16 16:26:44 SLES-SLOT4 osafimmd[4454]: NO Extended intro from node 2010f
Jan 16 16:26:44 SLES-SLOT4 osafimmnd[4980]: NO Fevs count adjusted to 201407 
preLoadPid: 0
Jan 16 16:26:44 SLES-SLOT4 osafimmnd[4980]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Jan 16 16:26:44 SLES-SLOT4 osafimmnd[4980]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Jan 16 16:26:44 SLES-SLOT4 osafimmd[4454]: WA IMMND on controller (not 
currently coord) requests sync
Jan 16 16:26:44 SLES-SLOT4 osafimmd[4454]: NO Node 2010f request sync 
sync-pid:4980 epoch:0 
Jan 16 16:26:44 SLES-SLOT4 osafimmnd[4980]: NO NODE STATE-> IMM_NODE_ISOLATED
Jan 16 16:26:53 SLES-SLOT4 osafamfd[4526]: NO Re-initializing with IMM
Jan 16 16:27:08 SLES-SLOT4 osafimmd[4454]: WA IMMND coordinator at 2020f 
apparently crashed => electing new coord
Jan 16 16:27:08 SLES-SLOT4 osafimmd[4454]: ER Failed to find candidate for new 
IMMND coordinator
Jan 16 16:27:08 SLES-SLOT4 osafimmd[4454]: ER Active IMMD has to restart the 
IMMSv. All IMMNDs will restart
Jan 16 16:27:09 SLES-SLOT4 osafimmd[4454]: ER IMM RELOAD  => ensure cluster 
restart by IMMD exit at both SCs, exiting
Jan 16 16:27:09 SLES-SLOT4 osafimmnd[4980]: ER IMMND forced to restart on order 
from IMMD, exiting
Jan 16 16:27:09 SLES-SLOT4 osafamfnd[4536]: NO 
'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'
Jan 16 16:27:09 SLES-SLOT4 osafamfnd[4536]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Jan 16 16:27:09 SLES-SLOT4 osafamfnd[4536]: ER 
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Jan 16 16:27:09 SLES-SLOT4 osafamfnd[4536]: Rebooting OpenSAF NodeId = 131343 
EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131343, SupervisionTime = 60
Jan 16 16:27:09 SLES-SLOT4 opensaf_reboot: Rebooting local node; timeout=60
Jan 16 16:27:11 SLES-SLOT4 kernel: [ 1435.892099] md: stopping all md devices.
Read from remote host 172.1.1.4: Connection reset by peer


console output on pl-3
ps -ef| grep saf
root     16523     1  0 16:22 ?        00:00:00 /bin/sh 
/usr/lib64/opensaf/clc-cli/osaf-transport-monitor
root     16629     1  0 16:27 ?        00:00:00 /usr/lib64/opensaf/osafimmnd 
--tracemask=0xffffffff
root     16860 10365  0 16:39 pts/0    00:00:00 grep saf



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
CenturyLink Cloud: The Leader in Enterprise Cloud Services.
Learn Why More Businesses Are Choosing CenturyLink Cloud For
Critical Workloads, Development Environments & Everything In Between.
Get a Quote or Start a Free Trial Today. 
http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to