I was mainly thinking of the system possibly being overloaded in terms of
execution/processing, not memory.
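
Something like the following could confirm or rule that out the next time it
reproduces (a minimal sketch, assuming standard procps tools on the node):

    # Load averages and number of runnable tasks
    uptime
    # CPU, run-queue and context-switch statistics, sampled every second
    vmstat 1 5
    # One batch-mode snapshot of the busiest processes
    top -b -n 1 | head -n 20

Capturing these around the time of the IMMND healthcheck timeout would show
whether the node was CPU-bound rather than short on memory.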
/AndersBj
surender khetavath wrote:
> The system was not at all overloaded. The available memory is ~8 GB:
> cat /proc/meminfo
> MemTotal: 7945404 kB
> MemFree: 7614556 kB
> Buffers: 11980 kB
> Cached: 127952 kB
> SwapCached: 0 kB
>
> Also, it is a physical machine, not a VM.
>
>
> ---
>
> ** [tickets:#724] imm: sync with payload node resulted in controller reboots**
>
> **Status:** accepted
> **Created:** Thu Jan 16, 2014 11:22 AM UTC by surender khetavath
> **Last Updated:** Fri Jan 17, 2014 09:21 AM UTC
> **Owner:** Anders Bjornerstedt
>
> changeset: 4733
>
> setup: 2 controllers
>
> Test:
> Brought up 2 controllers and added 200,000 (2 lakh) objects, then started PL-3.
> OpenSAF start on PL-3 was not successful. After some time both controllers
> rebooted, but PL-3 did not reboot even though the controllers were no longer
> available.
>
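> For reference, a population loop along these lines would produce that object
> count (an illustrative sketch only; TestClass and the testObj= DN prefix are
> hypothetical placeholders, not the actual class used in the test):
>
>     # Hypothetical sketch: create 200,000 objects of a test config class,
>     # one immcfg invocation per object
>     for i in $(seq 1 200000); do
>         immcfg -c TestClass "testObj=$i"
>     done
>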
> syslog on sc-1:
> Jan 16 16:26:43 SLES-SLOT4 osafamfnd[4536]: NO
> 'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to
> 'healthCheckcallbackTimeout' : Recovery is 'componentRestart'
> Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
> Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
> Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
> Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
> Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
> Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
> Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
> Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4464]: WA Admin owner id 8 does not exist
> Jan 16 16:26:43 SLES-SLOT4 osafimmpbed: NO PBE received SIG_TERM, closing db
> handle
> Jan 16 16:26:43 SLES-SLOT4 osafimmd[4454]: WA IMMND coordinator at 2010f
> apparently crashed => electing new coord
> Jan 16 16:26:43 SLES-SLOT4 osafntfimcnd[4496]: ER saImmOiDispatch() Fail
> SA_AIS_ERR_BAD_HANDLE (9)
> Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4980]: Started
> Jan 16 16:26:43 SLES-SLOT4 osafimmpbed: WA PBE lost contact with parent IMMND
> - Exiting
> Jan 16 16:26:43 SLES-SLOT4 osafimmd[4454]: NO New coord elected, resides at
> 2020f
> Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4980]: NO Persistent Back-End capability
> configured, Pbe file:imm.db (suffix may get added)
> Jan 16 16:26:43 SLES-SLOT4 osafimmnd[4980]: NO SERVER STATE:
> IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
> Jan 16 16:26:44 SLES-SLOT4 osafimmd[4454]: NO New IMMND process is on ACTIVE
> Controller at 2010f
> Jan 16 16:26:44 SLES-SLOT4 osafimmd[4454]: NO Extended intro from node 2010f
> Jan 16 16:26:44 SLES-SLOT4 osafimmnd[4980]: NO Fevs count adjusted to 201407
> preLoadPid: 0
> Jan 16 16:26:44 SLES-SLOT4 osafimmnd[4980]: NO SERVER STATE:
> IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
> Jan 16 16:26:44 SLES-SLOT4 osafimmnd[4980]: NO SERVER STATE:
> IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
> Jan 16 16:26:44 SLES-SLOT4 osafimmd[4454]: WA IMMND on controller (not
> currently coord) requests sync
> Jan 16 16:26:44 SLES-SLOT4 osafimmd[4454]: NO Node 2010f request sync
> sync-pid:4980 epoch:0
> Jan 16 16:26:44 SLES-SLOT4 osafimmnd[4980]: NO NODE STATE-> IMM_NODE_ISOLATED
> Jan 16 16:26:53 SLES-SLOT4 osafamfd[4526]: NO Re-initializing with IMM
> Jan 16 16:27:08 SLES-SLOT4 osafimmd[4454]: WA IMMND coordinator at 2020f
> apparently crashed => electing new coord
> Jan 16 16:27:08 SLES-SLOT4 osafimmd[4454]: ER Failed to find candidate for
> new IMMND coordinator
> Jan 16 16:27:08 SLES-SLOT4 osafimmd[4454]: ER Active IMMD has to restart the
> IMMSv. All IMMNDs will restart
> Jan 16 16:27:09 SLES-SLOT4 osafimmd[4454]: ER IMM RELOAD => ensure cluster
> restart by IMMD exit at both SCs, exiting
> Jan 16 16:27:09 SLES-SLOT4 osafimmnd[4980]: ER IMMND forced to restart on
> order from IMMD, exiting
> Jan 16 16:27:09 SLES-SLOT4 osafamfnd[4536]: NO
> 'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to
> 'avaDown' : Recovery is 'componentRestart'
> Jan 16 16:27:09 SLES-SLOT4 osafamfnd[4536]: NO
> 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' :
> Recovery is 'nodeFailfast'
> Jan 16 16:27:09 SLES-SLOT4 osafamfnd[4536]: ER
> safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown
> Recovery is:nodeFailfast
> Jan 16 16:27:09 SLES-SLOT4 osafamfnd[4536]: Rebooting OpenSAF NodeId = 131343
> EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId =
> 131343, SupervisionTime = 60
> Jan 16 16:27:09 SLES-SLOT4 opensaf_reboot: Rebooting local node; timeout=60
> Jan 16 16:27:11 SLES-SLOT4 kernel: [ 1435.892099] md: stopping all md devices.
> Read from remote host 172.1.1.4: Connection reset by peer
>
>
> console output on pl-3
> ps -ef| grep saf
> root 16523 1 0 16:22 ? 00:00:00 /bin/sh
> /usr/lib64/opensaf/clc-cli/osaf-transport-monitor
> root 16629 1 0 16:27 ? 00:00:00 /usr/lib64/opensaf/osafimmnd
> --tracemask=0xffffffff
> root 16860 10365 0 16:39 pts/0 00:00:00 grep saf
>
>
>