Snippet from the IMMD trace at the time of the test:
Aug 19 12:00:18.707828 osafimmd [5958:immd_mbcsv.c:0463] T5
************ENC SYNC COUNT 1142
Aug 19 12:00:18.707832 osafimmd [5958:immd_mbcsv.c:0482] T5 ENCODE
IMMD_A2S_MSG_FEVS: send count: 52246 handle: 2005749858575
Aug 19 12:00:18.707836 osafimmd [5958:immd_mbcsv.c:0605] <<
mbcsv_enc_async_update
Aug 19 12:00:18.707840 osafimmd [5958:immd_mbcsv.c:0843] <<
immd_mbcsv_encode_proc
Aug 19 12:00:18.707844 osafimmd [5958:immd_mbcsv.c:0428] T5 IMMD - MBCSv
Callback Success
Aug 19 12:00:18.707848 osafimmd [5958:immd_mbcsv.c:0429] <<
immd_mbcsv_callback
Aug 19 12:00:18.707851 osafimmd [5958:mbcsv_util.c:0438] TR send the
encoded message to any other peer with same s/w version
Aug 19 12:00:18.707855 osafimmd [5958:mbcsv_util.c:0441] TR dispatching
FSM for NCSMBCSV_SEND_ASYNC_UPDATE
Aug 19 12:00:18.707859 osafimmd [5958:mbcsv_act.c:0101] TR ASYNC update
to be sent. role: 1, svc_id: 42, pwe_hdl: 65549
Aug 19 12:00:18.707863 osafimmd [5958:mbcsv_mds.c:0185] >>
mbcsv_mds_send_msg: sending to vdest:d
Aug 19 12:00:18.707868 osafimmd [5958:mbcsv_mds.c:0209] TR send type
MDS_SENDTYPE_REDRSP:
Aug 19 12:00:18.708472 osafimmd [5958:mbcsv_mds.c:0244] <<
mbcsv_mds_send_msg: success
Aug 19 12:00:18.708472 osafimmd [5958:mbcsv_util.c:0492] <<
mbcsv_send_ckpt_data_to_all_peers
Aug 19 12:00:18.708472 osafimmd [5958:mbcsv_api.c:0868] <<
mbcsv_process_snd_ckpt_request: retval: 1
Aug 19 12:00:18.708472 osafimmd [5958:immd_mbcsv.c:0063] <<
immd_mbcsv_sync_update
Aug 19 12:00:18.708472 osafimmd [5958:immd_mds.c:0750] >>
immd_mds_bcast_send
Aug 19 12:00:18.708472 osafimmd [5958:immsv_evt.c:5400] T8 Sending:
IMMND_EVT_D2ND_GLOB_FEVS_REQ_2 to 0
Aug 19 12:05:59.014948 osafimmd [2456:immd_main.c:0111] >> immd_initialize
Aug 19 12:05:59.036253 osafimmd [2456:ncs_main_pub.c:0223] TR
NCS:PROCESS_ID=2456
Aug 19 12:05:59.036310 osafimmd [2456:sysf_def.c:0090] TR INITIALIZING
LEAP ENVIRONMENT
I have shared the IMMD and IMMND traces with Neel. I will try uploading
IMMD traces to the ticket.
Regards,
Sirisha
On Wednesday 19 August 2015 02:40 PM, Anders Bjornerstedt wrote:
>
> Please reproduce with the IMMD trace on.
>
> /AndersBj
>
> From: Sirisha Alla [mailto:[email protected]]
> Sent: den 19 augusti 2015 11:07
> To: [opensaf:tickets]; [email protected]
> Subject: Re: [opensaf:tickets] #1291 IMM: IMMD healthcheck callback
> timeout when standby controller rebooted in middle of IMMND sync
>
> Yes, I tried this today. The healthcheck timeout happened on IMMD,
> not on IMMND.
>
> /Sirisha
>
> On Wednesday 19 August 2015 02:28 PM, Anders Bjornerstedt wrote:
>
> Changeset "6744" was generated today.
> So I assume this means you reproduced this today.
>
> The IMMND main poll loop handles each descriptor in sequence, so it
> should not be possible for traffic on one descriptor to "starve out"
> a job on another.
>
> /AndersBj
>
> From: Anders Bjornerstedt [mailto:[email protected]]
> Sent: den 19 augusti 2015 10:54
> To: [opensaf:tickets]
> Subject: [opensaf:tickets] Re: #1291 IMM: IMMD healthcheck callback
> timeout when standby controller rebooted in middle of IMMND sync
>
> OK, but then the question simply becomes: why does the healthcheck
> callback not reach the IMMND, or why does the IMMND reply
> not reach the AMFND?
>
> /AndersBj
>
> From: Sirisha Alla [mailto:[email protected]]
> Sent: den 19 augusti 2015 10:50
> To: [opensaf:tickets]
> Subject: [opensaf:tickets] #1291 IMM: IMMD healthcheck callback
> timeout when standby controller rebooted in middle of IMMND sync
>
> This issue is reproduced on changeset 6744. Syslog as follows:
>
> Aug 19 11:54:13 SLES-64BIT-SLOT1 osafimmnd[5969]: NO implementer for
> class 'SaSmfSwBundle' is safSmfService => class extent is safe.
> Aug 19 11:54:13 SLES-64BIT-SLOT1 osafamfnd[6054]: NO Assigned
> 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to
> 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
> Aug 19 11:54:13 SLES-64BIT-SLOT1 opensafd: OpenSAF(4.7.M0 - ) services
> successfully started
> Aug 19 11:54:14 SLES-64BIT-SLOT1 osafimmd[5958]: NO Successfully
> announced dump at node 2010f. New Epoch:27
> ......
> Aug 19 12:00:12 SLES-64BIT-SLOT1 kernel: [ 4223.945761] TIPC:
> Established link <1.1.1:eth0-1.1.2:eth0> on network plane A
> Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: NO New IMMND process
> is on STANDBY Controller at 2020f
> Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: NO Extended intro
> from node 2020f
> Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: WA IMMND on
> controller (not currently coord) requests sync
> Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: NO Node 2020f request
> sync sync-pid:5221 epoch:0
> Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Announce sync,
> epoch:30
> Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmnd[5969]: NO SERVER STATE:
> IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
> Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmnd[5969]: NO NODE STATE->
> IMM_NODE_R_AVAILABLE
> Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmd[5958]: NO Successfully
> announced sync. New ruling epoch:30
> Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmloadd: logtrace: trace enabled
> to file /var/log/opensaf/osafimmnd, mask=0xffffffff
> Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting
> Aug 19 12:00:15 SLES-64BIT-SLOT1 osafamfd[6044]: NO Node 'PL-3' left
> the cluster
> Aug 19 12:00:15 SLES-64BIT-SLOT1 osafclmd[6025]: NO Node 131855 went
> down. Not sending track callback for agents on that node
> Aug 19 12:00:15 SLES-64BIT-SLOT1 osafclmd[6025]: NO Node 131855 went
> down. Not sending track callback for agents on that node
> Aug 19 12:00:15 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Global discard
> node received for nodeId:2030f pid:16584
> Aug 19 12:00:15 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Implementer
> disconnected 15 <0, 2030f(down)> (MsgQueueService131855)
> Aug 19 12:00:20 SLES-64BIT-SLOT1 kernel: [ 4231.876089] TIPC:
> Resetting link <1.1.1:eth0-1.1.3:eth0>, peer not responding
> Aug 19 12:00:20 SLES-64BIT-SLOT1 kernel: [ 4231.876098] TIPC: Lost
> link <1.1.1:eth0-1.1.3:eth0> on network plane A
> Aug 19 12:00:20 SLES-64BIT-SLOT1 kernel: [ 4231.877196] TIPC: Lost
> contact with <1.1.3>
> Aug 19 12:00:46 SLES-64BIT-SLOT1 kernel: [ 4257.206593] TIPC:
> Established link <1.1.1:eth0-1.1.3:eth0> on network plane A
> Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN
> on saImmOmSearchNext - aborting
> Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: ER SYNC APPARENTLY
> FAILED status:1
> Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO -SERVER STATE:
> IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
> Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO NODE STATE->
> IMM_NODE_FULLY_AVAILABLE (2484)
> Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Epoch set to 30
> in ImmModel
> Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Coord
> broadcasting ABORT_SYNC, epoch:30
> Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 30
> committing with ccbId:100000006/4294967302
> Aug 19 12:03:50 SLES-64BIT-SLOT1 kernel: [ 4441.964128] TIPC:
> Resetting link <1.1.1:eth0-1.1.3:eth0>, peer not responding
> Aug 19 12:03:50 SLES-64BIT-SLOT1 kernel: [ 4441.964145] TIPC: Lost
> link <1.1.1:eth0-1.1.3:eth0> on network plane A
> Aug 19 12:03:50 SLES-64BIT-SLOT1 kernel: [ 4441.964157] TIPC: Lost
> contact with <1.1.3>
> Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmnd[5969]: WA PBE process 5994
> appears stuck on runtime data handling - sending SIGTERM
> Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmpbed: NO IMM PBE received
> SIG_TERM, closing db handle
> Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmpbed: IN IMM PBE process
> EXITING...
> Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Implementer
> locally disconnected. Marking it as doomed 11 <316, 2010f> (OpenSafImmPBE)
> Aug 19 12:04:29 SLES-64BIT-SLOT1 osafimmnd[5969]: WA Persistent
> back-end process has apparently died.
> Aug 19 12:04:29 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Coord
> broadcasting PBE_PRTO_PURGE_MUTATIONS, epoch:30
> Aug 19 12:04:29 SLES-64BIT-SLOT1 osafimmnd[5969]: NO
> ImmModel::getPbeOi reports missing PbeOi locally => unsafe
> Aug 19 12:04:29 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Coord
> broadcasting PBE_PRTO_PURGE_MUTATIONS, epoch:30
> Aug 19 12:04:30 SLES-64BIT-SLOT1 osafimmnd[5969]: NO
> ImmModel::getPbeOi reports missing PbeOi locally => unsafe
> .....
> Aug 19 12:05:13 SLES-64BIT-SLOT1 osafimmnd[5969]: NO
> ImmModel::getPbeOi reports missing PbeOi locally => unsafe
> Aug 19 12:05:13 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Coord
> broadcasting PBE_PRTO_PURGE_MUTATIONS, epoch:30
> Aug 19 12:05:14 SLES-64BIT-SLOT1 osafamfnd[6054]: NO SU failover
> probation timer started (timeout: 1200000000000 ns)
> Aug 19 12:05:14 SLES-64BIT-SLOT1 osafamfnd[6054]: NO Performing
> failover of 'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
> Aug 19 12:05:14 SLES-64BIT-SLOT1 osafamfnd[6054]: NO
> 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action
> escalated from 'componentFailover' to 'suFailover'
> Aug 19 12:05:14 SLES-64BIT-SLOT1 osafamfnd[6054]: NO
> 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to
> 'healthCheckcallbackTimeout' : Recovery is 'suFailover'
> Aug 19 12:05:14 SLES-64BIT-SLOT1 osafamfnd[6054]: ER
> safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due
> to:healthCheckcallbackTimeout Recovery is:suFailover
> Aug 19 12:05:14 SLES-64BIT-SLOT1 osafamfnd[6054]: Rebooting OpenSAF
> NodeId = 131343 EE Name = , Reason: Component faulted: recovery is
> node failfast, OwnNodeId = 131343, SupervisionTime = 60
>
> In the above logs, is this the reason for IMMND hanging for 3 minutes?
>
> Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmnd[5969]: WA PBE process 5994
> appears stuck on runtime data handling - sending SIGTERM
> Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmpbed: NO IMM PBE received
> SIG_TERM, closing db handle
> Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmpbed: IN IMM PBE process
> EXITING...
>
> ------------------------------------------------------------------------
>
> [tickets:#1291]
> <http://sourceforge.net/p/opensaf/tickets/1291/>
>
> IMM: IMMD healthcheck callback timeout when standby controller
> rebooted in middle of IMMND sync
>
> Status: not-reproducible
> Milestone: never
> Created: Mon Mar 30, 2015 07:21 AM UTC by Sirisha Alla
> Last Updated: Wed Aug 19, 2015 08:40 AM UTC
> Owner: Neelakanta Reddy
> Attachments:
>
> *
> immlogs.tar.bz2 <https://sourceforge.net/p/opensaf/tickets/1291/attachment/immlogs.tar.bz2>
> (6.8 MB; application/x-bzip)
>
> The issue is observed with 4.6 FC changeset 6377. The system is up and
> running with a single PBE and 50k objects. This issue is seen after
> http://sourceforge.net/p/opensaf/tickets/1290 is observed. An IMM
> application is running on the standby controller, and the immcfg command
> is run from a payload to set the CompRestartMax value to 1000. IMMND is
> killed twice on the standby controller, leading to #1290.
>
> As a result, the standby controller left the cluster in the middle of a
> sync, IMMD reported a healthcheck callback timeout, and the active
> controller too went for a reboot. Following is the syslog of SC-1:
>
> Mar 26 14:58:17 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting
> Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Node Down event for
> node id 2020f:
> Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Current role: ACTIVE
> Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: Rebooting OpenSAF
> NodeId = 131599 EE Name = , Reason: Received Node Down for peer
> controller, OwnNodeId = 131343, SupervisionTime = 60
> Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412080] TIPC:
> Resetting link <1.1.1:eth0-1.1.2:eth0>, peer not responding
> Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412089] TIPC: Lost
> link <1.1.1:eth0-1.1.2:eth0> on network plane A
> Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.413191] TIPC: Lost
> contact with <1.1.2>
> Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went
> down. Not sending track callback for agents on that node
> Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went
> down. Not sending track callback for agents on that node
> Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went
> down. Not sending track callback for agents on that node
> Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went
> down. Not sending track callback for agents on that node
> Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went
> down. Not sending track callback for agents on that node
> Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went
> down. Not sending track callback for agents on that node
> Mar 26 14:58:30 SLES-64BIT-SLOT1 osafamfd[9628]: NO Node 'SC-2' left
> the cluster
> Mar 26 14:58:30 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node
> in the absence of PLM is outside the scope of OpenSAF
> Mar 26 14:58:54 SLES-64BIT-SLOT1 kernel: [15226.674333] TIPC:
> Established link <1.1.1:eth0-1.1.2:eth0> on network plane A
> Mar 26 15:00:02 SLES-64BIT-SLOT1 syslog-ng[3261]: Log statistics;
> dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0',
> processed='center(queued)=2197', processed='center(received)=1172',
> processed='destination(messages)=1172',
> processed='destination(mailinfo)=0',
> processed='destination(mailwarn)=0',
> processed='destination(localmessages)=955',
> processed='destination(newserr)=0',
> processed='destination(mailerr)=0', processed='destination(netmgm)=0',
> processed='destination(warn)=44', processed='destination(console)=13',
> processed='destination(null)=0', processed='destination(mail)=0',
> processed='destination(xconsole)=13',
> processed='destination(firewall)=0', processed='destination(acpid)=0',
> processed='destination(newscrit)=0',
> processed='destination(newsnotice)=0', processed='source(src)=1172'
> Mar 26 15:00:07 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN
> on saImmOmSearchNext - aborting
> Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: ER SYNC APPARENTLY
> FAILED status:1
> Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO -SERVER STATE:
> IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
> Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO NODE STATE->
> IMM_NODE_FULLY_AVAILABLE (2484)
> Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Epoch set to 12
> in ImmModel
> Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Coord
> broadcasting ABORT_SYNC, epoch:12
> Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 12
> committing with ccbId:100000054/4294967380
> Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO SU failover
> probation timer started (timeout: 1200000000000 ns)
> Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO Performing
> failover of 'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
> Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO
> 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action
> escalated from 'componentFailover' to 'suFailover'
> Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO
> 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to
> 'healthCheckcallbackTimeout' : Recovery is 'suFailover'
> Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: ER
> safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due
> to:healthCheckcallbackTimeout Recovery is:suFailover
> Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: Rebooting OpenSAF
> NodeId = 131343 EE Name = , Reason: Component faulted: recovery is
> node failfast, OwnNodeId = 131343, SupervisionTime = 60
> Mar 26 15:01:34 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node;
> timeout=60
>
> syslog, immnd and immd traces of SC-1 attached.
>
> ------------------------------------------------------------------------
>
> Sent from sourceforge.net because you indicated interest in
> https://sourceforge.net/p/opensaf/tickets/1291/
>
> To unsubscribe from further messages, please visit
> https://sourceforge.net/auth/subscriptions/
>
> safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due
> to:healthCheckcallbackTimeout Recovery is:suFailover
> Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: Rebooting OpenSAF
> NodeId = 131343 EE Name = , Reason: Component faulted: recovery is
> node failfast, OwnNodeId = 131343, SupervisionTime = 60
> Mar 26 15:01:34 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node;
> timeout=60
>
> syslog, immnd and immd traces of SC-1 attached.
>
> ------------------------------------------------------------------------
>
> Sent from sourceforge.net because
> [email protected] is subscribed to
> http://sourceforge.net/p/opensaf/tickets/
>
> To unsubscribe from further messages, a project admin can change
> settings at http://sourceforge.net/p/opensaf/admin/tickets/options.
> Or, if this is a mailing list, you can unsubscribe from the mailing list.
>
>
>
> ------------------------------------------------------------------------------
>
>
> _______________________________________________
> Opensaf-tickets mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
---
** [tickets:#1291] IMM: IMMD healthcheck callback timeout when standby
controller rebooted in middle of IMMND sync**
**Status:** unassigned
**Milestone:** 4.5.2
**Created:** Mon Mar 30, 2015 07:21 AM UTC by Sirisha Alla
**Last Updated:** Wed Aug 19, 2015 09:11 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**
-
[immlogs.tar.bz2](http://sourceforge.net/p/opensaf/tickets/1291/attachment/immlogs.tar.bz2)
(6.8 MB; application/x-bzip)
The issue is observed with 4.6 FC changeset 6377. The system is up and running
with a single PBE and 50k objects. This issue is seen after
http://sourceforge.net/p/opensaf/tickets/1290 is observed. An IMM application is
running on the standby controller, and the immcfg command is run from a payload
node to set the CompRestartMax value to 1000. IMMND is killed twice on the
standby controller, leading to #1290.
As a result, the standby controller left the cluster in the middle of the sync,
IMMD on the active controller reported a healthcheck callback timeout, and the
active controller also went down for a reboot. The following is the syslog of
SC-1:
Mar 26 14:58:17 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Node Down event for node id
2020f:
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Current role: ACTIVE
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: Rebooting OpenSAF NodeId =
131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId =
131343, SupervisionTime = 60
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412080] TIPC: Resetting link
<1.1.1:eth0-1.1.2:eth0>, peer not responding
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412089] TIPC: Lost link
<1.1.1:eth0-1.1.2:eth0> on network plane A
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.413191] TIPC: Lost contact with
<1.1.2>
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not
sending track callback for agents on that node
Mar 26 14:58:30 SLES-64BIT-SLOT1 osafamfd[9628]: NO Node 'SC-2' left the cluster
Mar 26 14:58:30 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the
absence of PLM is outside the scope of OpenSAF
Mar 26 14:58:54 SLES-64BIT-SLOT1 kernel: [15226.674333] TIPC: Established link
<1.1.1:eth0-1.1.2:eth0> on network plane A
Mar 26 15:00:02 SLES-64BIT-SLOT1 syslog-ng[3261]: Log statistics;
dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0',
processed='center(queued)=2197', processed='center(received)=1172',
processed='destination(messages)=1172', processed='destination(mailinfo)=0',
processed='destination(mailwarn)=0',
processed='destination(localmessages)=955', processed='destination(newserr)=0',
processed='destination(mailerr)=0', processed='destination(netmgm)=0',
processed='destination(warn)=44', processed='destination(console)=13',
processed='destination(null)=0', processed='destination(mail)=0',
processed='destination(xconsole)=13', processed='destination(firewall)=0',
processed='destination(acpid)=0', processed='destination(newscrit)=0',
processed='destination(newsnotice)=0', processed='source(src)=1172'
Mar 26 15:00:07 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN on
saImmOmSearchNext - aborting
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: ER SYNC APPARENTLY FAILED
status:1
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO -SERVER STATE:
IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO NODE STATE->
IMM_NODE_FULLY_AVAILABLE (2484)
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Epoch set to 12 in ImmModel
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Coord broadcasting
ABORT_SYNC, epoch:12
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 12 committing
with ccbId:100000054/4294967380
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO SU failover probation
timer started (timeout: 1200000000000 ns)
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO Performing failover of
'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated
from 'componentFailover' to 'suFailover'
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to
'healthCheckcallbackTimeout' : Recovery is 'suFailover'
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: ER
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due
to:healthCheckcallbackTimeout Recovery is:suFailover
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: Rebooting OpenSAF NodeId =
131343 EE Name = , Reason: Component faulted: recovery is node failfast,
OwnNodeId = 131343, SupervisionTime = 60
Mar 26 15:01:34 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node;
timeout=60
syslog, immnd and immd traces of SC-1 attached.
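For reference, the configuration step from the description can be sketched as a single immcfg invocation run on a payload node. The attribute name (saAmfSGCompRestartMax) and the target DN are assumptions; the ticket only states that "CompRestartMax" was set to 1000:

```shell
# Sketch of the reproduction step: raise the component-restart ceiling.
# Attribute/DN pairing is assumed, not taken from the ticket.
immcfg -a saAmfSGCompRestartMax=1000 safSg=2N,safApp=OpenSAF
```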
---
Sent from sourceforge.net because [email protected] is
subscribed to http://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at
http://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets