---
**[tickets:#2393] IMMD crashed on the active controller when IMMND was restarted on it, in a cluster with a single controller and one payload**
**Status:** unassigned
**Milestone:** 5.2.RC2
**Created:** Thu Mar 23, 2017 05:58 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 23, 2017 05:58 AM UTC
**Owner:** nobody
**Attachments:**
- [PL-3.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2393/attachment/PL-3.tar.bz2) (558.9 kB; application/x-bzip)
- [SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2393/attachment/SC-1.tar.bz2) (2.5 MB; application/x-bzip)
### Environment details
OS: SUSE 64-bit
Changeset: 8701 (5.2.RC1)
2-node setup (1 controller and 1 payload)
### Summary
IMMD crashed on the active controller when IMMND was restarted on it, in a cluster with a single controller and one payload.
### Steps followed & observed behaviour
1. Bring up the cluster with 1 controller and 1 payload
2. Kill IMMND on the active controller
3. Observed that IMMD crashed on the active controller (SC-1), due to which the payload also rebooted

**The issue is observed only when there is a single controller.**
**Syslog**
SC-1:::
~~~
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 60000000000 ns)
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO Restarting a component of 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Mar 23 11:06:12 SO-SLOT-1 osafsmfd[2235]: WA DispatchOiCallback: saImmOiDispatch() Fail 'SA_AIS_ERR_BAD_HANDLE (9)'
Mar 23 11:06:12 SO-SLOT-1 osafntfimcnd[2181]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: WA IMMND coordinator at 2010f apparently crashed => electing new coord
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER Failed to find candidate for new IMMND coordinator (ScAbsenceAllowed:0 RulingEpoch:2
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER Active IMMD has to restart the IMMSv. All IMMNDs will restart
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER IMM RELOAD with NO persistent back end => ensure cluster restart by IMMD exit at both SCs, exiting
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: ER safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Mar 23 11:06:12 SO-SLOT-1 opensaf_reboot: Rebooting local node; timeout=60
~~~
PL-3:::
~~~
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2280]: ER IMMND forced to restart on order from IMMD, exiting
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 60000000000 ns)
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO Restarting a component of 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO 'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2755]: mkfifo already exists: /var/lib/opensaf/osafimmnd.fifo File exists
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2755]: Started
Mar 23 11:06:26 SO-SLOT-3 osafamfnd[2290]: WA AMF director unexpectedly crashed
Mar 23 11:06:26 SO-SLOT-3 osafamfnd[2290]: Rebooting OpenSAF NodeId = 131855 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131855, SupervisionTime = 60
~~~
**Traces**
From the traces, the active logs 'Failed to find candidate for new IMMND coordinator' and 'Active IMMD has to restart the IMMSv':
~~~
Mar 23 11:06:12.535325 osafimmd [2138:src/imm/immd/immd_evt.c:2638] T5 Received
IMMND service event
Mar 23 11:06:12.535349 osafimmd [2138:src/imm/immd/immd_evt.c:2741] T5 PROCESS
MDS EVT: NCSMDS_DOWN, my PID:2138
Mar 23 11:06:12.535451 osafimmd [2138:src/imm/immd/immd_evt.c:2748] T5
NCSMDS_DOWN => local IMMND down
Mar 23 11:06:12.535463 osafimmd [2138:src/imm/immd/immd_evt.c:2763] T5 IMMND
DOWN PROCESS detected by IMMD
Mar 23 11:06:12.535475 osafimmd [2138:src/imm/immd/immd_proc.c:0618] >>
immd_process_immnd_down
Mar 23 11:06:12.535483 osafimmd [2138:src/imm/immd/immd_proc.c:0621] T5
immd_process_immnd_down pid:2149 on-active:1 cb->immnd_coord:2010f
Mar 23 11:06:12.535503 osafimmd [2138:src/imm/immd/immd_proc.c:0628] WA IMMND
coordinator at 2010f apparently crashed => electing new coord
Mar 23 11:06:12.535516 osafimmd [2138:src/imm/immd/immd_proc.c:0204] >>
immd_proc_elect_coord
Mar 23 11:06:12.535536 osafimmd [2138:src/imm/immd/immd_proc.c:0320] ER
**Failed to find candidate for new IMMND coordinator** (ScAbsenceAllowed:0
RulingEpoch:2
Mar 23 11:06:12.535542 osafimmd [2138:src/imm/immd/immd_proc.c:0322] <<
immd_proc_elect_coord
Mar 23 11:06:12.535547 osafimmd [2138:src/imm/immd/immd_proc.c:0059] >>
immd_proc_immd_reset
Mar 23 11:06:12.535560 osafimmd [2138:src/imm/immd/immd_proc.c:0062] ER
**Active IMMD has to restart the IMMSv. All IMMNDs will restart**
Mar 23 11:06:12.535567 osafimmd [2138:src/imm/immd/immd_mbcsv.c:0044] >>
immd_mbcsv_sync_update
Mar 23 11:06:12.535574 osafimmd [2138:src/mbc/mbcsv_api.c:0773] >>
mbcsv_process_snd_ckpt_request: Sending checkpoint data to all STANDBY peers,
as per the send-type specified
Mar 23 11:06:12.535582 osafimmd [2138:src/mbc/mbcsv_api.c:0803] TR svc_id:42,
pwe_hdl:65549
Mar 23 11:06:12.535587 osafimmd [2138:src/mbc/mbcsv_api.c:0807] T1 No STANDBY
peers found yet
Mar 23 11:06:12.535593 osafimmd [2138:src/mbc/mbcsv_api.c:0868] <<
mbcsv_process_snd_ckpt_request: retval: 1
Mar 23 11:06:12.535598 osafimmd [2138:src/imm/immd/immd_mbcsv.c:0062] <<
immd_mbcsv_sync_update
Mar 23 11:06:12.535604 osafimmd [2138:src/imm/immd/immd_mds.c:0762] >>
immd_mds_bcast_send
Mar 23 11:06:12.535610 osafimmd [2138:src/imm/common/immsv_evt.c:5422] T8
Sending: IMMND_EVT_D2ND_RESET to 0
Mar 23 11:06:12.535868 osafimmd [2138:src/imm/immd/immd_mds.c:0782] <<
immd_mds_bcast_send
Mar 23 11:06:12.535917 osafimmd [2138:src/imm/immd/immd_proc.c:0104] ER IMM
RELOAD with NO persistent back end => ensure cluster restart by IMMD exit at
both SCs, exiting
~~~
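The decision path visible in the trace above can be illustrated with a small self-contained sketch. This is not the actual OpenSAF code: the struct, its fields, and the elect_coord() helper below are assumptions, used only to show why, with ScAbsenceAllowed=0, no persistent back end, and a single system controller, the death of the only SC-resident IMMND leaves no coordinator candidate, so the active IMMD resets the IMMSv and exits, which AMF then escalates to node failfast.
~~~
/* Simplified sketch (assumed names/fields, not the OpenSAF sources) of the
 * coordinator-election failure seen in immd_proc_elect_coord() and the
 * subsequent reset in immd_proc_immd_reset(). */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct immnd_node {
	unsigned node_id;  /* e.g. 0x2010f (131343) for SC-1 */
	bool is_on_sc;     /* only an IMMND on a system controller can be coord */
	bool is_up;
};

/* Return a coordinator candidate, or NULL if none exists. */
static struct immnd_node *elect_coord(struct immnd_node *nodes, size_t n)
{
	for (size_t i = 0; i < n; ++i)
		if (nodes[i].is_up && nodes[i].is_on_sc)
			return &nodes[i];
	return NULL;
}

int main(void)
{
	/* Cluster from the ticket: one SC (its IMMND just killed) and one payload. */
	struct immnd_node cluster[] = {
		{ .node_id = 0x2010f, .is_on_sc = true,  .is_up = false }, /* SC-1 */
		{ .node_id = 0x2030f, .is_on_sc = false, .is_up = true  }, /* PL-3 */
	};
	bool sc_absence_allowed = false;  /* ScAbsenceAllowed:0 in the trace */
	bool persistent_back_end = false; /* no PBE configured */

	if (elect_coord(cluster, 2) == NULL && !sc_absence_allowed) {
		fprintf(stderr, "ER Failed to find candidate for new IMMND coordinator\n");
		fprintf(stderr, "ER Active IMMD has to restart the IMMSv. All IMMNDs will restart\n");
		if (!persistent_back_end) {
			/* Without a PBE the IMM contents cannot be reloaded, so the
			 * IMMD exits; AMF recovery is nodeFailfast, and the payload
			 * reboots once its AMF director disappears. */
			fprintf(stderr, "ER IMM RELOAD with NO persistent back end => exiting\n");
			exit(EXIT_FAILURE);
		}
	}
	return 0;
}
~~~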
Note:
1. Syslogs of the active controller and the payload are attached
2. IMMND and IMMD traces are attached