- **status**: unassigned --> invalid
- **Comment**:
This is expected behavior when SC absence is not allowed.
Even if there were more payloads, a cluster reboot would be initiated because
no IMMND is left on any controller.
SC absence is set to 0 (not allowed), as can be seen in the message:
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER Failed to find candidate for new IMMND coordinator (ScAbsenceAllowed:0 RulingEpoch:2
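For reference, whether SC absence is allowed is controlled from the IMMD configuration. A minimal sketch of enabling it, assuming the stock immd.conf location of a default install (the value is the number of seconds the cluster may run without a system controller; 0, the default, disables the feature):
~~~
# Sketch only: /etc/opensaf/immd.conf (path assumed from a default install).
# With the variable unset or 0, as in this ticket (ScAbsenceAllowed:0),
# losing the last IMMND coordinator candidate forces a cluster restart.
export IMMSV_SC_ABSENCE_ALLOWED=900
~~~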
---
** [tickets:#2393] Immd got crashed on Active as immnd restarted on Active with
cluster having single controller and payload**
**Status:** invalid
**Milestone:** 5.2.RC2
**Created:** Thu Mar 23, 2017 05:58 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 23, 2017 05:55 PM UTC
**Owner:** nobody
**Attachments:**
- [PL-3.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2393/attachment/PL-3.tar.bz2) (558.9 kB; application/x-bzip)
- [SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2393/attachment/SC-1.tar.bz2) (2.5 MB; application/x-bzip)
### Environment details
OS: SuSE 64-bit
Changeset: 8701 (5.2.RC1)
2-node setup (1 controller and 1 payload)
### Summary
IMMD crashed on the active controller when IMMND was restarted there, in a
cluster with a single controller and one payload.
### Steps followed & observed behaviour
1. Bring up the cluster with 1 controller and 1 payload.
2. Kill IMMND on the active controller (a reproduction sketch follows this list).
3. Observe that IMMD crashes on the active controller (SC-1), after which the payload is also rebooted.
**Issue observed when there is only one controller**
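A minimal reproduction sketch, assuming a stock OpenSAF install (the process name osafimmnd is taken from the syslog below; the log path is an assumption for SuSE):
~~~
# On the single active controller (SC-1), kill the local IMMND.
# With ScAbsenceAllowed=0 and no other controller, the active IMMD finds no
# candidate for a new coordinator and exits, which fails the node fast and
# also forces the payload to reboot.
pkill -9 osafimmnd

# Follow the fault handling and the IMMD exit (log path assumed for SuSE).
tail -f /var/log/messages | grep -E 'osafimm(d|nd)|osafamfnd'
~~~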
**Syslog**
SC-1:::
~~~
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 60000000000 ns)
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO Restarting a component of 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Mar 23 11:06:12 SO-SLOT-1 osafsmfd[2235]: WA DispatchOiCallback: saImmOiDispatch() Fail 'SA_AIS_ERR_BAD_HANDLE (9)'
Mar 23 11:06:12 SO-SLOT-1 osafntfimcnd[2181]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: WA IMMND coordinator at 2010f apparently crashed => electing new coord
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER Failed to find candidate for new IMMND coordinator (ScAbsenceAllowed:0 RulingEpoch:2
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER Active IMMD has to restart the IMMSv. All IMMNDs will restart
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER IMM RELOAD with NO persistent back end => ensure cluster restart by IMMD exit at both SCs, exiting
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: ER safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Mar 23 11:06:12 SO-SLOT-1 opensaf_reboot: Rebooting local node; timeout=60
~~~
PL-3:::
~~~
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2280]: ER IMMND forced to restart on order from IMMD, exiting
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 60000000000 ns)
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO Restarting a component of 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO 'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2755]: mkfifo already exists: /var/lib/opensaf/osafimmnd.fifo File exists
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2755]: Started
Mar 23 11:06:26 SO-SLOT-3 osafamfnd[2290]: WA AMF director unexpectedly crashed
Mar 23 11:06:26 SO-SLOT-3 osafamfnd[2290]: Rebooting OpenSAF NodeId = 131855 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131855, SupervisionTime = 60
~~~
**Traces**
From the traces on the Active, 'Failed to find candidate for new IMMND
coordinator' is followed by 'Active IMMD has to restart the IMMSv':
~~~
Mar 23 11:06:12.535325 osafimmd [2138:src/imm/immd/immd_evt.c:2638] T5 Received IMMND service event
Mar 23 11:06:12.535349 osafimmd [2138:src/imm/immd/immd_evt.c:2741] T5 PROCESS MDS EVT: NCSMDS_DOWN, my PID:2138
Mar 23 11:06:12.535451 osafimmd [2138:src/imm/immd/immd_evt.c:2748] T5 NCSMDS_DOWN => local IMMND down
Mar 23 11:06:12.535463 osafimmd [2138:src/imm/immd/immd_evt.c:2763] T5 IMMND DOWN PROCESS detected by IMMD
Mar 23 11:06:12.535475 osafimmd [2138:src/imm/immd/immd_proc.c:0618] >> immd_process_immnd_down
Mar 23 11:06:12.535483 osafimmd [2138:src/imm/immd/immd_proc.c:0621] T5 immd_process_immnd_down pid:2149 on-active:1 cb->immnd_coord:2010f
Mar 23 11:06:12.535503 osafimmd [2138:src/imm/immd/immd_proc.c:0628] WA IMMND coordinator at 2010f apparently crashed => electing new coord
Mar 23 11:06:12.535516 osafimmd [2138:src/imm/immd/immd_proc.c:0204] >> immd_proc_elect_coord
Mar 23 11:06:12.535536 osafimmd [2138:src/imm/immd/immd_proc.c:0320] ER **Failed to find candidate for new IMMND coordinator** (ScAbsenceAllowed:0 RulingEpoch:2
Mar 23 11:06:12.535542 osafimmd [2138:src/imm/immd/immd_proc.c:0322] << immd_proc_elect_coord
Mar 23 11:06:12.535547 osafimmd [2138:src/imm/immd/immd_proc.c:0059] >> immd_proc_immd_reset
Mar 23 11:06:12.535560 osafimmd [2138:src/imm/immd/immd_proc.c:0062] ER **Active IMMD has to restart the IMMSv. All IMMNDs will restart**
Mar 23 11:06:12.535567 osafimmd [2138:src/imm/immd/immd_mbcsv.c:0044] >> immd_mbcsv_sync_update
Mar 23 11:06:12.535574 osafimmd [2138:src/mbc/mbcsv_api.c:0773] >> mbcsv_process_snd_ckpt_request: Sending checkpoint data to all STANDBY peers, as per the send-type specified
Mar 23 11:06:12.535582 osafimmd [2138:src/mbc/mbcsv_api.c:0803] TR svc_id:42, pwe_hdl:65549
Mar 23 11:06:12.535587 osafimmd [2138:src/mbc/mbcsv_api.c:0807] T1 No STANDBY peers found yet
Mar 23 11:06:12.535593 osafimmd [2138:src/mbc/mbcsv_api.c:0868] << mbcsv_process_snd_ckpt_request: retval: 1
Mar 23 11:06:12.535598 osafimmd [2138:src/imm/immd/immd_mbcsv.c:0062] << immd_mbcsv_sync_update
Mar 23 11:06:12.535604 osafimmd [2138:src/imm/immd/immd_mds.c:0762] >> immd_mds_bcast_send
Mar 23 11:06:12.535610 osafimmd [2138:src/imm/common/immsv_evt.c:5422] T8 Sending: IMMND_EVT_D2ND_RESET to 0
Mar 23 11:06:12.535868 osafimmd [2138:src/imm/immd/immd_mds.c:0782] << immd_mds_bcast_send
Mar 23 11:06:12.535917 osafimmd [2138:src/imm/immd/immd_proc.c:0104] ER IMM RELOAD with NO persistent back end => ensure cluster restart by IMMD exit at both SCs, exiting
~~~
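The same decision path can be pulled out of the attached IMMD trace with a filter like the one below (the trace file name is an assumption; use whichever file is in the attached SC-1 archive):
~~~
# Sketch: extract the IMMND-down handling, the failed coordinator election
# and the resulting IMMSv reset from the IMMD trace (file name assumed).
grep -E 'immd_process_immnd_down|immd_proc_elect_coord|immd_proc_immd_reset|IMMND_EVT_D2ND_RESET' osafimmd.trace
~~~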
Note:
1. Syslog of the active controller and the payload attached
2. IMMND and IMMD traces attached