---
**[tickets:#2393] IMMD crashed on the active controller when IMMND was restarted on it, in a cluster with a single controller and one payload**
**Status:** unassigned
**Milestone:** 5.2.RC2
**Created:** Thu Mar 23, 2017 05:58 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 23, 2017 05:58 AM UTC
**Owner:** nobody
**Attachments:**
- [PL-3.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2393/attachment/PL-3.tar.bz2) (558.9 kB; application/x-bzip)
- [SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2393/attachment/SC-1.tar.bz2) (2.5 MB; application/x-bzip)
### Environment details
OS: SUSE 64-bit
Changeset: 8701 (5.2.RC1)
2-node setup (1 controller and 1 payload)
### Summary
IMMD crashed on the active controller when IMMND was restarted on it, in a cluster with a single controller and one payload.
### Steps followed & observed behaviour
1. Bring up the cluster with 1 controller and 1 payload
2. Kill IMMND on the active controller
3. Observed that IMMD crashed on the active controller (SC-1), due to which the payload also rebooted

**The issue is observed only when there is a single controller.**
**Syslog**
SC-1:::
~~~
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 60000000000 ns)
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO Restarting a component of 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Mar 23 11:06:12 SO-SLOT-1 osafsmfd[2235]: WA DispatchOiCallback: saImmOiDispatch() Fail 'SA_AIS_ERR_BAD_HANDLE (9)'
Mar 23 11:06:12 SO-SLOT-1 osafntfimcnd[2181]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: WA IMMND coordinator at 2010f apparently crashed => electing new coord
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER Failed to find candidate for new IMMND coordinator (ScAbsenceAllowed:0 RulingEpoch:2
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER Active IMMD has to restart the IMMSv. All IMMNDs will restart
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER IMM RELOAD with NO persistent back end => ensure cluster restart by IMMD exit at both SCs, exiting
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: ER safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Mar 23 11:06:12 SO-SLOT-1 opensaf_reboot: Rebooting local node; timeout=60
~~~
PL-3:::
~~~
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2280]: ER IMMND forced to restart on order from IMMD, exiting
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 60000000000 ns)
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO Restarting a component of 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO 'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2755]: mkfifo already exists: /var/lib/opensaf/osafimmnd.fifo File exists
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2755]: Started
Mar 23 11:06:26 SO-SLOT-3 osafamfnd[2290]: WA AMF director unexpectedly crashed
Mar 23 11:06:26 SO-SLOT-3 osafamfnd[2290]: Rebooting OpenSAF NodeId = 131855 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131855, SupervisionTime = 60
~~~
**Traces**
From the traces, the active logs 'Failed to find candidate for new IMMND coordinator' and 'Active IMMD has to restart the IMMSv':
~~~
Mar 23 11:06:12.535325 osafimmd [2138:src/imm/immd/immd_evt.c:2638] T5 Received
IMMND service event
Mar 23 11:06:12.535349 osafimmd [2138:src/imm/immd/immd_evt.c:2741] T5 PROCESS
MDS EVT: NCSMDS_DOWN, my PID:2138
Mar 23 11:06:12.535451 osafimmd [2138:src/imm/immd/immd_evt.c:2748] T5
NCSMDS_DOWN => local IMMND down
Mar 23 11:06:12.535463 osafimmd [2138:src/imm/immd/immd_evt.c:2763] T5 IMMND
DOWN PROCESS detected by IMMD
Mar 23 11:06:12.535475 osafimmd [2138:src/imm/immd/immd_proc.c:0618] >>
immd_process_immnd_down
Mar 23 11:06:12.535483 osafimmd [2138:src/imm/immd/immd_proc.c:0621] T5
immd_process_immnd_down pid:2149 on-active:1 cb->immnd_coord:2010f
Mar 23 11:06:12.535503 osafimmd [2138:src/imm/immd/immd_proc.c:0628] WA IMMND
coordinator at 2010f apparently crashed => electing new coord
Mar 23 11:06:12.535516 osafimmd [2138:src/imm/immd/immd_proc.c:0204] >>
immd_proc_elect_coord
Mar 23 11:06:12.535536 osafimmd [2138:src/imm/immd/immd_proc.c:0320] ER
**Failed to find candidate for new IMMND coordinator** (ScAbsenceAllowed:0
RulingEpoch:2
Mar 23 11:06:12.535542 osafimmd [2138:src/imm/immd/immd_proc.c:0322] <<
immd_proc_elect_coord
Mar 23 11:06:12.535547 osafimmd [2138:src/imm/immd/immd_proc.c:0059] >>
immd_proc_immd_reset
Mar 23 11:06:12.535560 osafimmd [2138:src/imm/immd/immd_proc.c:0062] ER
**Active IMMD has to restart the IMMSv. All IMMNDs will restart**
Mar 23 11:06:12.535567 osafimmd [2138:src/imm/immd/immd_mbcsv.c:0044] >>
immd_mbcsv_sync_update
Mar 23 11:06:12.535574 osafimmd [2138:src/mbc/mbcsv_api.c:0773] >>
mbcsv_process_snd_ckpt_request: Sending checkpoint data to all STANDBY peers,
as per the send-type specified
Mar 23 11:06:12.535582 osafimmd [2138:src/mbc/mbcsv_api.c:0803] TR svc_id:42,
pwe_hdl:65549
Mar 23 11:06:12.535587 osafimmd [2138:src/mbc/mbcsv_api.c:0807] T1 No STANDBY
peers found yet
Mar 23 11:06:12.535593 osafimmd [2138:src/mbc/mbcsv_api.c:0868] <<
mbcsv_process_snd_ckpt_request: retval: 1
Mar 23 11:06:12.535598 osafimmd [2138:src/imm/immd/immd_mbcsv.c:0062] <<
immd_mbcsv_sync_update
Mar 23 11:06:12.535604 osafimmd [2138:src/imm/immd/immd_mds.c:0762] >>
immd_mds_bcast_send
Mar 23 11:06:12.535610 osafimmd [2138:src/imm/common/immsv_evt.c:5422] T8
Sending: IMMND_EVT_D2ND_RESET to 0
Mar 23 11:06:12.535868 osafimmd [2138:src/imm/immd/immd_mds.c:0782] <<
immd_mds_bcast_send
Mar 23 11:06:12.535917 osafimmd [2138:src/imm/immd/immd_proc.c:0104] ER IMM
RELOAD with NO persistent back end => ensure cluster restart by IMMD exit at
both SCs, exiting
~~~
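The decision path visible in the trace above can be illustrated with a small self-contained sketch. This is not the actual OpenSAF code: the struct, its fields, and the elect_coord() helper below are assumptions, used only to show why, with ScAbsenceAllowed=0, no persistent back end, and a single system controller, the death of the only SC-resident IMMND leaves no coordinator candidate, so the active IMMD resets the IMMSv and exits, which AMF then escalates to node failfast.
~~~
/* Simplified sketch (assumed names/fields, not the OpenSAF sources) of the
 * coordinator-election failure seen in immd_proc_elect_coord() and the
 * subsequent reset in immd_proc_immd_reset(). */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct immnd_node {
	unsigned node_id;  /* e.g. 0x2010f (131343) for SC-1 */
	bool is_on_sc;     /* only an IMMND on a system controller can be coord */
	bool is_up;
};

/* Return a coordinator candidate, or NULL if none exists. */
static struct immnd_node *elect_coord(struct immnd_node *nodes, size_t n)
{
	for (size_t i = 0; i < n; ++i)
		if (nodes[i].is_up && nodes[i].is_on_sc)
			return &nodes[i];
	return NULL;
}

int main(void)
{
	/* Cluster from the ticket: one SC (its IMMND just killed) and one payload. */
	struct immnd_node cluster[] = {
		{ .node_id = 0x2010f, .is_on_sc = true,  .is_up = false }, /* SC-1 */
		{ .node_id = 0x2030f, .is_on_sc = false, .is_up = true  }, /* PL-3 */
	};
	bool sc_absence_allowed = false;  /* ScAbsenceAllowed:0 in the trace */
	bool persistent_back_end = false; /* no PBE configured */

	if (elect_coord(cluster, 2) == NULL && !sc_absence_allowed) {
		fprintf(stderr, "ER Failed to find candidate for new IMMND coordinator\n");
		fprintf(stderr, "ER Active IMMD has to restart the IMMSv. All IMMNDs will restart\n");
		if (!persistent_back_end) {
			/* Without a PBE the IMM contents cannot be reloaded, so the
			 * IMMD exits; AMF recovery is nodeFailfast, and the payload
			 * reboots once its AMF director disappears. */
			fprintf(stderr, "ER IMM RELOAD with NO persistent back end => exiting\n");
			exit(EXIT_FAILURE);
		}
	}
	return 0;
}
~~~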
Note:
1. Syslogs of the active controller and the payload are attached
2. IMMND and IMMD traces are attached