I tried to reproduce this on CS #6774, but could not. Please find the
application attached. The quiescing reject was simulated in the application
during SU1 shutdown.
Please note that because of fix #601 (CS #4847), the recovery is escalated to
component failover.
SC-1 logs:
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Assigning 'all (5) SIs' QUIESCING to 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Assigning 'safSi=AmfDemo1,safApp=AmfDemo1' QUIESCING to 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Assigning 'safSi=AmfDemo2,safApp=AmfDemo1' QUIESCING to 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Assigning 'safSi=AmfDemo3,safApp=AmfDemo1' QUIESCING to 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Assigning 'safSi=AmfDemo4,safApp=AmfDemo1' QUIESCING to 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Assigning 'safSi=AmfDemo5,safApp=AmfDemo1' QUIESCING to 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:14 PM_SC-1 amf_demo[9469]: CSI Set - HAState Quiescing for all assigned CSIs
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO component with QUIESCED/QUIESCING assignment failed
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO recovery action 'comp restart' escalated to 'comp failover'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO SU failover probation timer started (timeout: 1200000000000 ns)
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Performing failover of 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' (SU failover count: 1)
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO 'safComp=AmfDemo1,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' recovery action escalated from 'componentRestart' to 'componentFailover'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO 'safComp=AmfDemo1,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' faulted due to 'csiSetcallbackFailed' : Recovery is 'componentFailover'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' Presence State INSTANTIATED => TERMINATING
Sep 7 15:49:14 PM_SC-1 amf_demo[9461]: CSI Set - HAState Quiescing for all assigned CSIs
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO component with QUIESCED/QUIESCING assignment failed
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO recovery action 'comp restart' escalated to 'comp failover'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Performing failover of 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' (SU failover count: 2)
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO 'safComp=AmfDemo2,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' recovery action escalated from 'componentRestart' to 'componentFailover'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO 'safComp=AmfDemo2,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' faulted due to 'csiSetcallbackFailed' : Recovery is 'componentFailover'
Sep 7 15:49:14 PM_SC-1 amf_demo[9453]: CSI Set - HAState Quiescing for all assigned CSIs
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO component with QUIESCED/QUIESCING assignment failed
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO recovery action 'comp restart' escalated to 'comp failover'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO SU failovers have reached configured limit of 2
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO SU failover probation timer stopped
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO 'safComp=AmfDemo3,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' recovery action escalated from 'componentRestart' to 'nodeFailover'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO 'safComp=AmfDemo3,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' faulted due to 'csiSetcallbackFailed' : Recovery is 'nodeFailover'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Terminating all application components (abruptly & unordered)
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: IN 'safComp=AmfDemo3,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' Presence State INSTANTIATED => TERMINATING
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: IN Assigned 'all CSIs' QUIESCING to 'safComp=AmfDemo3,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Assigned 'safSi=AmfDemo1,safApp=AmfDemo1' QUIESCING to 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Assigned 'safSi=AmfDemo2,safApp=AmfDemo1' QUIESCING to 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Assigned 'safSi=AmfDemo3,safApp=AmfDemo1' QUIESCING to 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Assigned 'safSi=AmfDemo4,safApp=AmfDemo1' QUIESCING to 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Assigned 'safSi=AmfDemo5,safApp=AmfDemo1' QUIESCING to 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO Assigned 'all SIs' QUIESCED of 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 amf_demo[9461]: exiting (caught term signal)
Sep 7 15:49:15 PM_SC-1 amf_demo[9453]: exiting (caught term signal)
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: IN 'safComp=AmfDemo2,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' Presence State TERMINATING => UNINSTANTIATED
Sep 7 15:49:15 PM_SC-1 amf_demo[9469]: exiting (caught term signal)
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: IN 'safComp=AmfDemo1,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' Presence State TERMINATING => UNINSTANTIATED
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: NO Removing 'all (5) SIs' from 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: NO Removing 'safSi=AmfDemo1,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: IN Removing 'all CSIs' from 'safComp=AmfDemo1,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: IN Removed 'all CSIs' from 'safComp=AmfDemo1,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: IN Removing 'all CSIs' from 'safComp=AmfDemo2,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: IN Removed 'all CSIs' from 'safComp=AmfDemo2,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: IN Removing 'all CSIs' from 'safComp=AmfDemo3,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: IN Removed 'all CSIs' from 'safComp=AmfDemo3,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: NO Removed 'safSi=AmfDemo1,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: NO Removed 'safSi=AmfDemo2,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: NO Removed 'safSi=AmfDemo3,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: NO Removed 'safSi=AmfDemo4,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: NO Removed 'safSi=AmfDemo5,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: NO Removed 'all SIs' from 'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: IN 'safComp=AmfDemo2,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' Presence State UNINSTANTIATED => UNINSTANTIATED
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: IN 'safComp=AmfDemo3,safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1' Presence State TERMINATING => UNINSTANTIATED
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: NO Terminated all application components
Sep 7 15:49:15 PM_SC-1 osafamfnd[9280]: NO Informing director of node fail-over
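For anyone scanning the SC-1 log above, the per-component escalation sequence can be pulled out with a short script. This is only a sketch: the regular expression and the function name `escalations` are my own (not part of OpenSAF), and the three sample entries are copied from the log above with the wrapped lines rejoined.

```python
import re

# Matches amfnd entries of the form (format taken from the SC-1 log above):
#   ... NO '<component DN>' recovery action escalated from '<old>' to '<new>'
ESCALATION = re.compile(
    r"'(?P<dn>safComp=[^']+)'\s+recovery action escalated "
    r"from '(?P<old>\w+)' to '(?P<new>\w+)'"
)


def escalations(lines):
    """Return (component RDN, old recovery, new recovery) for each escalation entry."""
    found = []
    for line in lines:
        match = ESCALATION.search(line)
        if match:
            # Keep only the leading RDN (e.g. safComp=AmfDemo1) for readability.
            comp = match.group("dn").split(",", 1)[0]
            found.append((comp, match.group("old"), match.group("new")))
    return found


PREFIX = "Sep 7 15:49:14 PM_SC-1 osafamfnd[9280]: NO "
SU_DN = "safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1"
SAMPLE = [
    PREFIX + "'safComp=AmfDemo1," + SU_DN + "' recovery action escalated "
    "from 'componentRestart' to 'componentFailover'",
    PREFIX + "'safComp=AmfDemo2," + SU_DN + "' recovery action escalated "
    "from 'componentRestart' to 'componentFailover'",
    PREFIX + "'safComp=AmfDemo3," + SU_DN + "' recovery action escalated "
    "from 'componentRestart' to 'nodeFailover'",
]

for comp, old, new in escalations(SAMPLE):
    print(f"{comp}: {old} -> {new}")
```

On the sample entries this lists AmfDemo1 and AmfDemo2 going to componentFailover and AmfDemo3 going to nodeFailover, which matches the SU failover limit of 2 being reached in the log.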
================
SC-2 logs:
Sep 7 15:49:15 PM_SC-2 osafamfnd[9533]: NO Assigning 'all (5) SIs' ACTIVE to 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-2 osafamfnd[9533]: NO Assigning 'safSi=AmfDemo1,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-2 osafamfnd[9533]: NO Assigning 'safSi=AmfDemo2,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-2 osafamfnd[9533]: NO Assigning 'safSi=AmfDemo3,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-2 osafamfnd[9533]: NO Assigning 'safSi=AmfDemo4,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-2 osafamfnd[9533]: NO Assigning 'safSi=AmfDemo5,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-2 amf_demo[9696]: CSI Set - HAState Active for all assigned CSIs
Sep 7 15:49:15 PM_SC-2 amf_demo[9712]: CSI Set - HAState Active for all assigned CSIs
Sep 7 15:49:15 PM_SC-2 amf_demo[9704]: CSI Set - HAState Active for all assigned CSIs
Sep 7 15:49:15 PM_SC-2 osafamfnd[9533]: NO Assigned 'safSi=AmfDemo1,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-2 osafamfnd[9533]: NO Assigned 'safSi=AmfDemo2,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-2 osafamfnd[9533]: NO Assigned 'safSi=AmfDemo3,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-2 osafamfnd[9533]: NO Assigned 'safSi=AmfDemo4,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-2 osafamfnd[9533]: NO Assigned 'safSi=AmfDemo5,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:15 PM_SC-2 osafamfnd[9533]: NO Assigned 'all SIs' ACTIVE of 'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1'
Sep 7 15:49:17 PM_SC-2 osaffmd[9452]: NO Node Down event for node id 2010f:
Sep 7 15:49:17 PM_SC-2 osaffmd[9452]: NO Current role: STANDBY
Sep 7 15:49:17 PM_SC-2 osaffmd[9452]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131599, SupervisionTime = 60
Sep 7 15:49:17 PM_SC-2 osafimmd[9462]: WA IMMD lost contact with peer IMMD (NCSMDS_RED_DOWN)
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: WA DISCARD DUPLICATE FEVS message:1638
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: WA Error code 2 returned for message type 82 - ignoring
Sep 7 15:49:17 PM_SC-2 kernel: [14064.380084] tipc: Resetting link <1.1.2:eth0-1.1.1:eth0>, peer not responding
Sep 7 15:49:17 PM_SC-2 kernel: [14064.380092] tipc: Lost link <1.1.2:eth0-1.1.1:eth0> on network plane A
Sep 7 15:49:17 PM_SC-2 kernel: [14064.380096] tipc: Lost contact with <1.1.1>
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: WA DISCARD DUPLICATE FEVS message:1639
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: WA Error code 2 returned for message type 82 - ignoring
Sep 7 15:49:17 PM_SC-2 osafimmd[9462]: WA IMMND DOWN on active controller f1 detected at standby immd!! f2. Possible failover
Sep 7 15:49:17 PM_SC-2 osafimmd[9462]: NO Skipping re-send of fevs message 1638 since it has recently been resent.
Sep 7 15:49:17 PM_SC-2 osafimmd[9462]: NO Skipping re-send of fevs message 1639 since it has recently been resent.
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Global discard node received for nodeId:2010f pid:9217
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer disconnected 1 <0, 2010f(down)> (safLogService)
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer disconnected 2 <0, 2010f(down)> (safClmService)
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer disconnected 3 <0, 2010f(down)> (safAmfService)
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer disconnected 5 <0, 2010f(down)> (MsgQueueService131343)
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer disconnected 6 <0, 2010f(down)> (safMsgGrpService)
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer disconnected 7 <0, 2010f(down)> (safCheckPointService)
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer disconnected 8 <0, 2010f(down)> (safLckService)
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer disconnected 9 <0, 2010f(down)> (safEvtService)
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer disconnected 10 <0, 2010f(down)> (safSmfService)
Sep 7 15:49:17 PM_SC-2 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Sep 7 15:49:17 PM_SC-2 osaffmd[9452]: NO Controller Failover: Setting role to ACTIVE
Sep 7 15:49:17 PM_SC-2 osafrded[9443]: NO RDE role set to ACTIVE
Sep 7 15:49:17 PM_SC-2 osafimmd[9462]: NO ACTIVE request
Sep 7 15:49:17 PM_SC-2 osaflogd[9483]: NO ACTIVE request
Sep 7 15:49:17 PM_SC-2 osafntfd[9494]: NO ACTIVE request
Sep 7 15:49:17 PM_SC-2 osafamfd[9523]: NO FAILOVER StandBy --> Active
Sep 7 15:49:17 PM_SC-2 osafclmd[9504]: NO ACTIVE request
Sep 7 15:49:17 PM_SC-2 osafimmd[9462]: NO ellect_coord invoke from rda_callback ACTIVE
Sep 7 15:49:17 PM_SC-2 osafimmd[9462]: NO New coord elected, resides at 2020f
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO This IMMND is now the NEW Coord
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer connected: 12 (safLogService) <1, 2020f>
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer disconnected 4 <9, 2020f> (@safAmfService2020f)
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer connected: 13 (safAmfService) <9, 2020f>
Sep 7 15:49:17 PM_SC-2 osafamfd[9523]: NO Node 'SC-1' left the cluster
Sep 7 15:49:17 PM_SC-2 osafamfd[9523]: NO FAILOVER StandBy --> Active DONE!
Sep 7 15:49:17 PM_SC-2 osafamfnd[9533]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer connected: 14 (safMsgGrpService) <294, 2020f>
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer connected: 15 (safCheckPointService) <297, 2020f>
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer connected: 16 (MsgQueueService131343) <322, 2020f>
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO Backup create cmd = /usr/local/lib/opensaf/smf-backup-create
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO Bundle check cmd = /usr/local/lib/opensaf/smf-bundle-check
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO Node check cmd = /usr/local/lib/opensaf/smf-node-check
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO SMF repository check cmd = /usr/local/lib/opensaf/smf-repository-check
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO Cluster reboot cmd = /usr/local/lib/opensaf/smf-cluster-reboot
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO Admin Op Timeout = 600000000000
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO Cli Timeout = 600000000000
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO Reboot Timeout = 600000000000
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO SMF will use the STEP standard set of actions.
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO DN for si_swap operation = safSi=SC-2N,safApp=OpenSAF
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO SI si_swap operation max retry = 200
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO Max num of campaign restarts = 10
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO IMM persist command = immdump /etc/opensaf/imm.xml
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO Node reboot cmd = reboot
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO Turn PBE off during upgrade = 1
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO Verify Enable = 0
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO Verify Timeout = 100000000000
Sep 7 15:49:17 PM_SC-2 osafsmfd[9551]: NO smfKeepDuState = 0
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer locally disconnected. Marking it as doomed 16 <322, 2020f> (MsgQueueService131343)
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer disconnected 16 <322, 2020f> (MsgQueueService131343)
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer connected: 17 (safClmService) <4, 2020f>
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer connected: 18 (safSmfService) <320, 2020f>
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer connected: 19 (safLckService) <295, 2020f>
Sep 7 15:49:17 PM_SC-2 osafimmnd[9473]: NO Implementer connected: 20 (safEvtService) <296, 2020f>
Sep 7 15:49:17 PM_SC-2 osafamfnd[9533]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Sep 7 15:50:12 PM_SC-2 kernel: [14119.035148] tipc: Established link <1.1.2:eth0-1.1.1:eth0> on network plane A
Sep 7 15:50:12 PM_SC-2 osafimmd[9462]: NO New IMMND process is on STANDBY Controller at 2010f
Sep 7 15:50:12 PM_SC-2 osafimmd[9462]: WA IMMND on controller (not currently coord) requests sync
Sep 7 15:50:12 PM_SC-2 osafimmd[9462]: NO Node 2010f request sync sync-pid:3468 epoch:0
Sep 7 15:50:12 PM_SC-2 osafimmnd[9473]: NO Announce sync, epoch:3
Sep 7 15:50:12 PM_SC-2 osafimmnd[9473]: NO SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
Sep 7 15:50:12 PM_SC-2 osafimmnd[9473]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Sep 7 15:50:12 PM_SC-2 osafimmd[9462]: NO Successfully announced sync. New ruling epoch:3
Sep 7 15:50:13 PM_SC-2 osafimmloadd: NO Sync starting
Sep 7 15:50:13 PM_SC-2 osafimmloadd: IN Synced 464 objects in total
Sep 7 15:50:13 PM_SC-2 osafimmnd[9473]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 16817
Sep 7 15:50:13 PM_SC-2 osafimmnd[9473]: NO Epoch set to 3 in ImmModel
Sep 7 15:50:13 PM_SC-2 osafimmd[9462]: NO ACT: New Epoch for IMMND process at node 2020f old epoch: 2 new epoch:3
Sep 7 15:50:13 PM_SC-2 osafimmloadd: NO Sync ending normally
Sep 7 15:50:13 PM_SC-2 osafimmd[9462]: NO ACT: New Epoch for IMMND process at node 2010f old epoch: 0 new epoch:3
Sep 7 15:50:13 PM_SC-2 osafimmnd[9473]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
Sep 7 15:50:14 PM_SC-2 osafimmnd[9473]: NO Implementer (applier) connected: 21 (@safAmfService2010f) <0, 2010f>
Sep 7 15:50:17 PM_SC-2 osafamfd[9523]: NO Node 'SC-1' joined the cluster
Sep 7 15:50:19 PM_SC-2 osafimmnd[9473]: NO Implementer connected: 22 (MsgQueueService131343) <0, 2010f>
So, I can't see any problem.
Thanks
-Nagu
---
** [tickets:#68] failover didnot succeed and cluster got reset due to MDS
problems.**
**Status:** assigned
**Milestone:** 4.5.2
**Created:** Sat May 11, 2013 05:22 PM UTC by surender khetavath
**Last Updated:** Tue Aug 11, 2015 06:43 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**
- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/68/attachment/logs.tgz)
(16.2 MB; application/x-compressed-tar)
Changeset : 4241 with 2794&3117 patch
Model : TwoN
configuration: 1App,1SG,4SUs with 3comps each and 5SIs with 3CSIs each
Transport : TCP/ipv6-linklocal
PBE enabled.
Scenario:
sc-1 was active and sc-2 was standby.
The active SU on sc-1 was shut down and the component was made to reject the
quiescing assignment. The component was restarted 10 times (compRestartMax=10),
and recovery then escalated to node failover following an SU failover.
sc-2 did not become active and eventually rebooted, causing a cluster
reset.
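As a rough illustration of the escalation order described in the scenario above, here is a toy model. It is a simplified sketch only, not AMF's actual algorithm: the class and field names are hypothetical, `su_failover_max=1` is an assumption (the ticket does not state the configured value), and probation timers and the SU restart level are left out.

```python
from dataclasses import dataclass


@dataclass
class EscalationModel:
    """Toy model: component restart -> SU failover -> node failover."""

    comp_restart_max: int = 10  # from the ticket: compRestartMax=10
    su_failover_max: int = 1    # assumption: not stated in the ticket
    restarts: int = 0
    su_failovers: int = 0

    def on_fault(self) -> str:
        """Return the recovery this model would apply for one more fault."""
        if self.restarts < self.comp_restart_max:
            self.restarts += 1
            return "componentRestart"
        if self.su_failovers < self.su_failover_max:
            self.su_failovers += 1
            return "suFailover"
        return "nodeFailover"


model = EscalationModel()
sequence = [model.on_fault() for _ in range(12)]
print(sequence[9], sequence[10], sequence[11])
```

Under these assumptions the first ten faults are handled by component restarts, the eleventh by an SU failover, and the twelfth by a node failover, which is the order reported in the scenario.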
syslog on sc-1:
--------------
May 11 21:24:49 sc-1 osafimmnd[4683]: WA Error code 2 returned for message type 21 - ignoring
May 11 21:24:49 sc-1 osafamfnd[4790]: NO Received reboot order, ordering reboot now!
May 11 21:24:49 sc-1 osafamfnd[4790]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Received reboot order
May 11 21:24:49 sc-1 opensaf_reboot: Rebooting local node
May 11 21:24:49 sc-1 osafimmnd[4683]: WA MESSAGE:5319 OUT OF ORDER my highest processed:5317, exiting
May 11 21:24:49 sc-1 osafimmpbed: WA PBE lost contact with parent IMMND - Exiting
May 11 21:24:49 sc-1 osafntfimcnd[4734]: ER saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
May 11 21:24:49 sc-1 osafimmd[4668]: WA IMMND coordinator at 2010f apparently crashed => electing new coord
May 11 21:24:49 sc-1 osafimmd[4668]: ER Failed to find candidate for new IMMND coordinator
May 11 21:24:49 sc-1 osafimmd[4668]: ER Active IMMD has to restart the IMMSv. All IMMNDs will restart
May 11 21:24:49 sc-1 osafimmd[4668]: ER IMM RELOAD => ensure cluster restart by IMMD exit at both SCs, exiting
syslog on sc-2:
----------------
May 11 21:24:49 sc-2 osafimmd[3894]: WA IMMD not re-electing coord for switch-over (si-swap) coord at (2010f)
May 11 21:24:49 sc-2 osafntfimcnd[3969]: NO exiting on signal 15
May 11 21:24:49 sc-2 osafsmfd[4052]: ER amf_active_state_handler oi activate FAILED
May 11 21:24:49 sc-2 osafamfnd[4023]: NO 'safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackFailed' : Recovery is 'nodeFailfast'
May 11 21:24:49 sc-2 osafamfnd[4023]: ER safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackFailed Recovery is:nodeFailfast
May 11 21:24:49 sc-2 osafamfnd[4023]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast
May 11 21:24:49 sc-2 osafmsgd[4216]: ER mqd_imm_declare_implementer failed: err = 14
May 11 21:24:49 sc-2 osafckptd[4202]: ER cpd immOiImplmenterSet failed with err = 14
May 11 21:24:49 sc-2 opensaf_reboot: Rebooting local node