Hello,

We are using OpenSAF 5.1.0 and are testing failover from the active controller
to the standby.

Cluster has 2 controller nodes (rbm-fe-s1-h1, rbm-fe-s2-h1) and 2 payload nodes
(rbm-fe-s1-h2, rbm-fe-s2-h2).
When starting the test, rbm-fe-s2-h1 is the active controller.


Active controller  rbm-fe-s2-h1:

Sep 10 20:21:00 rbm-fe-s2-h1 opensafd: Stopping OpenSAF Services
Sep 10 20:21:00 rbm-fe-s2-h1 osafamfnd[28597]: NO Shutdown initiated
Sep 10 20:21:00 rbm-fe-s2-h1 osafamfnd[28597]: NO Removing assignments from AMF 
components
.....   SU terminating
Sep 10 20:21:02 rbm-fe-s2-h1 osafamfnd[28597]: NO Removed assignments from AMF 
components
Sep 10 20:21:02 rbm-fe-s2-h1 osafamfnd[28597]: NO Terminating all AMF components
Sep 10 20:21:02 rbm-fe-s2-h1 osafimmd[28493]: exiting for shutdown
Sep 10 20:21:02 rbm-fe-s2-h1 osafckptd[28629]: exiting for shutdown
Sep 10 20:21:02 rbm-fe-s2-h1 osafimmnd[28511]: NO Implementer locally 
disconnected. Marking it as
doomed 21 <844, 2450f> (safCheckPointService)
Sep 10 20:21:02 rbm-fe-s2-h1 osafimmnd[28511]: WA DISCARD DUPLICATE FEVS 
message:18065
Sep 10 20:21:02 rbm-fe-s2-h1 osafimmnd[28511]: WA Error code 2 returned for 
message type 82 -
ignoring
Sep 10 20:21:02 rbm-fe-s2-h1 osafimmnd[28511]: WA DISCARD DUPLICATE FEVS 
message:18066
Sep 10 20:21:02 rbm-fe-s2-h1 osafimmnd[28511]: WA Error code 2 returned for 
message type 82 -
ignoring
Sep 10 20:21:02 rbm-fe-s2-h1 osafrded[28460]: exiting for shutdown
Sep 10 20:21:02 rbm-fe-s2-h1 osafckptnd[28667]: exiting for shutdown
Sep 10 20:21:02 rbm-fe-s2-h1 osafclmna[28444]: exiting for shutdown
Sep 10 20:21:02 rbm-fe-s2-h1 osaffmd[28476]: exiting for shutdown
Sep 10 20:21:02 rbm-fe-s2-h1 osafsmfnd[28832]: exiting for shutdown
... all the osaf processes exiting
Sep 10 20:21:02 rbm-fe-s2-h1 osafimmnd[28511]: exiting for shutdown
Sep 10 20:21:07 rbm-fe-s2-h1 osafamfnd[28597]: NO Terminated all AMF components
Sep 10 20:21:07 rbm-fe-s2-h1 osafamfnd[28597]: NO Shutdown completed, exiting
Sep 10 20:21:07 rbm-fe-s2-h1 IGMP: AL AMF Node Director is down, terminate this 
process
Sep 10 20:21:07 rbm-fe-s2-h1 UDRU: AL AMF Node Director is down, terminate this 
process
Sep 10 20:21:13 rbm-fe-s2-h1 opensafd: OpenSAF services successfully stopped

1.       Why does it take 13 seconds to fully shut down the active controller?
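
To show where the 13 seconds go, here is a small Python sketch that computes the
gaps between four of the timestamps in the log above. The event list is copied by
hand from that log, and the helper names are only illustrative.

```python
from datetime import datetime

# Time-of-day stamps copied by hand from the rbm-fe-s2-h1 log above.
EVENTS = [
    ("20:21:00", "opensafd: Stopping OpenSAF Services"),
    ("20:21:02", "osafamfnd: Terminating all AMF components"),
    ("20:21:07", "osafamfnd: Shutdown completed, exiting"),
    ("20:21:13", "opensafd: OpenSAF services successfully stopped"),
]


def parse(hms):
    """Parse an HH:MM:SS string; all events fall on the same day."""
    return datetime.strptime(hms, "%H:%M:%S")


def phase_durations(events):
    """Return (seconds, from_label, to_label) for each consecutive pair."""
    out = []
    for (t0, label0), (t1, label1) in zip(events, events[1:]):
        seconds = int((parse(t1) - parse(t0)).total_seconds())
        out.append((seconds, label0, label1))
    return out


durations = phase_durations(EVENTS)
total = sum(seconds for seconds, _, _ in durations)

for seconds, label0, label1 in durations:
    print(f"{seconds:2d} s  {label0}  ->  {label1}")
print(f"{total:2d} s  total")
```

With these timestamps, most of the time falls in two gaps: 5 seconds between
"Terminating all AMF components" and "Shutdown completed, exiting", and 6 seconds
between that message and "OpenSAF services successfully stopped".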

Standby controller becomes the active:

Sep 10 20:21:02 rbm-fe-s1-h1 osafimmd[30010]: NO MDS event from svc_id 24 
(change:1, dest:13)
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmd[30010]: NO MDS event from svc_id 24 
(change:6, dest:13)
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmd[30010]: WA IMMD lost contact with peer 
IMMD
(NCSMDS_RED_DOWN)
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: WA DISCARD DUPLICATE FEVS 
message:18065
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: WA Error code 2 returned for 
message type 82 -
ignoring
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: WA DISCARD DUPLICATE FEVS 
message:18066
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: WA Error code 2 returned for 
message type 82 -
ignoring
Sep 10 20:21:02 rbm-fe-s1-h1 osafrded[29977]: NO Peer down on node 0x2450f
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmd[30010]: NO MDS event from svc_id 25 
(change:4,
dest:638880680275807)
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmd[30010]: WA IMMND DOWN on active 
controller 45 detected at
standby immd!! 28. Possible failover
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmd[30010]: NO Skipping re-send of fevs 
message 18065 since it has
recently been resent.
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmd[30010]: NO Skipping re-send of fevs 
message 18066 since it has
recently been resent.
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: NO Global discard node received 
for nodeId:2450f
pid:28511
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer disconnected 12 
<0, 2450f(down)>
(MsgQueueService148751)
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer disconnected 16 
<0, 2450f(down)>
(safLogService)
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer disconnected 19 
<0, 2450f(down)>
(@safLogService_appl)
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer disconnected 17 
<0, 2450f(down)>
(safClmService)
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer disconnected 18 
<0, 2450f(down)>
(safAmfService)
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer disconnected 24 
<0, 2450f(down)>
(safEvtService)
...
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer disconnected 25 
<0, 2450f(down)>
(safLckService)
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer disconnected 20 
<0, 2450f(down)>
(safMsgGrpService)
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer disconnected 23 
<0, 2450f(down)>
(safSmfService)
Sep 10 20:21:02 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer disconnected 21 
<0, 2450f(down)>
(safCheckPointService)
Sep 10 20:21:02 rbm-fe-s1-h1 osafamfnd[30116]: NO
'safSu=amfIgniteRaterSU1.1,safSg=amfIgniteRaterSG1,safApp=olcApp' component 
restart probation
timer started (timeout: 1000000 ns)
Sep 10 20:21:02 rbm-fe-s1-h1 osafamfnd[30116]: NO Restarting a component of
'safSu=amfIgniteRaterSU1.1,safSg=amfIgniteRaterSG1,safApp=olcApp' (comp restart 
count: 1)
Sep 10 20:21:02 rbm-fe-s1-h1 osafamfnd[30116]: NO
'safComp=amfRaterComp1.1.3,safSu=amfIgniteRaterSU1.1,safSg=amfIgniteRaterSG1,safApp=olcApp'
faulted due to 'errorReport' : Recovery is 'componentRestart'
Sep 10 20:21:02 rbm-fe-s1-h1 osafamfnd[30116]: NO saAmfSUFailover is true for
'safSu=amfCacheIgniteSU1.1,safSg=amfCacheIgniteSG1,safApp=olcApp'
Sep 10 20:21:02 rbm-fe-s1-h1 osafamfnd[30116]: NO SU failover probation timer 
started (timeout: 0 ns)
Sep 10 20:21:02 rbm-fe-s1-h1 osafamfnd[30116]: NO Performing failover of
'safSu=amfCacheIgniteSU1.1,safSg=amfCacheIgniteSG1,safApp=olcApp' (SU failover 
count: 1)
Sep 10 20:21:02 rbm-fe-s1-h1 osafamfnd[30116]: NO
'safComp=amfCacheComp1.1.1,safSu=amfCacheIgniteSU1.1,safSg=amfCacheIgniteSG1,safApp=olcApp'
recovery action escalated from 'componentFailover' to 'suFailover'
Sep 10 20:21:02 rbm-fe-s1-h1 osafamfnd[30116]: NO
'safComp=amfCacheComp1.1.1,safSu=amfCacheIgniteSU1.1,safSg=amfCacheIgniteSG1,safApp=olcApp'
faulted due to 'errorReport' : Recovery is 'suFailover'
Sep 10 20:21:02 rbm-fe-s1-h1 osafamfnd[30116]: NO Terminating components of
'safSu=amfCacheIgniteSU1.1,safSg=amfCacheIgniteSG1,safApp=olcApp'(abruptly & 
unordered)
Sep 10 20:21:02 rbm-fe-s1-h1 osafamfnd[30116]: NO
'safSu=amfCacheIgniteSU1.1,safSg=amfCacheIgniteSG1,safApp=olcApp' Presence 
State INSTANTIATED
=> TERMINATING
...
Sep 10 20:21:13 rbm-fe-s1-h1 osafdtmd[29938]: NO Lost contact with 
'rbm-fe-s2-h1'
Sep 10 20:21:13 rbm-fe-s1-h1 osaffmd[29993]: NO Node Down event for node id 
2450f:
Sep 10 20:21:13 rbm-fe-s1-h1 osaffmd[29993]: NO Current role: STANDBY
Sep 10 20:21:13 rbm-fe-s1-h1 osaffmd[29993]: Rebooting OpenSAF NodeId = 148751 
EE Name = ,
Reason: Received Node Down for peer controller, OwnNodeId = 141327, 
SupervisionTime = 0
Sep 10 20:21:13 rbm-fe-s1-h1 osaffmd[29993]: node reboot failure: exit code 
32512
Sep 10 20:21:13 rbm-fe-s1-h1 osaffmd[29993]: NO Controller Failover: Setting 
role to ACTIVE
Sep 10 20:21:13 rbm-fe-s1-h1 osafrded[29977]: NO RDE role set to ACTIVE
Sep 10 20:21:13 rbm-fe-s1-h1 osafrded[29977]: NO Running 
'/usr/lib64/opensaf/opensaf_sc_active'
with 0 argument(s)
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmd[30010]: NO ACTIVE request
Sep 10 20:21:13 rbm-fe-s1-h1 osafclmd[30082]: NO ACTIVE request
Sep 10 20:21:13 rbm-fe-s1-h1 osaflogd[30048]: NO ACTIVE request
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmd[30010]: NO ellect_coord invoke from 
rda_callback ACTIVE
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmd[30010]: NO New coord elected, resides at 
2280f
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmd[30010]: NO Old active NOT present => send 
discard node
payload 2520f
Sep 10 20:21:13 rbm-fe-s1-h1 osafamfd[30099]: NO FAILOVER StandBy --> Active
Sep 10 20:21:13 rbm-fe-s1-h1 osafclmd[30082]: 
safNode=rbm-fe-s2-h2,safCluster=myClmCluster LEFT,
init view=4, cluster view=7
Sep 10 20:21:13 rbm-fe-s1-h1 osafamfnd[30116]: NO AVD NEW_ACTIVE, adest:1
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmd[30010]: NO MDS event from svc_id 24 
(change:7, dest:13)
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmd[30010]: NO MDS event from svc_id 24 
(change:2, dest:13)
Sep 10 20:21:13 rbm-fe-s1-h1 osafntfd[30065]: NO ACTIVE request
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmnd[30028]: NO This IMMND is now the NEW 
Coord
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmnd[30028]: NO Global discard node received 
for nodeId:2520f
pid:3690
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer disconnected 15 
<0, 2520f(down)>
(MsgQueueService152079)
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer connected: 28 
(safLogService) <824,
2280f>
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer disconnected 27 
<828, 2280f>
(@safAmfService2280f)
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer connected: 29 
(safClmService) <827,
2280f>
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer connected: 30 
(safAmfService) <828,
2280f>
Sep 10 20:21:13 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer (applier) 
connected: 31
(@safLogService_appl) <10152, 2280f>
Sep 10 20:21:13 rbm-fe-s1-h1 osafamfd[30099]: NO Node 'rbm-fe-s2-h1' left the 
cluster
Sep 10 20:21:14 rbm-fe-s1-h1 osafamfd[30099]: NO FAILOVER StandBy --> Active 
DONE!
Sep 10 20:21:14 rbm-fe-s1-h1 osafamfnd[30116]: NO Assigning 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE
to 'safSu=rbm-fe-s1-h1,safSg=2N,safApp=OpenSAF'
Sep 10 20:21:14 rbm-fe-s1-h1 osafamfnd[30116]: NO Assigning 
'safSi=amfRMPSI1.1,safApp=olcApp'
ACTIVE to 'safSu=amfRMPSU1.1,safSg=amfRMPSG1,safApp=olcApp'
Sep 10 20:21:14 rbm-fe-s1-h1 osafamfnd[30116]: WA susi_assign_evh:
'safSu=amfCacheIgniteSU1.1,safSg=amfCacheIgniteSG1,safApp=olcApp' has no 
assignments
Sep 10 20:21:14 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer connected: 32 
(safMsgGrpService)
<841, 2280f>
Sep 10 20:21:14 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer connected: 33
(safCheckPointService) <843, 2280f>
Sep 10 20:21:14 rbm-fe-s1-h1 osafamfnd[30116]: NO Assigned 
'safSi=amfRMPSI1.1,safApp=olcApp'
ACTIVE to 'safSu=amfRMPSU1.1,safSg=amfRMPSG1,safApp=olcApp'
Sep 10 20:21:14 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer connected: 34 
(safLckService) <842,
2280f>
Sep 10 20:21:14 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer connected: 35 
(safEvtService) <820,
2280f>
Sep 10 20:21:14 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer connected: 36
(MsgQueueService152079) <10156, 2280f>
Sep 10 20:21:14 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer locally 
disconnected. Marking it as
doomed 36 <10156, 2280f> (MsgQueueService152079)
Sep 10 20:21:14 rbm-fe-s1-h1 osafimmnd[30028]: NO Implementer disconnected 36 
<10156, 2280f>
(MsgQueueService152079)
....
Sep 10 20:21:14 rbm-fe-s1-h1 osafamfd[30099]: NO Node 'rbm-fe-s2-h2' left the 
cluster

2.       The standby detects quickly (at 20:21:02) that the IMMND on the active
controller is down, but it does not take the ACTIVE role until 20:21:13, the same
time at which the original ACTIVE reports that it is fully stopped.
3.       While there is no ACTIVE, the components' IMM and NTF queries fail with
the retry error. The components hit the maximum number of retries (the code does
not retry forever) and then fail, restart, fail, restart, and so on.
Why do the components depend on the IMM on the controller? Shouldn't they be
using the IMM on their own node?

From the IMM documentation:
2.2.2 IMM Node Director
The IMMND process executes on all nodes (both controller and payload). The IMMND
process contains the IMM repository and is the actual provider of the IMMSv at
the node. All connections and sessions started at the node are handled by the
IMMND at that node.
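
The retry behaviour described in question 3 can be sketched as follows. This is
not our component code and does not call any OpenSAF API: TryAgain is a
hypothetical stand-in for the retry error, and the query is simulated with a fake
clock. The sketch only illustrates bounding the retries by a time budget rather
than by a fixed count, so that a budget longer than the window without an ACTIVE
(about 11 seconds in the logs above, 20:21:02 to 20:21:13) would ride through it.

```python
import time


class TryAgain(Exception):
    """Hypothetical stand-in for the retry error returned by IMM/NTF calls."""


def call_with_deadline(op, deadline_s, interval_s=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Call op() until it succeeds, retrying on TryAgain.

    Gives up (re-raises TryAgain) once another wait of interval_s would
    exceed deadline_s seconds since the first attempt.
    Returns (result, number_of_attempts).
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        try:
            return op(), attempts
        except TryAgain:
            if clock() - start + interval_s > deadline_s:
                raise
            sleep(interval_s)


class FakeClock:
    """Simulated clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def make_query(clock, ready_at_s):
    """Simulated query that raises TryAgain until ready_at_s has passed."""
    def query():
        if clock() < ready_at_s:
            raise TryAgain()
        return "ok"
    return query


# Simulate an 11 second window without an ACTIVE and a 30 second budget.
clock = FakeClock()
result = call_with_deadline(make_query(clock, 11.0), deadline_s=30.0,
                            sleep=clock.sleep, clock=clock)
print(result)
```

The 30 second budget and 0.5 second interval are arbitrary values chosen for the
illustration; with a budget shorter than the window (for example 5 seconds) the
call gives up and the error propagates, which matches what our components do
today.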



Payload on s2-h2 - the immnd restarted

[rbm-fe-s2-h2(Lopnsaf)telenet-lab2:/sft/Lopnsaf/HA_ROOT/logs/instantiation] ps -ef | grep osaf
Lopnsaf   3666     1  0 18:23 ?        00:01:23 /usr/lib64/opensaf/osafdtmd --tracemask=0xffffffff
root      3677     1  0 18:23 ?        00:00:02 /bin/sh /usr/lib64/opensaf/clc-cli/osaf-transport-monitor
Lopnsaf   3709     1  0 18:23 ?        00:00:00 /usr/lib64/opensaf/osafclmna
root      3725     1  1 18:23 ?        00:02:17 /usr/lib64/opensaf/osafamfnd --tracemask=0xffffffff
Lopnsaf   3745     1  0 18:23 ?        00:00:00 /usr/lib64/opensaf/osafamfwd
Lopnsaf   3767     1  0 18:23 ?        00:00:00 /usr/lib64/opensaf/osafckptnd
Lopnsaf   3788     1  0 18:23 ?        00:00:00 /usr/lib64/opensaf/osaflcknd
Lopnsaf   3832     1  0 18:23 ?        00:00:00 /usr/lib64/opensaf/osafmsgnd
root      3854     1  0 18:23 ?        00:00:00 /usr/lib64/opensaf/osafsmfnd
Lopnsaf  11886     1  1 20:21 ?        00:00:22 /usr/lib64/opensaf/osafimmnd
Lopnsaf  26579 15860  0 20:44 pts/3    00:00:00 grep --color=auto osaf


Sep 10 20:21:02 rbm-fe-s2-h2 osafimmnd[3690]: ER No IMMD service => cluster 
restart, exiting
Sep 10 20:21:02 rbm-fe-s2-h2 osafamfnd[3725]: NO
'safSu=rbm-fe-s2-h2,safSg=NoRed,safApp=OpenSAF' component restart probation
timer started (timeout: 60000000000 ns)
Sep 10 20:21:02 rbm-fe-s2-h2 osafamfnd[3725]: NO Restarting a component of
'safSu=rbm-fe-s2-h2,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Sep 10 20:21:02 rbm-fe-s2-h2 osafamfnd[3725]: NO
'safComp=IMMND,safSu=rbm-fe-s2-h2,safSg=NoRed,safApp=OpenSAF' faulted due to
'avaDown' : Recovery is 'componentRestart'
Sep 10 20:21:02 rbm-fe-s2-h2 osafimmnd[11886]: Started
/////// This IMMND does not register with the active controller, and all of the
/////// processes on this node start to bounce
Sep 10 20:21:02 rbm-fe-s2-h2 osafamfnd[3725]: NO
'safSu=amfIgniteRaterSU1.4,safSg=amfIgniteRaterSG1,safApp=olcApp' component 
restart probation
timer started (timeout: 1000000 ns)
Sep 10 20:21:02 rbm-fe-s2-h2 osafamfnd[3725]: NO Restarting a component of
'safSu=amfIgniteRaterSU1.4,safSg=amfIgniteRaterSG1,safApp=olcApp' (comp restart 
count: 1)
Sep 10 20:21:02 rbm-fe-s2-h2 osafamfnd[3725]: NO
'safComp=amfRaterComp1.4.1,safSu=amfIgniteRaterSU1.4,safSg=amfIgniteRaterSG1,safApp=olcApp'
faulted due to 'errorReport' : Recovery is 'componentRestart'
Sep 10 20:21:02 rbm-fe-s2-h2 osafamfnd[3725]: NO
'safSu=amfIgniteRaterSU1.4,safSg=amfIgniteRaterSG1,safApp=olcApp' Component or 
SU restart
probation timer expired
Sep 10 20:21:02 rbm-fe-s2-h2 osafamfnd[3725]: NO saAmfSUFailover is true for
'safSu=amfIgniteUDRUSU2.6,safSg=amfIgniteUDRUSG2,safApp=olcApp'
Sep 10 20:21:02 rbm-fe-s2-h2 osafamfnd[3725]: NO SU failover probation timer 
started (timeout: 0 ns)
Sep 10 20:21:02 rbm-fe-s2-h2 osafamfnd[3725]: NO Performing failover of
'safSu=amfIgniteUDRUSU2.6,safSg=amfIgniteUDRUSG2,safApp=olcApp' (SU failover 
count: 1)

4.       On one of the payloads (rbm-fe-s2-h2), the IMMND exited with "No IMMD
service => cluster restart, exiting". It was restarted while there was no ACTIVE
controller. It never registers with the new active controller, and all of the
processes on this payload bounce forever.
After the payload node was stopped and restarted, the components stopped bouncing.
The other payload node, shown below, did not have this problem.

Payload on s1-h2 is stable:

Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: WA DISCARD DUPLICATE FEVS 
message:18065
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: WA Error code 2 returned for 
message type 82 -
ignoring
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: WA DISCARD DUPLICATE FEVS 
message:18066
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: WA Error code 2 returned for 
message type 82 -
ignoring
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: NO Global discard node received 
for nodeId:2450f
pid:28511
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 12 
<0, 2450f(down)>
(MsgQueueService148751)
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 21 
<0, 2450f(down)>
(safCheckPointService)
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 23 
<0, 2450f(down)>
(safSmfService)
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 20 
<0, 2450f(down)>
(safMsgGrpService)
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 25 
<0, 2450f(down)>
(safLckService)
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 24 
<0, 2450f(down)>
(safEvtService)
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 18 
<0, 2450f(down)>
(safAmfService)
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 17 
<0, 2450f(down)>
(safClmService)
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 19 
<0, 2450f(down)>
(@safLogService_appl)
Sep 10 20:21:02 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 16 
<0, 2450f(down)>
(safLogService)
Sep 10 20:21:02 rbm-fe-s1-h2 osafamfnd[29144]: NO saAmfSUFailover is true for
'safSu=amfIgniteRARSU1.4,safSg=amfIgniteRARSG1,safApp=olcApp'
Sep 10 20:21:02 rbm-fe-s1-h2 osafamfnd[29144]: NO SU failover probation timer 
started (timeout: 0 ns)
Sep 10 20:21:02 rbm-fe-s1-h2 osafamfnd[29144]: NO Performing failover of
'safSu=amfIgniteRARSU1.4,safSg=amfIgniteRARSG1,safApp=olcApp' (SU failover 
count: 1)
Sep 10 20:21:02 rbm-fe-s1-h2 osafamfnd[29144]: NO
'safComp=amfRARComp1.4.1,safSu=amfIgniteRARSU1.4,safSg=amfIgniteRARSG1,safApp=olcApp'
recovery action escalated from 'componentFailover' to 'suFailover'
Sep 10 20:21:02 rbm-fe-s1-h2 osafamfnd[29144]: NO
'safComp=amfRARComp1.4.1,safSu=amfIgniteRARSU1.4,safSg=amfIgniteRARSG1,safApp=olcApp'
 faulted
due to 'errorReport' : Recovery is 'suFailover'
Sep 10 20:21:02 rbm-fe-s1-h2 osafamfnd[29144]: NO Terminating components of
'safSu=amfIgniteRARSU1.4,safSg=amfIgniteRARSG1,safApp=olcApp'(abruptly & 
unordered)
....
Sep 10 20:21:13 rbm-fe-s1-h2 osafdtmd[29085]: NO Lost contact with 
'rbm-fe-s2-h1'
....
Sep 10 20:21:13 rbm-fe-s1-h2 osafamfnd[29144]: NO AVD NEW_ACTIVE, adest:1
Sep 10 20:21:13 rbm-fe-s1-h2 osafamfnd[29144]: NO
'safSu=amfIgniteIRPSU1.6,safSg=amfIgniteIRPSG1,safApp=olcApp' component restart 
probation timer
started (timeout: 1000000 ns)
Sep 10 20:21:13 rbm-fe-s1-h2 osafamfnd[29144]: NO Restarting a component of
'safSu=amfIgniteIRPSU1.6,safSg=amfIgniteIRPSG1,safApp=olcApp' (comp restart 
count: 1)
Sep 10 20:21:13 rbm-fe-s1-h2 osafamfnd[29144]: NO
'safComp=amfIRPComp1.6.1,safSu=amfIgniteIRPSU1.6,safSg=amfIgniteIRPSG1,safApp=olcApp'
 faulted
due to 'avaDown' : Recovery is 'componentRestart'
Sep 10 20:21:13 rbm-fe-s1-h2 osafimmnd[29109]: NO Global discard node received 
for nodeId:2520f
pid:3690
Sep 10 20:21:13 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 15 
<0, 2520f(down)>
(MsgQueueService152079)
Sep 10 20:21:13 rbm-fe-s1-h2 osafamfnd[29144]: NO
'safSu=amfIgniteIRPSU1.2,safSg=amfIgniteIRPSG1,safApp=olcApp' component restart 
probation timer
started (timeout: 1000000 ns)
Sep 10 20:21:13 rbm-fe-s1-h2 osafamfnd[29144]: NO Restarting a component of
'safSu=amfIgniteIRPSU1.2,safSg=amfIgniteIRPSG1,safApp=olcApp' (comp restart 
count: 1)
Sep 10 20:21:13 rbm-fe-s1-h2 osafamfnd[29144]: NO
'safComp=amfIRPComp1.2.1,safSu=amfIgniteIRPSU1.2,safSg=amfIgniteIRPSG1,safApp=olcApp'
 faulted
due to 'avaDown' : Recovery is 'componentRestart'
Sep 10 20:21:13 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer connected: 28 
(safLogService) <0,
2280f>
Sep 10 20:21:13 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 27 
<0, 2280f>
(@safAmfService2280f)
Sep 10 20:21:13 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer connected: 29 
(safClmService) <0,
2280f>
Sep 10 20:21:13 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer connected: 30 
(safAmfService) <0,
2280f>
Sep 10 20:21:13 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer (applier) 
connected: 31
(@safLogService_appl) <0, 2280f>
Sep 10 20:21:13 rbm-fe-s1-h2 osafamfnd[29144]: NO
'safSu=amfIgniteIRPSU2.4,safSg=amfIgniteIRPSG2,safApp=olcApp' component restart 
probation timer
started (timeout: 1000000 ns)
...
Sep 10 20:21:14 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer connected: 32 
(safMsgGrpService)
<0, 2280f>
Sep 10 20:21:14 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer connected: 33
(safCheckPointService) <0, 2280f>
Sep 10 20:21:14 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer connected: 34 
(safLckService) <0,
2280f>
Sep 10 20:21:14 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer connected: 35 
(safEvtService) <0,
2280f>
Sep 10 20:21:14 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer connected: 36
(MsgQueueService152079) <0, 2280f>
Sep 10 20:21:14 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 36 
<0, 2280f>
(MsgQueueService152079)
Sep 10 20:21:14 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer connected: 37 
(safSmfService) <0,
2280f>
Sep 10 20:21:17 rbm-fe-s1-h2 osafamfnd[29144]: WA susi_assign_evh:
'safSu=amfCacheIgniteSU2.2,safSg=amfCacheIgniteSG2,safApp=olcApp' has no 
assignments
Sep 10 20:21:18 rbm-fe-s1-h2 osafamfnd[29144]: NO Repair request for
'safSu=amfIgniteRARSU1.4,safSg=amfIgniteRARSG1,safApp=olcApp'
Sep 10 20:21:18 rbm-fe-s1-h2 osafamfnd[29144]: NO
'safSu=amfIgniteRARSU1.4,safSg=amfIgniteRARSG1,safApp=olcApp' Presence State 
UNINSTANTIATED =>
UNINSTANTIATED
Sep 10 20:21:18 rbm-fe-s1-h2 osafamfnd[29144]: NO Repair request for
'safSu=amfIgniteUDRUSU1.8,safSg=amfIgniteUDRUSG1,safApp=olcApp'
Sep 10 20:21:18 rbm-fe-s1-h2 osafamfnd[29144]: NO
'safSu=amfIgniteUDRUSU1.8,safSg=amfIgniteUDRUSG1,safApp=olcApp' Presence State
UNINSTANTIATED => UNINSTANTIATED
...
Sep 10 20:21:22 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer connected: 38
(MsgQueueService148751) <0, 2280f>
Sep 10 20:21:22 rbm-fe-s1-h2 osafimmnd[29109]: NO Implementer disconnected 38 
<0, 2280f>
(MsgQueueService148751)
...


Thanks.




_______________________________________________
Opensaf-users mailing list
Opensaf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-users
