If opensafd on the standby starts successfully, that should mean the standby 
node is ready to take the active role.

I performed a failover after the standby had joined the cluster successfully. 
However, the standby node could not take the active role, and an entire 
*CLUSTER RESET* happened because the cluster was left without an active 
controller.
 
On the active controller:
 
 May 25 11:18:03 CONTROLLER-1 osafimmnd[2281]: NO SERVER STATE: 
IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
May 25 11:18:03 CONTROLLER-1 osafamfd[2342]: NO Received node_up from 2020f: 
msg_id 1
May 25 11:18:04 CONTROLLER-1 osafamfd[2342]: NO Node 'SC-2' joined the cluster
May 25 11:18:04 CONTROLLER-1 osafimmnd[2281]: NO Implementer connected: 19 
(MsgQueueService131599) <0, 2020f>
May 25 11:18:04 CONTROLLER-1 osafrded[2249]: NO Peer up on node 0x2020f
May 25 11:18:04 CONTROLLER-1 osafrded[2249]: NO Got peer info request from node 
0x2020f with role STANDBY
May 25 11:18:04 CONTROLLER-1 osafrded[2249]: NO Got peer info response from 
node 0x2020f with role STANDBY
May 25 11:18:04 CONTROLLER-1 osafimmnd[2281]: NO Implementer (applier) 
connected: 20 (@safAmfService2020f) <0, 2020f>
May 25 11:18:05 CONTROLLER-1 osafimmnd[2281]: NO Implementer (applier) 
connected: 21 (@OpenSafImmReplicatorB) <0, 2020f>

May 25 11:18:05 CONTROLLER-1 osafamfnd[2353]: NO 
'safComp=CPD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'



On the standby controller:

May 25 11:18:04 CONTROLLER-2 osafrded[4212]: NO Got peer info response from 
node 0x2010f with role ACTIVE
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN AMF HA STANDBY request
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN node with dest ADDED 
564114611150864
May 25 11:18:04 CONTROLLER-2 osafamfnd[4292]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN node with dest ADDED 
565214191280144
May 25 11:18:04 CONTROLLER-2 opensafd: OpenSAF(5.0.0 - ) services successfully 
started
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN node with dest ADDED 
567412731609092
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN node with dest ADDED 
566316589850628
                                                                                
                                                                                
May 25 11:18:04 CONTROLLER-2 osafimmnd[4242]: NO Implementer 
(applier) connected: 20 (@safAmfService2020f) <139, 2020f>
May 25 11:18:04 CONTROLLER-2 osafimmnd[4242]: NO Implementer (applier) 
connected: 21 (@OpenSafImmReplicatorB) <147, 2020f>
May 25 11:18:04 CONTROLLER-2 osafntfimcnd[4446]: NO Started
May 25 11:18:05 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 12 
<0, 2010f> (safCheckPointService)
May 25 11:18:10 CONTROLLER-2 osaffmd[4221]: NO Node Down event for node id 
2010f:
May 25 11:18:10 CONTROLLER-2 osaffmd[4221]: NO Current role: STANDBY
May 25 11:18:10 CONTROLLER-2 osaffmd[4221]: Rebooting OpenSAF NodeId = 131343 
EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131599, 
SupervisionTime = 60
May 25 11:18:10 CONTROLLER-2 kernel: [ 2246.200249] TIPC: Resetting link 
<1.1.2:eth3-1.1.1:eth0>, peer not responding
May 25 11:18:10 CONTROLLER-2 kernel: [ 2246.200263] TIPC: Lost link 
<1.1.2:eth3-1.1.1:eth0> on network plane A
May 25 11:18:10 CONTROLLER-2 kernel: [ 2246.200272] TIPC: Lost contact with 
<1.1.1>
May 25 11:18:10 CONTROLLER-2 osafrded[4212]: NO Peer down on node 0x2010f
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: WA IMMD lost contact with peer 
IMMD (NCSMDS_RED_DOWN)
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: IN Resend of fevs message 52769, 
will not mbcp to peer IMMD
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: WA DISCARD DUPLICATE FEVS 
message:52769
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: WA Error code 2 returned for 
message type 82 - ignoring
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: IN Resend of fevs message 52770, 
will not mbcp to peer IMMD
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: WA DISCARD DUPLICATE FEVS 
message:52770
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: WA Error code 2 returned for 
message type 82 - ignoring
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: WA IMMND DOWN on active controller 
1 detected at standby immd!! 2. Possible failover
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: NO Skipping re-send of fevs 
message 52769 since it has recently been resent.
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: NO Skipping re-send of fevs 
message 52770 since it has recently been resent.
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Global discard node received 
for nodeId:2010f pid:2281
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 13 
<0, 2010f(down)> (OpenSafImmPBE)
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 10 
<0, 2010f(down)> (safSmfService)
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 9 <0, 
2010f(down)> (safLckService)
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 8 <0, 
2010f(down)> (safEvtService)
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 7 <0, 
2010f(down)> (safMsgGrpService)
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 6 <0, 
2010f(down)> (MsgQueueService131343)
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 5 <0, 
2010f(down)> (safAmfService)
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 4 <0, 
2010f(down)> (safClmService)
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 3 <0, 
2010f(down)> (@OpenSafImmReplicatorA)
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 2 <0, 
2010f(down)> (@safLogService_appl)
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 1 <0, 
2010f(down)> (safLogService)
May 25 11:18:10 CONTROLLER-2 opensaf_reboot: Rebooting remote node in the 
absence of PLM is outside the scope of OpenSAF
May 25 11:18:10 CONTROLLER-2 osaffmd[4221]: NO Controller Failover: Setting 
role to ACTIVE
May 25 11:18:10 CONTROLLER-2 osafrded[4212]: NO RDE role set to ACTIVE
May 25 11:18:10 CONTROLLER-2 osafrded[4212]: NO Running 
'/usr/lib64/opensaf/opensaf_sc_active' with 0 argument(s)
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: NO ACTIVE request
May 25 11:18:10 CONTROLLER-2 osaflogd[4252]: NO ACTIVE request
May 25 11:18:10 CONTROLLER-2 osafntfd[4262]: NO ACTIVE request
May 25 11:18:10 CONTROLLER-2 osafclmd[4272]: NO ACTIVE request
May 25 11:18:10 CONTROLLER-2 osafamfd[4282]: NO FAILOVER StandBy --> Active
May 25 11:18:10 CONTROLLER-2 osafamfd[4282]: ER FAILOVER StandBy --> Active 
FAILED, Standby OUT OF SYNC
May 25 11:18:10 CONTROLLER-2 osafamfd[4282]: Rebooting OpenSAF NodeId = 0 EE 
Name = No EE Mapped, Reason: FAILOVER failed, OwnNodeId = 131599, 
SupervisionTime = 60

  
If the cold sync is still in progress in the background at this point, 
opensafd on the standby is not completely up. Reporting a successful opensafd 
start on the standby therefore gives the user a false claim of readiness.
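
To put a number on that window, here is a minimal sketch that measures the time between the "services successfully started" message and the failed failover in the standby log above. The helper names are hypothetical, the two sample lines are the wrapped syslog lines from this report joined back together, and the sketch assumes both events fall on the same day.

```python
import re

# Matches the HH:MM:SS part of a syslog timestamp such as "May 25 11:18:04".
TIME_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")


def seconds_of_day(line):
    """Return the first HH:MM:SS timestamp in the line as seconds since midnight."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError("no HH:MM:SS timestamp in line: %r" % line)
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def start_to_failed_failover_gap(lines):
    """Seconds between 'services successfully started' and the failed failover.

    Returns None if either message is missing from the given lines.
    """
    started = failed = None
    for line in lines:
        if started is None and "services successfully started" in line:
            started = seconds_of_day(line)
        elif failed is None and "FAILOVER StandBy --> Active FAILED" in line:
            failed = seconds_of_day(line)
    if started is None or failed is None:
        return None
    return failed - started


# Two lines taken from the standby controller log in this report.
STANDBY_LOG = [
    "May 25 11:18:04 CONTROLLER-2 opensafd: OpenSAF(5.0.0 - ) services successfully started",
    "May 25 11:18:10 CONTROLLER-2 osafamfd[4282]: ER FAILOVER StandBy --> Active FAILED, Standby OUT OF SYNC",
]

print(start_to_failed_failover_gap(STANDBY_LOG))
```

For the log above, the standby was still out of sync six seconds after opensafd reported a successful start.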


---

** [tickets:#1842] rde: standby amfd notifies to NID early.**

**Status:** invalid
**Milestone:** never
**Created:** Fri May 20, 2016 09:25 AM UTC by Praveen
**Last Updated:** Fri May 20, 2016 07:19 PM UTC
**Owner:** nobody


From 5.0, the RDE API rda_get_role() returns the quiesced role on any 
controller other than the active one.
Because the API returns the quiesced role, AMFD notifies NID in 
initialize_for_assignment() even before cold sync has completed. This leads to 
assignment of MW components while the standby AMFD is still undergoing cold 
sync, which reopens the fixed issue #1334.
AMFD gets the standby role through the RDE callback, then calls 
initialize_for_assignment() again and initializes its interfaces. Note also 
that the RDE callback does not come for the quiesced role on a spare controller.

This problem could apply to other directors as well; at the least, notifying 
NID before getting the role on the standby may need some investigation.

One possible solution is to give a callback for the quiesced role as well. In 
that case, the call to rda_get_role() together with initialize_for_assignment() 
can be removed, and initialize_for_assignment() called only in the RDE callback.


Active AMFD:
May 20 14:51:26.012760 osafamfd [485:getenv.cc:0124] TR 
OSAF_AMF_MIN_CLUSTER_SIZE is not set; using default value 2
May 20 14:51:26.014155 osafamfd [485:role.cc:0176] >> 
initialize_for_assignment: ha_state = 1
May 20 14:51:26.014646 osafamfd [485:mds.cc:0108] >> avd_mds_init
May 20 14:51:26.026272 osafamfd [485:mds.cc:0136] TR vdest created
May 20 14:51:26.030009 osafamfd [485:mds.cc:0160] TR mds install vdest

Standby AMFD:
May 20 14:51:32.911753 osafamfd [481:getenv.cc:0124] TR 
OSAF_AMF_MIN_CLUSTER_SIZE is not set; using default value 2
May 20 14:51:32.916912 osafamfd [481:role.cc:0176] >> 
initialize_for_assignment: ha_state = 3
May 20 14:51:32.921865 osafamfd [481:role.cc:0243] << 
initialize_for_assignment: rc = 1
May 20 14:51:32.921912 osafamfd [481:main.cc:0587] << initialize
May 20 14:51:40.391058 osafamfd [481:main.cc:0456] >> rda_cb
May 20 14:51:40.391266 osafamfd [481:main.cc:0478] << rda_cb
May 20 14:51:40.392132 osafamfd [481:main.cc:0757] >> process_event: 
evt->rcv_evt 23
May 20 14:51:40.392200 osafamfd [481:role.cc:0078] >> avd_role_change_evh: 
cause=1, role=2, current_role=3
May 20 14:51:40.392230 osafamfd [481:role.cc:0176] >> 
initialize_for_assignment: ha_state = 2
May 20 14:51:40.392256 osafamfd [481:mds.cc:0108] >> avd_mds_init

Spare AMFD:
May 20 14:51:33.030099 osafamfd [482:getenv.cc:0124] TR 
OSAF_AMF_MIN_CLUSTER_SIZE is not set; using default value 2
May 20 14:51:33.030828 osafamfd [482:role.cc:0176] >> 
initialize_for_assignment: ha_state = 3
May 20 14:51:33.031877 osafamfd [482:role.cc:0243] << 
initialize_for_assignment: rc = 1
May 20 14:51:33.031877 osafamfd [482:main.cc:0587] << initialize
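
The `ha_state` numbers in these traces are easier to follow once mapped to names. The sketch below assumes they are the standard SA Forum `SaAmfHAStateT` values (1 = active, 2 = standby, 3 = quiesced, 4 = quiescing). The helper name is hypothetical, and the sample lines are the wrapped standby trace lines from above joined back together.

```python
import re
from enum import IntEnum


class HAState(IntEnum):
    """HA state values, assumed to follow SaAmfHAStateT from the SA Forum AMF spec."""
    ACTIVE = 1
    STANDBY = 2
    QUIESCED = 3
    QUIESCING = 4


# Matches the entry trace of initialize_for_assignment() as it appears above.
HA_STATE_RE = re.compile(r"initialize_for_assignment: ha_state = (\d+)")


def decode_ha_states(trace_lines):
    """Return the HA state names, in order, for each initialize_for_assignment entry."""
    states = []
    for line in trace_lines:
        match = HA_STATE_RE.search(line)
        if match:
            states.append(HAState(int(match.group(1))).name)
    return states


# Entry traces from the standby AMFD in this ticket.
STANDBY_TRACE = [
    "May 20 14:51:32.916912 osafamfd [481:role.cc:0176] >> initialize_for_assignment: ha_state = 3",
    "May 20 14:51:40.392230 osafamfd [481:role.cc:0176] >> initialize_for_assignment: ha_state = 2",
]

print(decode_ha_states(STANDBY_TRACE))  # ['QUIESCED', 'STANDBY']
```

Read this way, the standby AMFD enters initialize_for_assignment() first as quiesced and only about eight seconds later as standby, while the spare AMFD trace shows only the quiesced call.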






