Hi Ritu,
 I have analysed this issue. The problem occurs because SMF calls 
saClmClusterNodeGet() when it gets the standby assignment. The API call fails 
because the node is a non-member node. This problem was already identified 
while fixing #1781, and an enhancement ticket was raised for SMF: #1791 "smf: 
use CLM cluster tracking instead of reading per node up for SMFND". Since MW 
assignments are not affected on a CLM locked node, AMF giving it a fresh 
standby role seems justified. The problem will be fixed when SMF ticket #1791 
is implemented.
   However, this AMF ticket can be used for one purpose. In a failover 
situation, AMF changes the standby controller to active and then chooses a 
spare controller for the fresh standby assignment. What I am observing is that 
even when multiple spare controllers are available, AMF chooses the CLM locked 
spare controller for the fresh standby role. If one is available, AMF must 
choose a CLM unlocked spare controller for the fresh standby assignment. This 
keeps alive the possibility of a controller role swap with the si-swap admin 
op.
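
   To make the proposed selection rule concrete, here is a minimal sketch. It 
is not taken from the AMF sources: the SpareController struct and the 
choose_standby() function are hypothetical names used only for illustration. 
It assumes the candidates are already known spare controllers, prefers a CLM 
member (unlocked) node, and falls back to a CLM locked spare only when no 
unlocked one is available.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of a spare controller (not an OpenSAF type).
struct SpareController {
  std::string name;
  bool clm_member;  // false when the node is CLM locked (non-member)
};

// Prefer the first spare that is a CLM member. If every spare is CLM locked,
// fall back to the first spare, since MW assignments are not affected on a
// CLM locked node. Returns std::nullopt when there are no spares at all.
std::optional<std::string> choose_standby(
    const std::vector<SpareController>& spares) {
  if (spares.empty()) return std::nullopt;
  auto it = std::find_if(
      spares.begin(), spares.end(),
      [](const SpareController& s) { return s.clm_member; });
  return it != spares.end() ? it->name : spares.front().name;
}

// Small usage example with made-up node states.
void example_usage() {
  std::vector<SpareController> spares = {{"SC-2", false}, {"SC-4", true}};
  assert(choose_standby(spares).value() == "SC-4");
}
```

   With only a CLM locked spare available, the sketch still returns that 
spare, which matches the point above that a standby role on a CLM locked node 
is acceptable when there is no alternative.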
   Please change the title of the ticket to "amf: choose CLM unlocked spare 
controller for standby role in failover situation." 
   Thanks,
   Praveen
   



---

** [tickets:#2387] clm_locked spare controller got standby role after failover**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Fri Mar 17, 2017 12:13 PM UTC by Ritu Raj
**Last Updated:** Mon Mar 20, 2017 04:47 AM UTC
**Owner:** Praveen
**Attachments:**

- [SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2387/attachment/SC-1.tar.bz2) (873.4 kB; application/x-bzip)
- [SC-2.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2387/attachment/SC-2.tar.bz2) (762.0 kB; application/x-bzip)
- [SC-3.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2387/attachment/SC-3.tar.bz2) (724.5 kB; application/x-bzip)


###Environment details
OS : Suse 64bit
Changeset : 8701 ( 5.2.RC1)
6-node setup (3 controllers and 3 payloads, with SC_ABSENCE enabled)

###Summary
clm_locked spare controller got standby role after failover

###Steps followed & Observed behaviour
1. Initially the roles were SC-1 (ACTIVE), SC-2 (QUIESCED), SC-3 (STANDBY).
2. Performed a clm_lock operation on the SC-2 (QUIESCED) controller.
3. After that, performed a failover on the active controller (SC-1) by killing 
one director.
4. Observed that SC-3 got the Active role while SC-2 got the Standby role, 
which is not expected as node SC-2 is in the clm_locked state.
5. Later, SC-1 joined as a QUIESCED controller (after recovery from failover).

**Expected**:
The clm_locked node should not get the standby role as it is in the locked 
state, and SC-1 should join as Standby after recovery from failover.
   
 Syslog:
Mar 17 17:56:59 suseR2-S2 osafimmnd[21809]: NO Implementer (applier) connected: 
28 (@safSmf_applier1) <0, 2030f>
Mar 17 17:56:59 suseR2-S2 osafamfnd[21859]: NO Assigning 
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO RDE role set to STANDBY
Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO Peer up on node 0x2030f
Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO Got peer info request from node 
0x2030f with role ACTIVE
Mar 17 17:56:59 suseR2-S2 osafrded[21779]: NO Got peer info response from node 
0x2030f with role ACTIVE
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 24 
(change:3, dest:13)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 24 
(change:5, dest:13)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 24 
(change:5, dest:13)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 25 
(change:3, dest:566317113647120)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: NO MDS event from svc_id 25 
(change:3, dest:565213543063568)
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: IN AMF HA STANDBY request
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: IN Added IMMND node with dest 
566317113647120
Mar 17 17:56:59 suseR2-S2 osafimmd[21798]: IN Added IMMND node with dest 
565213543063568
Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA saClmClusterNodeGet failed, 
rc=SA_AIS_ERR_UNAVAILABLE (31)
Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA proc_mds_info: SMFND UP failed
Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA saClmClusterNodeGet failed, 
rc=SA_AIS_ERR_UNAVAILABLE (31)
Mar 17 17:56:59 suseR2-S2 osafsmfd[21878]: WA proc_mds_info: SMFND UP failed


 From Traces:
 
 SC-2 left the cluster when the clm lock operation was performed, and later 
SC-1 left the cluster when the failover was performed:
 
~~~
SC-2:::
 Mar 17 17:54:24.123134 osafamfnd [6773:src/amf/amfnd/clm.cc:0196] >> 
clm_track_cb: '0' '4' '1'
Mar 17 17:54:24.123142 osafamfnd [6773:src/amf/amfnd/clm.cc:0217] TR Node has 
left the cluster 'safNode=SC-2,safCluster=myClmCluster', avnd_cb->first_time_up 
0,notifItem->clusterNode.nodeId 131599, avnd_cb->node_info.nodeId 131343
-----
-----
SC-1:::
 Mar 17 17:57:03.514477 osafamfnd [9266:src/amf/amfnd/clm.cc:0196] >> 
clm_track_cb: '0' '4' '1'
Mar 17 17:57:03.514484 osafamfnd [9266:src/amf/amfnd/clm.cc:0217] TR Node has 
left the cluster 'safNode=SC-1,safCluster=myClmCluster', avnd_cb->first_time_up 
0,notifItem->clusterNode.nodeId 131343, avnd_cb->node_info.nodeId 131855
~~~

 After failover, SC-2 got the standby role and SC-3 the active role:
~~~
SC::2
 Mar 17 17:56:59.941081 osafamfnd [21859:src/amf/amfnd/susm.cc:1043] NO 
Assigned 'safSi=SC-2N,safApp=OpenSAF' STANDBY to 
'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Mar 17 17:56:59.941089 osafamfnd [21859:src/amf/amfnd/err.cc:1639] >> 
is_no_assignment_due_to_escalations
Mar 17 17:56:59.941097 osafamfnd [21859:src/amf/amfnd/err.cc:1651] << 
is_no_assignment_due_to_escalations: false
Mar 17 17:56:59.941104 osafamfnd [21859:src/amf/amfnd/di.cc:0829] >> 
avnd_di_susi_resp_send: Sending Resp su=safSu=SC-2,safSg=2N,safApp=OpenSAF, 
si=safSi=SC-2N,safApp=OpenSAF, curr_state=2, prv_state=0
Mar 17 17:56:59.941112 osafamfnd [21859:src/amf/amfnd/di.cc:0839] TR 
curr_assign_state '3
----
----

SC:::3
Mar 17 17:57:03.656105 osafamfnd [9266:src/amf/amfnd/susm.cc:1043] NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-3,safSg=2N,safApp=OpenSAF'
Mar 17 17:57:03.656113 osafamfnd [9266:src/amf/amfnd/err.cc:1639] >> 
is_no_assignment_due_to_escalations
Mar 17 17:57:03.656120 osafamfnd [9266:src/amf/amfnd/err.cc:1651] << 
is_no_assignment_due_to_escalations: false
~~~
 
 Notes:
 1. Syslog attached
 2. amfd and amfnd traces of active, standby and spare controller attached

