- **status**: assigned --> unassigned
- **assigned_to**: Minh Hon Chau -->  nobody 



---

** [tickets:#2588] amfd: Command unlock nodegroup timeout if su failover is 
escalated (inappropriate counter update)**

**Status:** unassigned
**Milestone:** 5.17.10
**Created:** Thu Sep 21, 2017 07:41 AM UTC by Minh Hon Chau
**Last Updated:** Thu Sep 21, 2017 09:56 PM UTC
**Owner:** nobody


During unlock nodegroup operation, if a su has active assignment to fail over, 
amfd will send assingment susi message to move ha state from standby assignment 
of the other su to active. This susi message is sent but the su_cnt_admin_oper 
is not increased. However, when amfd receives assignment response, amfd 
decreases this su_cnt_admin_oper. At the end, the su_cnt_admin_oper is ~0, thus 
the admin operation timeout.

Below is outlined amfd trace

Su failover

*Sep 11  7:51:09.337401 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sg2nfsm.cc:3379] >> nodefail: 
'safSu=PL-3,safSg=2N,safApp=ABC-sv.SCD', 0
Sep 11  7:51:09.337576 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:2443] >> avdsgsusimodsnd: 
'safSu=PL-4,safSg=2N,safApp=ABC-sv.SCD', state 1
Sep 11  7:51:09.337580 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/su.cc:2601] >> anysusifsmin: 
SU:'safSu=PL-4,safSg=2N,safApp=ABC-sv.SCD', checkfsm:1
Sep 11  7:51:09.337583 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/su.cc:2606] TR 
SUSI:'safSu=PL-4,safSg=2N,safApp=ABC-sv.SCD,safSi=sv.SCD-2N-1,safApp=ABC-sv.SCD',
 fsm:'3'*

susi msg is sent but su_cnt_admin_oper is not increased

*Sep 11  7:51:09.338296 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/util.cc:0693] >> avdsndsusimsg 
Sep 11  7:51:09.338310 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/util.cc:0967] TR Sending 15 to 2040f
...
Sep 11  7:51:09.338457 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/util.cc:0978] << avdsndsusimsg *

receive susi response, but su_cnt_admin_oper is decreased, it's **0**. But amfd 
still waits for another susi response

*Sep 11  7:51:09.359295 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:1172] >> avdsusiassignevh: 
id:745, node:2040f, act:5, 'safSu=PL-4,safSg=2N,safApp=ABC-sv.SCD', '', ha:1, 
err:1, single:0
Sep 11  7:51:09.359600 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sg2nfsm.cc:2433] >> susisuccess: 
'safSu=PL-4,safSg=2N,safApp=ABC-sv.SCD' act=5, hastate=1, sgfsmstate=1
Sep 11  7:51:09.359605 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sg2nfsm.cc:1498] >> susisuccesssgrealign: 
'safSu=PL-4,safSg=2N,safApp=ABC-sv.SCD' act=5, state=1
Sep 11  7:51:09.359609 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sg.cc:1847] TR 
safSu=PL-4,safSg=2N,safApp=ABC-sv.SCD found in safSg=2N,safApp=ABC-sv.SCD
Sep 11  7:51:09.359615 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/siass.cc:0120] TR HA State ACTIVE of 
safSu=PL-4,safSg=2N,safApp=ABC-sv.SCD for safSi=sv.SCD-2N-1,safApp=ABC-sv.SCD
...
Sep 11  7:51:09.362551 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sg2nfsm.cc:2648] << susisuccess: rc:1
Sep 11  7:51:09.362585 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:1682] TR 
node:'safAmfNode=PL-4,safAmfCluster=myAmfCluster', sucntadminoper:0
Sep 11  7:51:09.362594 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:0359] >> 
processsusiresponseforng: 'safSu=PL-4,safSg=2N,safApp=ABC-sv.SCD'
Sep 11  7:51:09.362599 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:0430] TR nodeoperlist size:2
Sep 11  7:51:09.362603 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:0458] << 
processsusiresponseforng* 

another susi response comes, the su_cnt_admin_oper decreases as normall, but 
its value now **~0**. PL-4 is removed out of the list

*Sep 11  7:51:09.670160 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:1172] >> avdsusiassignevh: 
id:748, node:2040f, act:2, 'safSu=PL-4,safSg=NWayActive,safApp=ABC-ORE', 
'safSi=All-NWayActive,safApp=ABC-ORE', ha:1, err:1, single:0
Sep 11  7:51:09.670414 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgnwayactfsm.cc:0599] - >> susisuccess: 1
Sep 11  7:51:09.670518 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgnwayactfsm.cc:1088] << susisuccess 
Sep 11  7:51:09.670524 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:1682] TR 
node:'safAmfNode=PL-4,safAmfCluster=myAmfCluster', sucntadminoper:4294967295
Sep 11  7:51:09.670528 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:0359] >> 
processsusiresponseforng: 'safSu=PL-4,safSg=NWayActive,safApp=ABC-ORE'
Sep 11  7:51:09.670532 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:0458] << 
processsusiresponseforng* 

amf continues create another susi assignment, PL-4 is added back to list, 
**su_cnt_admin_oper increase by one, which is 0**

*Sep 11  7:51:14.534857 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/siass.cc:0475] >> avdsusicreate: 
safSu=PL-4,safSg=2N,safApp=ABC-min.minm safSi=min.minm-2N-1,safApp=ABC-min.minm 
state=2
Sep 11  7:51:14.536717 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/util.cc:0693] >> avdsndsusimsg 
Sep 11  7:51:14.536940 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:0269] TR 
node:'safAmfNode=PL-4,safAmfCluster=myAmfCluster', sucntadminoper:0
Sep 11  7:51:14.536946 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:0272] TR nodeoperlist size:2
Sep 11  7:51:14.537445 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sg2nfsm.cc:2648] << susisuccess: rc:1
Sep 11  7:51:14.537453 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:1682] TR 
node:'safAmfNode=SC-1,safAmfCluster=myAmfCluster', sucntadminoper:0
Sep 11  7:51:14.537457 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:0359] >> 
processsusiresponseforng: 'safSu=SC-1,safSg=2N,safApp=ABC-min.minm'
Sep 11  7:51:14.537462 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:0430] TR nodeoperlist size:1
Sep 11  7:51:14.537465 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:0458] << 
processsusiresponseforng* 

amf receives the last susi response, su_cnt_admin_oper decrease by 1, **which 
is ~0**. It's not zero, so PL-4 is not removed out of the list.

*Sep 11  7:51:14.544879 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:1172] >> avdsusiassignevh: 
id:754, node:2040f, act:2, 'safSu=PL-4,safSg=2N,safApp=ABC-min.minm', 
'safSi=min.minm-2N-1,safApp=ABC-min.minm', ha:2, err:1, single:0
Sep 11  7:51:14.545179 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sg2nfsm.cc:2433] >> susisuccess: 
'safSu=PL-4,safSg=2N,safApp=ABC-min.minm' act=2, hastate=2, TEST sgfsmstate=1
Sep 11  7:51:14.545183 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sg2nfsm.cc:1498] >> susisuccesssgrealign: 
'safSu=PL-4,safSg=2N,safApp=ABC-min.minm' act=2, state=2
Sep 11  7:51:14.545994 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:1682] TR 
node:'safAmfNode=PL-4,safAmfCluster=myAmfCluster', sucntadminoper:4294967295
Sep 11  7:51:14.545998 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:0359] >> 
processsusiresponseforng: 'safSu=PL-4,safSg=2N,safApp=ABC-min.minm'
Sep 11  7:51:14.546002 osafamfd 
[8907:8907:../../opensaf/src/amf/amfd/sgproc.cc:0458] << 
processsusiresponseforng* 


A similiar issue of unlock nodegroup operation, but different cause.
.#2526 amfd: Command unlock nodegroup timeout if su failover is escalated (>= 
2SIs)



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to