- **status**: review --> fixed
- **Comment**:

changeset:   6992:19d0f063ac2f
tag:         tip
parent:      6989:07bcccab7961
user:        [email protected]
date:        Mon Oct 12 17:12:24 2015 +0530
summary:     amfd: mark NG locked if controller failovers during shutdown op on NG [#1513]

changeset:   6991:b790b507cfa0
branch:      opensaf-4.7.x
parent:      6987:60536d29c33f
user:        [email protected]
date:        Mon Oct 12 17:12:08 2015 +0530
summary:     amfd: mark NG locked if controller failovers during shutdown op on NG [#1513]

changeset:   6990:edc41730df45
branch:      opensaf-4.6.x
parent:      6983:b93950e02e3b
user:        [email protected]
date:        Mon Oct 12 17:11:28 2015 +0530
summary:     amfd: mark NG locked if controller failovers during shutdown op on NG [#1513]

https://sourceforge.net/p/opensaf/mailman/message/34527185/



---

**[tickets:#1513] amf: NG gets stuck in SHUTTING_DOWN state during shutdown op and controller failover.**

**Status:** fixed
**Milestone:** 4.6.1
**Created:** Mon Oct 05, 2015 09:47 AM UTC by Praveen
**Last Updated:** Mon Oct 05, 2015 10:48 AM UTC
**Owner:** Praveen
**Attachments:**

- [osafamfd_new_active](https://sourceforge.net/p/opensaf/tickets/1513/attachment/osafamfd_new_active) (389.7 kB; application/octet-stream)
- [osafamfd_old_active](https://sourceforge.net/p/opensaf/tickets/1513/attachment/osafamfd_old_active) (78.3 kB; application/octet-stream)
- [nwayactive_demo.xml](https://sourceforge.net/p/opensaf/tickets/1513/attachment/nwayactive_demo.xml) (9.5 kB; text/xml)


It happens when a controller failover occurs while the nodegroup is in 
SHUTTING_DOWN state and one of its nodes hosts more than one application SU, 
with at least one of those SUs active. Note that this issue does not occur when 
the node hosts only one application SU; handling for the single-SU case is 
already present. Also, this issue is not specific to a particular redundancy 
model.
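The trigger condition described above can be restated as a small predicate. This is an illustrative Python sketch only; the function name `triggers_issue_1513` and the HA-state strings are hypothetical and are not part of OpenSAF or the AMF API.

```python
from __future__ import annotations


def triggers_issue_1513(
    ng_admin_state: str, su_ha_states: list[str], controller_failover: bool
) -> bool:
    """Restate the trigger condition of ticket #1513 (illustrative only).

    su_ha_states holds the HA states of the application SUs hosted on one
    node of the nodegroup, e.g. ["ACTIVE", "ACTIVE"] in the nwayactive demo.
    The state labels are hypothetical strings, not AMF API constants.
    """
    # The NG must be mid-shutdown when the controller failover happens.
    if ng_admin_state != "SHUTTING_DOWN" or not controller_failover:
        return False
    # A node hosting a single application SU is already handled by AMFD.
    if len(su_ha_states) < 2:
        return False
    # At least one hosted SU must be active; the redundancy model is irrelevant.
    return "ACTIVE" in su_ha_states
```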

Steps to reproduce:
1) Modify the AMF demo to host both SUs on the standby controller, set 
saAmfSIPrefActiveAssignments=2 for the SI, and change the redundancy model to 
nwayactive.
2) Unlock-in and unlock both SUs.
3) Create a nodegroup that contains only one node: the standby controller.
4) Shut down the nodegroup. As part of this, the components in both SUs will 
get a quiescing state callback.
5) Before responding to these callbacks, stop OpenSAF on the active controller 
to trigger a failover; then respond, so that the new active controller proceeds 
with the operation.
6) AMF sends the remove callback and the removal succeeds, but AMF never marks 
the NG as LOCKED.
7) Now both unlock and deletion of the nodegroup fail.

Analysis:
During the SHUTDOWN admin operation on an NG, the admin state is initially set 
to SHUTTING_DOWN and checkpointed to the standby AMFD. On decoding it, the 
standby AMFD sets node->admin_ng; it clears this pointer when the active AMFD 
checkpoints the LOCKED state. After the failover, when the new active AMFD gets 
a quiescing success response from AMFND, it clears this pointer in 
process_su_si_response_for_ng(), assuming that only one SU is hosted on that 
node. When the response for the second SU arrives, it is not processed from the 
NG perspective because AMFD has already cleared node->admin_ng, so the NG is 
never marked LOCKED.
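The failure sequence in the analysis can be modelled with a small Python simulation. It is a sketch under stated assumptions, not the AMFD code: `Node.admin_ng` stands in for `node->admin_ng`, while the per-node `pending` set, the handler names, and `on_su_response_fixed` are hypothetical. The fix shown (clear the pointer only after the node's last SU has responded) is one plausible approach, not necessarily what changesets 6990 to 6992 implement.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class NodeGroup:
    name: str
    nodes: list[Node] = field(default_factory=list)
    admin_state: str = "SHUTTING_DOWN"


@dataclass
class Node:
    name: str
    su_names: list[str]
    admin_ng: Optional[NodeGroup] = None  # mirrors node->admin_ng
    pending: set[str] = field(default_factory=set)  # hypothetical: SUs yet to respond


def decode_ng_checkpoint(ng: NodeGroup) -> None:
    """Standby AMFD decodes the SHUTTING_DOWN checkpoint and sets admin_ng."""
    for node in ng.nodes:
        node.admin_ng = ng
        node.pending = set(node.su_names)


def maybe_lock(ng: NodeGroup) -> None:
    """Mark the NG LOCKED once no node in it has outstanding SU responses."""
    if all(not n.pending for n in ng.nodes):
        ng.admin_state = "LOCKED"


def on_su_response_buggy(node: Node, su_name: str) -> None:
    """Behaviour described in the analysis: clear admin_ng on the first response."""
    ng = node.admin_ng
    if ng is None:
        return  # response is not processed from the NG perspective
    node.pending.discard(su_name)
    node.admin_ng = None  # assumes the node hosts only one SU
    maybe_lock(ng)


def on_su_response_fixed(node: Node, su_name: str) -> None:
    """Hypothetical fix: clear admin_ng only after the node's last SU responds."""
    ng = node.admin_ng
    if ng is None:
        return
    node.pending.discard(su_name)
    if not node.pending:
        node.admin_ng = None
    maybe_lock(ng)


def simulate(handler: Callable[[Node, str], None], su_names: list[str]) -> str:
    """Failover scenario: the new active AMFD handles one response per hosted SU."""
    node = Node("standby-controller", list(su_names))
    ng = NodeGroup("ng1", nodes=[node])
    decode_ng_checkpoint(ng)  # state inherited from the old active via checkpoint
    for su in su_names:
        handler(node, su)
    return ng.admin_state
```

With two SUs on the node, `simulate(on_su_response_buggy, ["SU1", "SU2"])` leaves the NG in SHUTTING_DOWN, while the fixed handler reaches LOCKED. With a single SU both handlers reach LOCKED, which matches the observation that the single-SU case is already handled.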


Attached are AMF traces and configuration to reproduce the problem.


---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

------------------------------------------------------------------------------
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
