Another recovery failure after SC absence is seen with the attached model. The 
model is a 2N application with 5 SUs hosted across the nodes of a 5-node cluster.
Initially, SU1 (on SC1) and SU2 (on SC2) have the active and standby assignments. 
After SC1 and SC2 are stopped abruptly, SU3 (PL-3) appears to have the standby 
assignment. When the SC comes back, amfd reads assignments from IMM and from 
amfnd on PL-3:

amfd receives SU3's assignment as sync info sent by amfnd on PL-3:
~~~
Apr 28 10:35:25.174806 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> 
avd_susi_create: safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon 
safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon state=2
Apr 28 10:35:25.174839 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> 
avd_susi_create: safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon 
safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon state=2
Apr 28 10:35:25.174873 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> 
avd_susi_create: safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon 
safSi=AmfDemoTwon,safApp=AmfDemoTwon state=2
~~~

amfd reads from IMM: SU1 and SU2 still have the active and standby 
assignments, and SU3 has no assignment.
~~~
Apr 28 10:35:25.176649 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> 
avd_susi_create: safSu=SU2,safSg=AmfDemoTwon,safApp=AmfDemoTwon 
safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon state=2
Apr 28 10:35:25.176814 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> 
avd_susi_create: safSu=SU2,safSg=AmfDemoTwon,safApp=AmfDemoTwon 
safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon state=2
Apr 28 10:35:25.176984 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> 
avd_susi_create: safSu=SU2,safSg=AmfDemoTwon,safApp=AmfDemoTwon 
safSi=AmfDemoTwon,safApp=AmfDemoTwon state=2
Apr 28 10:35:25.177222 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> 
avd_susi_create: safSu=SU1,safSg=AmfDemoTwon,safApp=AmfDemoTwon 
safSi=AmfDemoTwon,safApp=AmfDemoTwon state=1
Apr 28 10:35:25.177413 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> 
avd_susi_create: safSu=SU1,safSg=AmfDemoTwon,safApp=AmfDemoTwon 
safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon state=1
Apr 28 10:35:25.177608 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> 
avd_susi_create: safSu=SU1,safSg=AmfDemoTwon,safApp=AmfDemoTwon 
safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon state=1
~~~

When amfd performs recovery, two SUs (SU2 and SU3) have standby assignments 
and SU1 has the active assignment. This 2N assignment state is not valid, and 
the SG FSM node_fail() cannot act as a recovery method. Only SU2 is failed 
over; SU1 still has an absent assignment, so its readiness state remains 
OUT_OF_SERVICE.

~~~
Apr 28 10:35:37.392562 osafamfd [474:474:src/amf/amfd/sg_2n_fsm.cc:3379] >> 
node_fail: 'safSu=SU2,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 0
~~~
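
The invalid state described above can be expressed as a small check. This is only a sketch: `SusiRecord` and `IsValid2NAssignment` are hypothetical names, not amfd types or functions, and the only facts taken from the traces are the HA state values (state=1 is active, state=2 is standby).

```cpp
#include <set>
#include <string>
#include <vector>

// HA state values as printed in the traces (state=1 active, state=2 standby).
enum class HaState { kActive = 1, kStandby = 2 };

// Hypothetical, simplified view of one SU-SI assignment (not an amfd type).
struct SusiRecord {
  std::string su;
  std::string si;
  HaState state;
};

// Returns true when the records fit a 2N model: at most one SU holds active
// assignments, at most one SU holds standby assignments, and no SU holds both.
bool IsValid2NAssignment(const std::vector<SusiRecord>& records) {
  std::set<std::string> active_sus;
  std::set<std::string> standby_sus;
  for (const auto& record : records) {
    if (record.state == HaState::kActive) {
      active_sus.insert(record.su);
    } else {
      standby_sus.insert(record.su);
    }
  }
  if (active_sus.size() > 1 || standby_sus.size() > 1) return false;
  for (const auto& su : active_sus) {
    if (standby_sus.count(su) > 0) return false;
  }
  return true;
}
```

Fed with the merged view from the two traces (SU1 active, SU2 and SU3 standby), the check returns false, which is the situation node_fail() is not designed to recover from.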

The problem seems to be in avd_create_susi_in_imm() and 
avd_delete_siassignment_from_imm(): the creation and deletion of SUSI 
assignment objects are queued and therefore are not performed immediately.
As a result, the IMM objects and the assignment objects in amfnd can be far 
from consistent.
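
A toy model of the suspected effect, assuming updates to a persistent store go through a job queue. `QueuedStore` and the object names are hypothetical and do not reflect the real amfd job queue or the IMM API; the sketch only shows how queued creations and deletions leave the store stale if the process stops before the queue is flushed.

```cpp
#include <cstddef>
#include <deque>
#include <set>
#include <string>
#include <utility>

// Hypothetical store whose creations and deletions are queued, not immediate.
class QueuedStore {
 public:
  explicit QueuedStore(std::set<std::string> initial)
      : objects_(std::move(initial)) {}

  void QueueCreate(const std::string& dn) { jobs_.push_back({true, dn}); }
  void QueueDelete(const std::string& dn) { jobs_.push_back({false, dn}); }

  // Applies up to max_jobs queued updates, oldest first; returns how many ran.
  std::size_t Flush(std::size_t max_jobs) {
    std::size_t applied = 0;
    while (applied < max_jobs && !jobs_.empty()) {
      const Job job = jobs_.front();
      jobs_.pop_front();
      if (job.create) {
        objects_.insert(job.dn);
      } else {
        objects_.erase(job.dn);
      }
      ++applied;
    }
    return applied;
  }

  bool Contains(const std::string& dn) const { return objects_.count(dn) > 0; }
  std::size_t Pending() const { return jobs_.size(); }

 private:
  struct Job {
    bool create;
    std::string dn;
  };
  std::deque<Job> jobs_;
  std::set<std::string> objects_;
};
```

If the process is stopped while `Pending()` is non-zero, a reader of the store still sees the old objects, while the node-side view has already moved on.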

Logs and traces are attached for more details.


Attachments:

- [app3_twon5su3si.xml](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/678f47cb/f83b/attachment/app3_twon5su3si.xml) (18.0 kB; text/xml)
- [log_trace.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/678f47cb/f83b/attachment/log_trace.tgz) (1.5 MB; application/x-compressed)


---

**[tickets:#2416] amfnd: su_si assignment message could be processed during SC 
absence stages**

**Status:** accepted
**Milestone:** 5.1.1
**Created:** Mon Apr 10, 2017 04:39 AM UTC by Minh Hon Chau
**Last Updated:** Mon Apr 10, 2017 06:58 AM UTC
**Owner:** Minh Hon Chau


In a 2N application configuration where the active SU is hosted on a controller 
and the standby SU is hosted on a payload, stopping both SCs could generate a 
su_si assignment message towards the standby SU to change its HA state to 
active. 

- If this su_si assignment message is buffered and arrives before 
MDSNCS_DOWN, the node is rebooted.
- In other cases, where MDSNCS_DOWN arrives before the su_si assignment 
message, amfnd currently does not ignore the message. amfnd should ignore it, 
as it already does for other messages such as su_pres and su_reg.
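
The proposed behaviour in the second case can be sketched as a guard in the message handler. `NodeDirectorModel`, `MsgType` and the method names are hypothetical and do not correspond to amfnd code; the sketch only shows every director-originated message type, su_si assignment included, being dropped once the director is known to be down.

```cpp
// Hypothetical message kinds named after the ones mentioned in the ticket.
enum class MsgType { kSuSiAssign, kSuPres, kSuReg };

// Hypothetical model of a node director that tracks whether the director is up.
struct NodeDirectorModel {
  bool director_up = true;
  int handled = 0;
  int ignored = 0;

  // Called when the down event for the director is received.
  void OnDirectorDown() { director_up = false; }

  // Called when a director becomes available again.
  void OnDirectorUp() { director_up = true; }

  // Returns true if the message was processed, false if it was ignored.
  // All message types are treated alike while the director is down.
  bool OnMessage(MsgType type) {
    (void)type;  // the guard does not depend on the message type
    if (!director_up) {
      ++ignored;
      return false;
    }
    ++handled;
    return true;
  }
};
```

A su_si assignment that arrives after the down event is counted in `ignored` and never reaches the assignment logic, the same as su_pres and su_reg.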

