Another recovery failure after SC absence is seen with the attached model. The
model is a 2N application with 5 SUs, one hosted on each node of a 5-node
cluster. Initially, SU1 (on SC1) and SU2 (on SC2) have the active and standby
assignments. After SC1 and SC2 are stopped abruptly, SU3 (on PL-3) appears to
have the standby assignment.
When an SC comes back, amfd reads the assignments both from IMM and from amfnd
on PL-3. amfd receives SU3's assignment as sync info sent by amfnd on PL-3:
~~~
Apr 28 10:35:25.174806 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >>
avd_susi_create: safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon
safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon state=2
Apr 28 10:35:25.174839 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >>
avd_susi_create: safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon
safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon state=2
Apr 28 10:35:25.174873 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >>
avd_susi_create: safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon
safSi=AmfDemoTwon,safApp=AmfDemoTwon state=2
~~~
From IMM, amfd reads that SU1 and SU2 still have the active and standby
assignments and that SU3 has no assignment:
~~~
Apr 28 10:35:25.176649 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >>
avd_susi_create: safSu=SU2,safSg=AmfDemoTwon,safApp=AmfDemoTwon
safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon state=2
Apr 28 10:35:25.176814 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >>
avd_susi_create: safSu=SU2,safSg=AmfDemoTwon,safApp=AmfDemoTwon
safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon state=2
Apr 28 10:35:25.176984 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >>
avd_susi_create: safSu=SU2,safSg=AmfDemoTwon,safApp=AmfDemoTwon
safSi=AmfDemoTwon,safApp=AmfDemoTwon state=2
Apr 28 10:35:25.177222 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >>
avd_susi_create: safSu=SU1,safSg=AmfDemoTwon,safApp=AmfDemoTwon
safSi=AmfDemoTwon,safApp=AmfDemoTwon state=1
Apr 28 10:35:25.177413 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >>
avd_susi_create: safSu=SU1,safSg=AmfDemoTwon,safApp=AmfDemoTwon
safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon state=1
Apr 28 10:35:25.177608 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >>
avd_susi_create: safSu=SU1,safSg=AmfDemoTwon,safApp=AmfDemoTwon
safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon state=1
~~~
When amfd performs recovery, two SUs (SU2 and SU3) have standby assignments
and SU1 has the active assignment. This is not a valid 2N assignment state, so
the SG FSM node_fail() cannot act as a recovery method. Only SU2 is failed
over; SU1 still has an absent assignment, so its readiness state remains
OUT_OF_SERVICE:
~~~
Apr 28 10:35:37.392562 osafamfd [474:474:src/amf/amfd/sg_2n_fsm.cc:3379] >>
node_fail: 'safSu=SU2,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 0
~~~
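The validity condition behind this failure can be sketched as a small check. This is only an illustration, not the amfd code: `SuSi`, `HaState` and `is_valid_2n` are hypothetical names, and the rule encoded here (at most one SU active and at most one SU standby in a 2N SG) is the simplified reading used in this report. The numeric values 1 and 2 follow the `state=` values in the traces above.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical HA states; 1 = active and 2 = standby, matching the
// "state=" values printed by avd_susi_create in the traces.
enum class HaState { Active = 1, Standby = 2 };

// Hypothetical, simplified view of one SU-SI assignment.
struct SuSi {
  std::string su;
  std::string si;
  HaState state;
};

// Returns true if at most one SU holds active assignments and at most one
// SU holds standby assignments (simplified 2N rule assumed in this sketch).
bool is_valid_2n(const std::vector<SuSi>& assignments) {
  std::set<std::string> active_sus;
  std::set<std::string> standby_sus;
  for (const auto& a : assignments) {
    if (a.state == HaState::Active) {
      active_sus.insert(a.su);
    } else {
      standby_sus.insert(a.su);
    }
  }
  return active_sus.size() <= 1 && standby_sus.size() <= 1;
}
```

With SU1 active and both SU2 and SU3 standby, as in the recovered state described here, this check returns false.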
The problem seems to be in avd_create_susi_in_imm() and
avd_delete_siassignment_from_imm(): the creation and deletion of susi
assignments are queued, so they are not performed immediately.
As a result, the IMM objects and the assignment objects in amfnd can be far
from consistent.
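A minimal sketch of how queued updates can leave a store stale is shown here. It is an assumption-laden model, not the amfd implementation: `ImmStore` and `ImmJobQueue` are hypothetical names, and the real job queue and IMM interfaces are not reproduced.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <set>
#include <string>

// Hypothetical stand-in for the IMM view of which SUs have an assignment.
using ImmStore = std::set<std::string>;

// Hypothetical deferred-update queue: create/delete requests are recorded
// and only applied to the store when flush() runs.
class ImmJobQueue {
 public:
  void enqueue_create(const std::string& su) {
    jobs_.push_back([su](ImmStore& imm) { imm.insert(su); });
  }
  void enqueue_delete(const std::string& su) {
    jobs_.push_back([su](ImmStore& imm) { imm.erase(su); });
  }
  // Applies all pending jobs in order. If the process stops before this
  // is called, the store keeps its old content.
  void flush(ImmStore& imm) {
    for (auto& job : jobs_) job(imm);
    jobs_.clear();
  }
  std::size_t pending() const { return jobs_.size(); }

 private:
  std::deque<std::function<void(ImmStore&)>> jobs_;
};
```

In this model, if a delete for SU2 and a create for SU3 are queued but the director stops before `flush()`, the store still lists SU2 and not SU3, which mirrors the mismatch between IMM and amfnd seen in the traces.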
Log and traces are attached for more details.
Attachments:
- [app3_twon5su3si.xml](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/678f47cb/f83b/attachment/app3_twon5su3si.xml) (18.0 kB; text/xml)
- [log_trace.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/678f47cb/f83b/attachment/log_trace.tgz) (1.5 MB; application/x-compressed)
---
**[tickets:#2416] amfnd: su_si assignment message could be processed during SC absence stages**
**Status:** accepted
**Milestone:** 5.1.1
**Created:** Mon Apr 10, 2017 04:39 AM UTC by Minh Hon Chau
**Last Updated:** Mon Apr 10, 2017 06:58 AM UTC
**Owner:** Minh Hon Chau
In a 2N application configuration where the active SU is hosted on a controller
and the standby SU is hosted on a payload, stopping both SCs can generate a
su_si assignment message towards the standby SU to change its HA state to
active.
- If this su_si assignment message is buffered and arrives before
MDSNCS_DOWN, the node is rebooted.
- Otherwise, when MDSNCS_DOWN arrives before the su_si assignment message,
amfnd currently does not ignore the message. amfnd should ignore it, as it
does for other messages such as su_pres and su_reg.
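The proposed behaviour can be sketched as a single guard that treats every director message the same way once the controllers are gone. This is a hypothetical model only: `NodeDirector`, `MsgType`, `on_controller_down` and `handle` are illustrative names and do not correspond to actual amfnd functions.

```cpp
// Hypothetical message kinds, named after the messages mentioned above.
enum class MsgType { SuPres, SuReg, SuSiAssign };

// Hypothetical, simplified model of the node director's message handling.
struct NodeDirector {
  bool controller_up = true;  // cleared when the controller-down event arrives
  int processed = 0;          // number of messages actually acted upon

  void on_controller_down() { controller_up = false; }

  // Returns true if the message was processed, false if it was ignored.
  bool handle(MsgType /*type*/) {
    if (!controller_up) {
      // During SC absence every director message is dropped, including
      // su_si assignments (the behaviour proposed in this ticket).
      return false;
    }
    ++processed;
    return true;
  }
};
```

The point of the sketch is that the check sits before any per-type dispatch, so a su_si assignment cannot take a different path from su_pres or su_reg after the down event.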