- **status**: review --> fixed
- **Comment**:
commit 966a92979f8d00c763db40859a6b76018740a21d (HEAD -> develop, origin/develop, ticket-3136)
Author: thuan.tran <[email protected]>
Date: Mon Dec 30 17:09:09 2019 +0700
amf: allow update node failover state in cold sync [#3136]
Nodes that join during cold sync do not have their failover state
updated on the standby amfd, so when the standby amfd later fails
over to active it will mistakenly order a reboot of these nodes.
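Conceptually, the standby's cold-sync validation acts as an allow list of checkpoint record types, and the fix amounts to also accepting the node failover-state record while cold sync is in progress. A minimal sketch of that idea, using hypothetical names (`CkptType`, `AllowedDuringColdSync`) rather than the actual amfd identifiers; they only play the role of the reo types checked by avsv_validate_reo_type_in_csync:
~~~
// Hypothetical sketch: the cold-sync allow list the fix extends.
// CkptType and AllowedDuringColdSync are illustrative names, not the
// actual OpenSAF identifiers.
#include <iostream>

enum class CkptType {
  kNodeConfig,          // static node configuration
  kSuSiAssignment,      // SU/SI assignment updates
  kNodeFailoverState,   // per-node failover state (dropped before the fix)
};

// True if the standby may apply this record while cold sync is in progress.
bool AllowedDuringColdSync(CkptType type) {
  switch (type) {
    case CkptType::kNodeConfig:
    case CkptType::kSuSiAssignment:
    case CkptType::kNodeFailoverState:  // the fix: accept this type as well
      return true;
  }
  return false;
}

int main() {
  std::cout << std::boolalpha
            << AllowedDuringColdSync(CkptType::kNodeFailoverState) << '\n';
}
~~~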
---
**[tickets:#3136] amf: incorrect node failover state on standby amfd**
**Status:** fixed
**Milestone:** 5.20.01
**Created:** Mon Dec 30, 2019 10:05 AM UTC by Thuan Tran
**Last Updated:** Mon Dec 30, 2019 10:21 AM UTC
**Owner:** Thuan Tran
Rebooting the standby SC together with some PLs may lead to an incorrect node
failover state on the standby amfd, because the PLs join during cold sync and
the standby amfd drops the checkpointed node failover state.
~~~
2019-12-19T04:35:21.374+01:00 SC-1 osafamfd[21833]: NO Received node_up from 2130f: msg_id 1
2019-12-19T04:35:21.374+01:00 SC-1 osafamfd[21833]: NO Node 'PL-19' joined the cluster
2019-12-19T04:35:21.396+01:00 SC-1 osafamfd[21833]: NO Received node_up from 2150f: msg_id 1
2019-12-19T04:35:21.397+01:00 SC-1 osafamfd[21833]: NO Node 'PL-21' joined the cluster
2019-12-19T04:35:21.416+01:00 SC-1 osafamfd[21833]: NO Received node_up from 20e0f: msg_id 1
2019-12-19T04:35:21.416+01:00 SC-1 osafamfd[21833]: NO Node 'PL-14' joined the cluster
2019-12-19T04:35:21.375+01:00 SC-2 osafamfd[21809]: WA avsv_validate_reo_type_in_csync: unknown type 53
2019-12-19T04:35:21.398+01:00 SC-2 osafamfd[21809]: WA avsv_validate_reo_type_in_csync: unknown type 53
2019-12-19T04:35:21.425+01:00 SC-2 osafamfd[21809]: WA avsv_validate_reo_type_in_csync: unknown type 53
2019-12-19T04:35:22.545+01:00 SC-2 osafamfd[21809]: NO Cold sync complete!
2019-12-19T04:38:20.425+01:00 SC-2 osafamfd[21809]: NO Node failover timeout
2019-12-19T04:38:20.425+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-14' has reappeared after network separation
2019-12-19T04:38:20.425+01:00 SC-2 osafamfd[21809]: NO Node failover timeout
2019-12-19T04:38:20.425+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-19' has reappeared after network separation
2019-12-19T04:38:20.425+01:00 SC-2 osafamfd[21809]: NO Node failover timeout
2019-12-19T04:38:20.425+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-21' has reappeared after network separation
..... these messages keep repeating .....
2019-12-19T05:08:21.425+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-14' has reappeared after network separation
2019-12-19T05:08:21.425+01:00 SC-2 osafamfd[21809]: NO Node failover timeout
2019-12-19T05:08:21.425+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-19' has reappeared after network separation
2019-12-19T05:08:21.425+01:00 SC-2 osafamfd[21809]: NO Node failover timeout
2019-12-19T05:08:21.425+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-21' has reappeared after network separation
~~~
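The `unknown type 53` warnings above are the dropped updates: while its cold sync is still running, the standby validates each incoming async checkpoint record (avsv_validate_reo_type_in_csync) and discards anything it does not accept, so it keeps its stale view of PL-14/PL-19/PL-21. A rough sketch of that effect, with hypothetical names and the type value simply taken from the log:
~~~
// Hypothetical sketch of the drop path behind the "unknown type 53" lines:
// while cold sync has not completed, the standby rejects the checkpoint
// record carrying the node failover state, so its per-node view stays stale.
#include <iostream>
#include <map>
#include <string>

constexpr int kNodeFailoverStateType = 53;  // assumed meaning of "type 53"

struct NodeView {
  bool failover_pending = true;  // standby still thinks the node failed
};

void ApplyAsyncUpdate(std::map<std::string, NodeView>& db, bool cold_sync_done,
                      int reo_type, const std::string& node, bool pending) {
  if (!cold_sync_done && reo_type == kNodeFailoverStateType) {
    // Mirrors the WA log: the record is not accepted during cold sync.
    std::cout << "WA unknown type " << reo_type << ", update for " << node
              << " dropped\n";
    return;
  }
  db[node].failover_pending = pending;
}

int main() {
  std::map<std::string, NodeView> standby{{"PL-14", {}}, {"PL-19", {}}, {"PL-21", {}}};
  // The active checkpoints "failover not pending" as the PLs rejoin, but the
  // standby's cold sync is still running, so every update is discarded.
  for (const std::string node : {"PL-14", "PL-19", "PL-21"})
    ApplyAsyncUpdate(standby, /*cold_sync_done=*/false,
                     kNodeFailoverStateType, node, /*pending=*/false);
  std::cout << "PL-14 still marked failed on standby: " << std::boolalpha
            << standby["PL-14"].failover_pending << '\n';
}
~~~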
When the standby amfd fails over and becomes active, it will unexpectedly order a
reboot of these PLs.
~~~
2019-12-19T05:17:25.626+01:00 SC-2 osafamfd[21809]: NO FAILOVER StandBy --> Active
2019-12-19T05:17:25.640+01:00 SC-2 osafamfd[21809]: NO Failing over OpenSAF components only
2019-12-19T05:17:25.642+01:00 SC-2 osafamfd[21809]: NO FAILOVER StandBy --> Active DONE!
...
2019-12-19T05:20:21.825+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-14' has reappeared after network separation
2019-12-19T05:20:21.825+01:00 SC-2 osafamfd[21809]: WA Sending node reboot order
~~~
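A rough sketch of the resulting decision on the new active, again with hypothetical names: any node still marked as pending failover but actually present in the cluster is treated as having reappeared after a network separation, and a reboot order is sent.
~~~
// Hypothetical sketch of the unwanted reboot decision: after failover the
// new active still carries the stale "failover pending" state for nodes
// that are in fact up. Names are illustrative, not actual amfd code.
#include <iostream>
#include <string>
#include <vector>

struct Node {
  std::string name;
  bool failover_pending;   // stale on the ex-standby for PL-14/PL-19/PL-21
  bool is_cluster_member;  // the node is actually up
};

void ActOnStaleFailoverState(const std::vector<Node>& nodes) {
  for (const auto& n : nodes) {
    if (n.failover_pending && n.is_cluster_member) {
      std::cout << "WA Failed node '" << n.name
                << "' has reappeared after network separation\n"
                << "WA Sending node reboot order\n";
    }
  }
}

int main() {
  ActOnStaleFailoverState({{"PL-14", true, true},
                           {"PL-19", true, true},
                           {"PL-21", true, true}});
}
~~~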
---