- **status**: review --> fixed
- **Comment**:
commit 966a92979f8d00c763db40859a6b76018740a21d (HEAD -> develop, origin/develop, ticket-3136)
Author: thuan.tran <[email protected]>
Date: Mon Dec 30 17:09:09 2019 +0700
amf: allow update node failover state in cold sync [#3136]
Nodes that join during cold sync do not have their failover state
updated on the standby amfd, so when the standby amfd later fails
over to active it will mistakenly order a reboot of these nodes.
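Conceptually, the standby's cold-sync validation acts as an allow list of checkpoint record types, and the fix amounts to also accepting the node failover-state record while cold sync is in progress. A minimal sketch of that idea, using hypothetical names (`CkptType`, `AllowedDuringColdSync`) rather than the actual amfd identifiers; they only play the role of the reo types checked by avsv_validate_reo_type_in_csync:
~~~
// Hypothetical sketch: the cold-sync allow list the fix extends.
// CkptType and AllowedDuringColdSync are illustrative names, not the
// actual OpenSAF identifiers.
#include <iostream>

enum class CkptType {
  kNodeConfig,          // static node configuration
  kSuSiAssignment,      // SU/SI assignment updates
  kNodeFailoverState,   // per-node failover state (dropped before the fix)
};

// True if the standby may apply this record while cold sync is in progress.
bool AllowedDuringColdSync(CkptType type) {
  switch (type) {
    case CkptType::kNodeConfig:
    case CkptType::kSuSiAssignment:
    case CkptType::kNodeFailoverState:  // the fix: accept this type as well
      return true;
  }
  return false;
}

int main() {
  std::cout << std::boolalpha
            << AllowedDuringColdSync(CkptType::kNodeFailoverState) << '\n';
}
~~~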
---
**[tickets:#3136] amf: incorrect node failover state on standby amfd**
**Status:** fixed
**Milestone:** 5.20.01
**Created:** Mon Dec 30, 2019 10:05 AM UTC by Thuan Tran
**Last Updated:** Mon Dec 30, 2019 10:21 AM UTC
**Owner:** Thuan Tran
Rebooting the standby SC together with some PLs may lead to an incorrect node
failover state on the standby amfd, because the PLs join during cold sync and
the standby amfd drops the checkpointed node failover state.
~~~
2019-12-19T04:35:21.374+01:00 SC-1 osafamfd[21833]: NO Received node_up from 2130f: msg_id 1
2019-12-19T04:35:21.374+01:00 SC-1 osafamfd[21833]: NO Node 'PL-19' joined the cluster
2019-12-19T04:35:21.396+01:00 SC-1 osafamfd[21833]: NO Received node_up from 2150f: msg_id 1
2019-12-19T04:35:21.397+01:00 SC-1 osafamfd[21833]: NO Node 'PL-21' joined the cluster
2019-12-19T04:35:21.416+01:00 SC-1 osafamfd[21833]: NO Received node_up from 20e0f: msg_id 1
2019-12-19T04:35:21.416+01:00 SC-1 osafamfd[21833]: NO Node 'PL-14' joined the cluster
2019-12-19T04:35:21.375+01:00 SC-2 osafamfd[21809]: WA avsv_validate_reo_type_in_csync: unknown type 53
2019-12-19T04:35:21.398+01:00 SC-2 osafamfd[21809]: WA avsv_validate_reo_type_in_csync: unknown type 53
2019-12-19T04:35:21.425+01:00 SC-2 osafamfd[21809]: WA avsv_validate_reo_type_in_csync: unknown type 53
2019-12-19T04:35:22.545+01:00 SC-2 osafamfd[21809]: NO Cold sync complete!
2019-12-19T04:38:20.425+01:00 SC-2 osafamfd[21809]: NO Node failover timeout
2019-12-19T04:38:20.425+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-14' has reappeared after network separation
2019-12-19T04:38:20.425+01:00 SC-2 osafamfd[21809]: NO Node failover timeout
2019-12-19T04:38:20.425+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-19' has reappeared after network separation
2019-12-19T04:38:20.425+01:00 SC-2 osafamfd[21809]: NO Node failover timeout
2019-12-19T04:38:20.425+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-21' has reappeared after network separation
..... these messages keep repeating .....
2019-12-19T05:08:21.425+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-14' has reappeared after network separation
2019-12-19T05:08:21.425+01:00 SC-2 osafamfd[21809]: NO Node failover timeout
2019-12-19T05:08:21.425+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-19' has reappeared after network separation
2019-12-19T05:08:21.425+01:00 SC-2 osafamfd[21809]: NO Node failover timeout
2019-12-19T05:08:21.425+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-21' has reappeared after network separation
~~~
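The `unknown type 53` warnings above are the dropped updates: while its cold sync is still running, the standby validates each incoming async checkpoint record (avsv_validate_reo_type_in_csync) and discards anything it does not accept, so it keeps its stale view of PL-14/PL-19/PL-21. A rough sketch of that effect, with hypothetical names and the type value simply taken from the log:
~~~
// Hypothetical sketch of the drop path behind the "unknown type 53" lines:
// while cold sync has not completed, the standby rejects the checkpoint
// record carrying the node failover state, so its per-node view stays stale.
#include <iostream>
#include <map>
#include <string>

constexpr int kNodeFailoverStateType = 53;  // assumed meaning of "type 53"

struct NodeView {
  bool failover_pending = true;  // standby still thinks the node failed
};

void ApplyAsyncUpdate(std::map<std::string, NodeView>& db, bool cold_sync_done,
                      int reo_type, const std::string& node, bool pending) {
  if (!cold_sync_done && reo_type == kNodeFailoverStateType) {
    // Mirrors the WA log: the record is not accepted during cold sync.
    std::cout << "WA unknown type " << reo_type << ", update for " << node
              << " dropped\n";
    return;
  }
  db[node].failover_pending = pending;
}

int main() {
  std::map<std::string, NodeView> standby{{"PL-14", {}}, {"PL-19", {}}, {"PL-21", {}}};
  // The active checkpoints "failover not pending" as the PLs rejoin, but the
  // standby's cold sync is still running, so every update is discarded.
  for (const std::string node : {"PL-14", "PL-19", "PL-21"})
    ApplyAsyncUpdate(standby, /*cold_sync_done=*/false,
                     kNodeFailoverStateType, node, /*pending=*/false);
  std::cout << "PL-14 still marked failed on standby: " << std::boolalpha
            << standby["PL-14"].failover_pending << '\n';
}
~~~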
When the standby amfd fails over and becomes active, it will unexpectedly order a
reboot of these PLs.
~~~
2019-12-19T05:17:25.626+01:00 SC-2 osafamfd[21809]: NO FAILOVER StandBy --> Active
2019-12-19T05:17:25.640+01:00 SC-2 osafamfd[21809]: NO Failing over OpenSAF components only
2019-12-19T05:17:25.642+01:00 SC-2 osafamfd[21809]: NO FAILOVER StandBy --> Active DONE!
...
2019-12-19T05:20:21.825+01:00 SC-2 osafamfd[21809]: WA Failed node 'PL-14' has reappeared after network separation
2019-12-19T05:20:21.825+01:00 SC-2 osafamfd[21809]: WA Sending node reboot order
~~~
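A rough sketch of the resulting decision on the new active, again with hypothetical names: any node still marked as pending failover but actually present in the cluster is treated as having reappeared after a network separation, and a reboot order is sent.
~~~
// Hypothetical sketch of the unwanted reboot decision: after failover the
// new active still carries the stale "failover pending" state for nodes
// that are in fact up. Names are illustrative, not actual amfd code.
#include <iostream>
#include <string>
#include <vector>

struct Node {
  std::string name;
  bool failover_pending;   // stale on the ex-standby for PL-14/PL-19/PL-21
  bool is_cluster_member;  // the node is actually up
};

void ActOnStaleFailoverState(const std::vector<Node>& nodes) {
  for (const auto& n : nodes) {
    if (n.failover_pending && n.is_cluster_member) {
      std::cout << "WA Failed node '" << n.name
                << "' has reappeared after network separation\n"
                << "WA Sending node reboot order\n";
    }
  }
}

int main() {
  ActOnStaleFailoverState({{"PL-14", true, true},
                           {"PL-19", true, true},
                           {"PL-21", true, true}});
}
~~~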
---