The problem appears to be from:
Sep 1 13:11:41 PL-9 osafamfnd[11503]: NO Found and resend buffered oper_state
msg for SU:'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC', su_oper_state:'1',
node_oper_state:'1', recovery:'0'
Sep 1 13:11:41 PL-9 osafamfnd[11503]: NO Found and resend buffered oper_state
msg for SU:'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC', su_oper_state:'2',
node_oper_state:'1', recovery:'11'
When su_oper_state:'1' is sent, amfd concludes that this SU has become ENABLED,
but the message actually dates from before the SC absence. As a result, amfd
sends a su_presence_msg to amfnd to instantiate the SU, which should not happen.
Another problem is that amfd sends 2 auto-repair requests:
Sep 1 13:11:41 PL-9 osafamfnd[11503]: NO Repair request for
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC'
Sep 1 13:11:42 PL-9 osafamfnd[11503]: NO Repair request for
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC'
The first repair request came from su_try_repair() in cluster.cc; this call
should be deleted.
The second repair request came from the buffered su_oper_state:'2' message.
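
To illustrate the stale-message problem, here is a minimal sketch of one possible approach: collapse the buffered oper_state messages so that only the newest one per SU is resent once the director is back. This is not the OpenSAF implementation and not necessarily the fix chosen for this ticket. `BufferedOperStateMsg` and `coalesce_latest` are hypothetical names invented for the example; only the field names are borrowed from the log output above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a buffered oper_state message (not an OpenSAF type).
struct BufferedOperStateMsg {
  std::string su_name;
  int su_oper_state;    // 1 = ENABLED, 2 = DISABLED, as printed in the log
  int node_oper_state;
  int recovery;
};

// Keep only the newest buffered message for each SU, preserving the
// relative order of the messages that survive.
std::vector<BufferedOperStateMsg> coalesce_latest(
    const std::vector<BufferedOperStateMsg>& buffered) {
  // Record the index of the last message seen for every SU.
  std::unordered_map<std::string, std::size_t> last_index;
  for (std::size_t i = 0; i < buffered.size(); ++i) {
    last_index[buffered[i].su_name] = i;
  }

  // Copy a message only if it is the last one buffered for its SU.
  std::vector<BufferedOperStateMsg> result;
  for (std::size_t i = 0; i < buffered.size(); ++i) {
    if (last_index.at(buffered[i].su_name) == i) {
      result.push_back(buffered[i]);
    }
  }

  // Exactly one message per SU must remain.
  assert(result.size() == last_index.size());
  return result;
}
```

With the two buffered messages from the log (su_oper_state '1' followed by '2' for the same SU), only the DISABLED message would be resent under this sketch, so amfd would never see the outdated ENABLED state.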
---
**[tickets:#2569] amfnd: coredump after sc absence recovery**
**Status:** assigned
**Milestone:** 5.17.10
**Created:** Sun Sep 03, 2017 05:53 AM UTC by Minh Hon Chau
**Last Updated:** Sun Sep 03, 2017 05:53 AM UTC
**Owner:** Minh Hon Chau
Configuration: 2N app, SGAutoRepair enabled
Scenario:
- SU failover is triggered
Sep 1 12:59:00 PL-9 osafamfnd[11503]: NO
'safComp=comp1,safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' faulted due to
'avaDown' : Recovery is 'suFailover'
Sep 1 12:59:00 PL-9 osafamfnd[11503]: NO Terminated all components in
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC'
Sep 1 12:59:00 PL-9 osafamfnd[11503]: NO Informing director of sufailover
Sep 1 12:59:01 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State UNINSTANTIATED =>
INSTANTIATING
Sep 1 12:59:01 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State INSTANTIATING =>
INSTANTIATED
- Both SCs go down
Sep 1 12:59:02 PL-9 osafamfnd[11503]: WA AMF director unexpectedly crashed
Sep 1 12:59:02 PL-9 osafamfnd[11503]: NO Found not-ack oper_state msg for
SU:'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC', su_oper_state:'1',
node_oper_state:'1', recovery:'0'
- SU failover is triggered again
Sep 1 13:09:01 PL-9 osafamfnd[11503]: NO
'safComp=comp1,safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' faulted due to
'healthCheckcallbackTimeout' : Recovery is 'suFailover'
Sep 1 13:09:01 PL-9 osafamfnd[11503]: NO Terminated all components in
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC'
Sep 1 13:09:01 PL-9 osafamfnd[11503]: NO Informing director of sufailover
Sep 1 13:09:01 PL-9 osafamfnd[11503]: NO avnd_di_oper_send() deferred as AMF
director is offline(1), or sync is required(1)
- After the SC restarts, followed by 2 additional SU failovers, amfnd dumps core
Sep 1 13:11:41 PL-9 osafamfnd[11503]: NO Found and resend buffered oper_state
msg for SU:'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC', su_oper_state:'1',
node_oper_state:'1', recovery:'0'
Sep 1 13:11:41 PL-9 osafamfnd[11503]: NO Found and resend buffered oper_state
msg for SU:'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC', su_oper_state:'2',
node_oper_state:'1', recovery:'11'
Sep 1 13:11:41 PL-9 osafamfnd[11503]: NO Repair request for
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC'
Sep 1 13:11:41 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State UNINSTANTIATED =>
UNINSTANTIATED
Sep 1 13:11:42 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State UNINSTANTIATED =>
INSTANTIATING
Sep 1 13:11:42 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State INSTANTIATING =>
INSTANTIATED
Sep 1 13:11:42 PL-9 osafamfnd[11503]: NO Repair request for
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC'
Sep 1 13:11:42 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State INSTANTIATED =>
UNINSTANTIATED
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State UNINSTANTIATED =>
INSTANTIATING
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO SU failover probation timer started
(timeout: 1000000000 ns)
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO saAmfSUFailover is true for
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC'
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO Performing failover of
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' (SU failover count: 2)
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO
'safComp=comp1,safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' recovery action
escalated from 'componentFailover' to 'suFailover'
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO
'safComp=comp1,safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' faulted due to
'avaDown' : Recovery is 'suFailover'
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State INSTANTIATING =>
INSTANTIATED
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO saAmfSUFailover is true for
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC'
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO Performing failover of
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' (SU failover count: 3)
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO
'safComp=comp1,safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' recovery action
escalated from 'componentFailover' to 'suFailover'
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO
'safComp=comp1,safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' faulted due to
'avaDown' : Recovery is 'suFailover'
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO Terminating components of
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC'(abruptly & unordered)
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State INSTANTIATED =>
TERMINATING
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State TERMINATING =>
TERMINATING
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State TERMINATING =>
UNINSTANTIATED
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO Terminated all components in
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC'
Sep 1 13:11:44 PL-9 osafamfnd[11503]: NO Informing director of sufailover
Sep 1 13:11:45 PL-9 osafamfnd[11503]: NO SU failover probation timer expired
Sep 1 13:11:45 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State UNINSTANTIATED =>
INSTANTIATING
Sep 1 13:11:45 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State INSTANTIATING =>
INSTANTIATED
Sep 1 13:11:47 PL-9 osafamfnd[11503]: NO Assigning
'safSi=sv.ABC-2N-1,safApp=App-sv.ABC' ACTIVE to
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC'
Sep 1 13:11:47 PL-9 osafamfnd[11503]: NO Repair request for
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC'
Sep 1 13:11:47 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State INSTANTIATED =>
UNINSTANTIATED
Sep 1 13:11:48 PL-9 osafamfnd[11503]: NO
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC' Presence State UNINSTANTIATED =>
INSTANTIATING
Sep 1 13:11:48 PL-9 osafamfnd[11503]: NO Assigned
'safSi=sv.ABC-2N-1,safApp=App-sv.ABC' ACTIVE to
'safSu=b2a6d17f2b,safSg=2N,safApp=App-sv.ABC'
Sep 1 13:11:48 PL-9 osafamfnd[11503]: ../../opensaf/src/amf/amfnd/di.cc:858:
avnd_di_susi_resp_send: Assertion 'm_AVND_SU_IS_ASSIGN_PEND(su)' failed.