- **Type**: defect --> enhancement
---
** [tickets:#400] amfd crashed while locking SI in Nway model and cluster went
for reboot**
**Status:** unassigned
**Milestone:** future
**Created:** Fri May 31, 2013 05:23 AM UTC by Nagendra Kumar
**Last Updated:** Fri Aug 30, 2013 10:23 AM UTC
**Owner:** nobody
Migrated from http://devel.opensaf.org/ticket/2655
changeset: 3533
configuration: 1 App, 1 SG, 6 SIs (each with 1 CSI), 8 SUs (each with 1 component)
saAmfSGNumPrefInserviceSUs=8, saAmfSGNumPrefAssignedSUs=6,
saAmfSGMaxActiveSIsperSU=6
Dependency configured: 1 sponsor, 5 dependents.
SI1 is the sponsor and SI2-SI6 are the dependents.
scenario:
Before creating the dependency, lock the SIs. While an SI was being locked, amfd
crashed on the active controller. Failover happened, and amfd sometimes also
crashed on the standby, after which the cluster rebooted.
gdb output:
--------------------------------------------------------------------------------
(gdb) bt
#0 0x00007fbad077c645 in raise () from /lib64/libc.so.6
#1 0x00007fbad077dc33 in abort () from /lib64/libc.so.6
#2 0x00007fbad1d8de15 in osafassert_fail (file=0x4aac45 "avd_su.c", line=1582,
func=0x4abbf0 "avd_su_dec_curr_stdby_si", assertion=0x4abc10
"su->saAmfSUNumCurrStandbySIs > 0")
at sysf_def.c:399
#3 0x000000000048815d in avd_su_dec_curr_stdby_si (su=0x782d50) at avd_su.c:1582
#4 0x0000000000489c5d in avd_susi_update_assignment_counters (susi=0x7cc260,
action=AVSV_SUSI_ACT_MOD,
current_ha_state=SA_AMF_HA_STANDBY, new_ha_state=SA_AMF_HA_ACTIVE) at
avd_siass.c:705
#5 0x0000000000489979 in avd_susi_mod_send (susi=0x7cc260,
ha_state=SA_AMF_HA_ACTIVE) at avd_siass.c:624
#6 0x00000000004715d5 in avd_sg_nway_susi_succ_sg_realign (cb=0x6bcb80,
su=0x781040, susi=0x7c1260,
act=AVSV_SUSI_ACT_DEL, state=SA_AMF_HA_ACTIVE) at avd_sgNWayfsm.c:2607
#7 0x000000000046a7ce in avd_sg_nway_susi_sucss_func (cb=0x6bcb80, su=0x781040,
susi=0x7c1260,
act=AVSV_SUSI_ACT_DEL, state=SA_AMF_HA_ACTIVE) at avd_sgNWayfsm.c:359
#8 0x0000000000477f18 in avd_su_si_assign_evh (cb=0x6bcb80, evt=0x7e61d0) at
avd_sgproc.c:1286
#9 0x000000000043af86 in avd_process_event (cb_now=0x6bcb80, evt=0x7e61d0) at
avd_proc.c:589
#10 0x000000000043ad0d in avd_main_proc () at avd_proc.c:505
#11 0x0000000000409210 in main (argc=2, argv=0x7fffda214ea8) at amfd_main.c:47
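For readers unfamiliar with the counters involved, here is a minimal Python sketch of the bookkeeping that frames #3 and #4 suggest: a STANDBY to ACTIVE modification decrements the SU's standby-SI counter, and the guard fails if that counter is already zero. The class and function names are hypothetical stand-ins, not the amfd code, and the sketch makes no claim about why the counter was zero in this crash.

```python
from dataclasses import dataclass


class CounterInvariantError(RuntimeError):
    """Raised where amfd would hit osafassert (hypothetical stand-in)."""


@dataclass
class ServiceUnit:
    # Simplified stand-ins for the per-SU assignment counters.
    num_curr_active_sis: int = 0
    num_curr_standby_sis: int = 0

    def dec_curr_standby_si(self) -> None:
        # Mirrors the failed check: su->saAmfSUNumCurrStandbySIs > 0
        if not self.num_curr_standby_sis > 0:
            raise CounterInvariantError("saAmfSUNumCurrStandbySIs > 0")
        self.num_curr_standby_sis -= 1

    def inc_curr_active_si(self) -> None:
        self.num_curr_active_sis += 1


def modify_standby_to_active(su: ServiceUnit) -> None:
    """Assumed counter update when an assignment goes STANDBY -> ACTIVE."""
    su.dec_curr_standby_si()
    su.inc_curr_active_si()


def try_promote(su: ServiceUnit) -> bool:
    """Return True if the counters allowed the promotion, False otherwise."""
    try:
        modify_standby_to_active(su)
        return True
    except CounterInvariantError:
        return False
```

An SU that holds one standby assignment is promoted cleanly, while an SU whose standby counter is already zero trips the guard. The second case corresponds to the state amfd was in when it aborted.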
/var/log/messages on active ctrl:
--------------------------------------------------------------------------------
May 7 19:37:39 SLES-SLOT-1 osafamfnd[2820]: Assigned
'safSi=dummy_NWay_Active_1Norm_6,safApp=NwActApp' ACTIVE to
'safSu=dummy_NWay_Active_1Norm_7,safSg=SG_dummy_nwayact,safApp=NwActApp'
May 7 19:37:39 SLES-SLOT-1 osafamfd[2810]: SI lock of
safSi=dummy_NWay_Active_1Norm_1,safApp=NwActApp failed, SG not stable
May 7 19:37:39 SLES-SLOT-1 osafamfd[2810]:
'safSi=dummy_NWay_Active_1Norm_1,safApp=NwActApp' other semantics...
May 7 19:37:39 SLES-SLOT-1 osafimmnd[2759]: Ccb 326 COMMITTED
May 7 19:37:39 SLES-SLOT-1 osafimmnd[2759]: Ccb 327 COMMITTED
May 7 19:37:39 SLES-SLOT-1 osafimmnd[2759]: Ccb 328 COMMITTED
May 7 19:37:40 SLES-SLOT-1 osafimmnd[2759]: Ccb 329 COMMITTED
May 7 19:37:40 SLES-SLOT-1 osafamfnd[2820]: Removing
'safSi=dummy_NWay_1Norm_2,safApp=N' from
'safSu=dummy_NWay_1Norm_7,safSg=SG_dummy_n,safApp=N'
May 7 19:37:40 SLES-SLOT-1 osafamfnd[2820]: Removed
'safSi=dummy_NWay_1Norm_2,safApp=N' from
'safSu=dummy_NWay_1Norm_7,safSg=SG_dummy_n,safApp=N'
May 7 19:37:40 SLES-SLOT-1 osafamfnd[2820]: Removing
'safSi=dummy_NWay_1Norm_3,safApp=N' from
'safSu=dummy_NWay_1Norm_7,safSg=SG_dummy_n,safApp=N'
May 7 19:37:40 SLES-SLOT-1 osafamfnd[2820]: Removed
'safSi=dummy_NWay_1Norm_3,safApp=N' from
'safSu=dummy_NWay_1Norm_7,safSg=SG_dummy_n,safApp=N'
May 7 19:37:40 SLES-SLOT-1 osafamfnd[2820]: Removing
'safSi=dummy_NWay_1Norm_4,safApp=N' from
'safSu=dummy_NWay_1Norm_7,safSg=SG_dummy_n,safApp=N'
May 7 19:37:40 SLES-SLOT-1 osafamfnd[2820]: Removed
'safSi=dummy_NWay_1Norm_4,safApp=N' from
'safSu=dummy_NWay_1Norm_7,safSg=SG_dummy_n,safApp=N'
May 7 19:37:40 SLES-SLOT-1 osafimmnd[2759]: Ccb 330 COMMITTED
May 7 19:37:40 SLES-SLOT-1 osafamfnd[2820]: Removing
'safSi=dummy_NWay_1Norm_5,safApp=N' from
'safSu=dummy_NWay_1Norm_7,safSg=SG_dummy_n,safApp=N'
May 7 19:37:40 SLES-SLOT-1 osafamfnd[2820]: Removed
'safSi=dummy_NWay_1Norm_5,safApp=N' from
'safSu=dummy_NWay_1Norm_7,safSg=SG_dummy_n,safApp=N'
May 7 19:37:40 SLES-SLOT-1 osafamfnd[2820]: Removing
'safSi=dummy_NWay_1Norm_6,safApp=N' from
'safSu=dummy_NWay_1Norm_7,safSg=SG_dummy_n,safApp=N'
May 7 19:37:40 SLES-SLOT-1 osafamfnd[2820]: Removed
'safSi=dummy_NWay_1Norm_6,safApp=N' from
'safSu=dummy_NWay_1Norm_7,safSg=SG_dummy_n,safApp=N'
May 7 19:37:40 SLES-SLOT-1 osafimmnd[2759]: Ccb 331 COMMITTED
May 7 19:37:40 SLES-SLOT-1 osafimmnd[2759]: Ccb 332 COMMITTED
May 7 19:37:40 SLES-SLOT-1 osafamfd[2810]: avd_su.c:1582:
avd_su_dec_curr_stdby_si: Assertion 'su->saAmfSUNumCurrStandbySIs > 0' failed.
May 7 19:37:40 SLES-SLOT-1 osafamfnd[2820]: AMF director unexpectedly crashed
May 7 19:37:40 SLES-SLOT-1 osafamfnd[2820]: Rebooting OpenSAF NodeId = 131343
EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received
May 7 19:37:40 SLES-SLOT-1 osafimmnd[2759]: Implementer locally disconnected.
Marking it as doomed 4 <13, 2010f> (safAmfService)
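Since the full logs are large, a small script can help locate the assertion line in them. The following Python sketch scans syslog text for the pattern seen in the active controller log above. The regular expression is an assumption derived from that single line, and `find_assertions` is a hypothetical helper, not an OpenSAF tool.

```python
import re

# Pattern assumed from the one assertion line in this ticket:
#   <file>:<line>: <function>: Assertion '<expression>' failed.
ASSERT_RE = re.compile(
    r"(?P<file>[\w.]+):(?P<line>\d+):\s*(?P<func>\w+): "
    r"Assertion '(?P<expr>[^']+)' failed"
)


def find_assertions(text: str) -> list:
    """Return one dict per assertion failure found in syslog text.

    Whitespace is collapsed first so that entries wrapped across
    several lines (as in this ticket) still match.
    """
    flat = " ".join(text.split())
    return [m.groupdict() for m in ASSERT_RE.finditer(flat)]


if __name__ == "__main__":
    sample = (
        "May 7 19:37:40 SLES-SLOT-1 osafamfd[2810]: avd_su.c:1582:\n"
        "avd_su_dec_curr_stdby_si: Assertion "
        "'su->saAmfSUNumCurrStandbySIs > 0' failed.\n"
    )
    for hit in find_assertions(sample):
        print(hit["file"], hit["line"], hit["func"], hit["expr"])
```

Each hit carries the source file, line number, function, and failed expression as strings, which is enough to match a log entry against a backtrace frame.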
/var/log/messages on standby ctrl:
--------------------------------------------------------------------------------
May 7 19:37:28 SLES-SLOT-2 osafimmd[2646]: Skipping re-send of fevs message
5719 since it has recently been resent.
May 7 19:37:28 SLES-SLOT-2 kernel: TIPC: Resetting link
<1.1.47:eth0-1.1.31:eth0>, peer not responding
May 7 19:37:28 SLES-SLOT-2 kernel: TIPC: Lost link <1.1.47:eth0-1.1.31:eth0> on
network plane A
May 7 19:37:28 SLES-SLOT-2 kernel: TIPC: Lost contact with <1.1.31>
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: DISCARD DUPLICATE FEVS message:5718
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Error code 2 returned for message
type 57 - ignoring
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: DISCARD DUPLICATE FEVS message:5719
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Error code 2 returned for message
type 57 - ignoring
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Global discard node received for
nodeId:2010f pid:2759
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Implementer disconnected 1 <0,
2010f(down)> (safLogService)
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Implementer disconnected 2 <0,
2010f(down)> (safClmService)
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Implementer disconnected 5 <0,
2010f(down)> (MsgQueueService131343)
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Implementer disconnected 6 <0,
2010f(down)> (safEvtService)
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Implementer disconnected 7 <0,
2010f(down)> (safLckService)
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Implementer disconnected 8 <0,
2010f(down)> (safMsgGrpService)
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Implementer disconnected 9 <0,
2010f(down)> (safCheckPointService)
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Implementer disconnected 10 <0,
2010f(down)> (safSmfService)
May 7 19:37:28 SLES-SLOT-2 opensaf_reboot: Rebooting remote node in the absence
of PLM is outside the scope of OpenSAF
May 7 19:37:28 SLES-SLOT-2 osafrded[2628]: rde_rde_set_role: role set to 1
May 7 19:37:28 SLES-SLOT-2 osafimmd[2646]: ACTIVE request
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Director Service Is NEWACTIVE state
May 7 19:37:28 SLES-SLOT-2 osafimmd[2646]: Coord re-elected, resides at 2020f
May 7 19:37:28 SLES-SLOT-2 osafimmd[2646]: Received IMMD service event
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: This IMMND re-elected coord
redundantly, failover ?
May 7 19:37:28 SLES-SLOT-2 osaflogd[2774]: ACTIVE request
May 7 19:37:28 SLES-SLOT-2 osafntfd[2784]: ACTIVE request
May 7 19:37:28 SLES-SLOT-2 osafclmd[2794]: ACTIVE request
May 7 19:37:28 SLES-SLOT-2 osafimmd[2646]: Received IMMD service event
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Implementer connected: 15
(safLogService) <2, 2020f>
May 7 19:37:28 SLES-SLOT-2 osafimmnd[2656]: Implementer connected: 16
(safClmService) <3, 2020f>
May 7 19:37:29 SLES-SLOT-2 osafclmd[2794]: clms_mds_msg_send FAILED: 2
May 7 19:37:29 SLES-SLOT-2 osafclmd[2794]: clms_mds_msg_send FAILED: 2
May 7 19:37:29 SLES-SLOT-2 osafclmd[2794]: clms_mds_msg_send FAILED: 2
May 7 19:37:29 SLES-SLOT-2 osafclmd[2794]: clms_mds_msg_send FAILED: 2
May 7 19:37:29 SLES-SLOT-2 osafclmd[2794]: clms_mds_msg_send FAILED: 2
May 7 19:37:29 SLES-SLOT-2 osafclmd[2794]: clms_mds_msg_send FAILED: 2
May 7 19:37:28 SLES-SLOT-2 osafamfd[2813]: FAILOVER StandBy --> Active
May 7 19:37:29 SLES-SLOT-2 osafimmnd[2656]: Implementer disconnected 3 <8,
2020f> (@safAmfService2020f)
May 7 19:37:29 SLES-SLOT-2 osafimmnd[2656]: Implementer connected: 17
(safAmfService) <8, 2020f>
May 7 19:37:29 SLES-SLOT-2 osafamfd[2813]: Node 'SC-1' left the cluster
May 7 19:37:29 SLES-SLOT-2 osafamfd[2813]: FAILOVER StandBy --> Active DONE!
May 7 19:37:29 SLES-SLOT-2 osafamfnd[2823]: Assigning
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
May 7 19:37:29 SLES-SLOT-2 osafimmnd[2656]: Implementer connected: 18
(safLckService) <298, 2020f>
May 7 19:37:29 SLES-SLOT-2 osafimmnd[2656]: Implementer connected: 19
(safCheckPointService) <300, 2020f>
May 7 19:37:29 SLES-SLOT-2 osafimmnd[2656]: Implementer connected: 20
(safEvtService) <299, 2020f>
May 7 19:37:29 SLES-SLOT-2 osafimmnd[2656]: Implementer connected: 21
(safMsgGrpService) <296, 2020f>
May 7 19:37:29 SLES-SLOT-2 osafmsgnd[2912]: Deferred mqa event list head NULL
May 7 19:37:29 SLES-SLOT-2 osafimmnd[2656]: Implementer connected: 22
(MsgQueueService131343) <427, 2020f>
May 7 19:37:29 SLES-SLOT-2 osafimmnd[2656]: Implementer locally disconnected.
Marking it as doomed 22 <427, 2020f> (MsgQueueService131343)
May 7 19:37:29 SLES-SLOT-2 osafimmnd[2656]: Implementer disconnected 22 <427,
2020f> (MsgQueueService131343)
May 7 19:37:29 SLES-SLOT-2 osafimmnd[2656]: Implementer connected: 23
(safSmfService) <292, 2020f>
May 7 19:37:29 SLES-SLOT-2 osafamfnd[2823]: Assigning
'safSi=dummy_NWay_Active_1Norm_2,safApp=NwActApp' ACTIVE to
'safSu=dummy_NWay_Active_1Norm_8,safSg=SG_dummy_nwayact,safApp=NwActApp'
May 7 19:37:29 SLES-SLOT-2 osafamfnd[2823]: Assigned
'safSi=dummy_NWay_Active_1Norm_2,safApp=NwActApp' ACTIVE to
'safSu=dummy_NWay_Active_1Norm_8,safSg=SG_dummy_nwayact,safApp=NwActApp'
May 7 19:37:29 SLES-SLOT-2 osafamfnd[2823]: Assigned
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
May 7 19:37:29 SLES-SLOT-2 osafamfnd[2823]: Assigning
'safSi=dummy_NWay_1Norm_2,safApp=N' STANDBY to
'safSu=dummy_NWay_1Norm_8,safSg=SG_dummy_n,safApp=N'
May 7 19:37:29 SLES-SLOT-2 osafamfnd[2823]: Assigning
'safSi=dummy_NWay_1Norm_3,safApp=N' STANDBY to
'safSu=dummy_NWay_1Norm_8,safSg=SG_dummy_n,safApp=N'
May 7 19:37:29 SLES-SLOT-2 osafamfnd[2823]: Assigned
'safSi=dummy_NWay_1Norm_2,safApp=N' STANDBY to
'safSu=dummy_NWay_1Norm_8,safSg=SG_dummy_n,safApp=N'
May 7 19:37:29 SLES-SLOT-2 osafamfnd[2823]: Assigned
'safSi=dummy_NWay_1Norm_3,safApp=N' STANDBY to
'safSu=dummy_NWay_1Norm_8,safSg=SG_dummy_n,safApp=N'
May 7 19:37:32 SLES-SLOT-2 osafimmnd[2656]: Timeout while waiting for
implementer, aborting ccb:333
May 7 19:37:32 SLES-SLOT-2 osafimmnd[2656]: Aborting ccb 333 while waiting for
replies from implementers on DELETE-OP
May 7 19:37:32 SLES-SLOT-2 osafimmnd[2656]: Ccb 333 ABORTED
/var/log/messages on payload3
--------------------------------------------------------------------------------
May 7 19:37:29 SLES-SLOT-3 osafamfnd[2832]: Assigned
'safSi=dummy_NWay_1Norm_3,safApp=N' STANDBY to
'safSu=dummy_NWay_1Norm_1,safSg=SG_dummy_n,safApp=N'
May 7 19:37:33 SLES-SLOT-3 osafimmnd[2612]: Aborting ccb 333 while waiting for
replies from implementers on DELETE-OP
May 7 19:37:33 SLES-SLOT-3 osafimmnd[2612]: Ccb 333 ABORTED
May 7 19:37:37 SLES-SLOT-3 osafimmnd[2612]: ERR_BAD_HANDLE: Handle use is
blocked by pending reply on syncronous call
May 7 19:37:37 SLES-SLOT-3 osafimmnd[2612]: IMMND - Client Node Get Failed for
cli_hdl 9375913739023
May 7 19:37:37 SLES-SLOT-3 immcfg: logtrace: trace enabled to file
/tmp/imma_sisideps.txt, mask=0xffffffff
May 7 19:37:37 SLES-SLOT-3 immcfg: IMMA library TRACE initialize done pid:4855
svid:26 file:/tmp/imma_sisideps.txt
May 7 19:37:37 SLES-SLOT-3 osafimmnd[2612]: Ccb 334 COMMITTED
May 7 19:37:37 SLES-SLOT-3 osafamfnd[2832]: Assigning
'safSi=dummy_NWay_1Norm_5,safApp=N' ACTIVE to
'safSu=dummy_NWay_1Norm_5,safSg=SG_dummy_n,safApp=N'
PS: Other redundancy models were also running, with operations going on in
parallel. The crash was observed in the nwayactive model.
The logs are huge and cannot be attached.
Changed 13 months ago by surenderk
- description modified
Changed 13 months ago by ravisekhar
The subject is about locking an SI of the nwayactive model, but in the backtrace
the issue appears to be in the Nway model (avd_sg_nway_susi_succ_sg_realign). A
dependency across SGs (between nway and nwayactive) cannot explain it either,
since the issue occurred before the dependency was configured.
With plain nwayactive and nway configurations we do not see any issue. Please
update the configuration and scenario.
Changed 13 months ago by ravisekhar
- priority changed from critical to major
- summary changed from "amfd crashed while locking SI in nwayactive model and
  cluster went for reboot" to "amfd crashed while locking SI in Nway model and
  cluster went for reboot"
- milestone changed from 4.2.0.GA to future_releases
As per the description, the issue occurs when multiple applications with
different redundancy models are brought up and admin operations are carried out
simultaneously. As this is not the usual case, changing the priority to major.
Changed 13 months ago by surenderk
- attachment imm.xml added
xml file
Changed 13 months ago by surenderk
xml file attached.