Hi,

I have raised a ticket for the CLM assert, 
https://sourceforge.net/p/opensaf/tickets/800/.

I have also posted a patch for the ticket (it applies on top of opensaf-4.3.2rc1), available at 
https://sourceforge.net/p/opensaf/mailman/opensaf-devel/thread/70f1811553e7f6baf39d.1393633650%40ubuntu/#msg32038530
Could you please test the patch, since so far the issue has mostly been occurring on your systems?
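In case it helps, applying the patch on top of the release tarball would look roughly like this (the file names below are placeholders for the actual tarball and the patch saved from the devel thread):

```shell
# Placeholder names: substitute the real tarball and saved patch file.
tar xzf opensaf-4.3.2rc1.tar.gz
cd opensaf-4.3.2rc1
patch -p1 --dry-run < ../clm-assert.patch   # first verify it applies cleanly
patch -p1 < ../clm-assert.patch             # then apply for real
```

The --dry-run pass is worth keeping: it reports any rejected hunks without touching the tree.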

Thanks,
Mathi.

----- [email protected] wrote:

> Hi,
> 
> I would recommend migrating to at least 4.3.1, and also planning a
> migration to 4.3.2 or 4.4 once the GA becomes available at the
> beginning of March.
> 
> Attached is the patch that Neel mentioned. You could try applying it,
> or migrate to 4.3.1
> (http://sourceforge.net/projects/opensaf/files/releases/opensaf-4.3.1.tar.gz/download).
> 
> Also, could you provide the output of 'thread apply all bt full' for
> the assertions below in AMF and CLM?
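> For reference, one way to capture that from a core file is gdb's batch
> mode (the binary and core paths here are examples; use the actual ones):
> 
> ```shell
> # Example paths; substitute the real daemon binary and core file.
> gdb --batch -ex 'thread apply all bt full' \
>     /usr/lib64/opensaf/osafamfd core.osafamfd > osafamfd_bt_full.txt
> ```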
> 
> Thanks,
> Mathi.
> 
> ----- [email protected] wrote:
> 
> > OK, I’ll try the patch (can it be applied to 4.3.0, or do I need to
> > take a snapshot from the dev branch)?
> > 
> > Also, I found the following cores from osafamfd last night. Would
> > these be related to this problem too, or is this something else?
> > 
> > thanks
> > —
> > tony
> > 
> > --- Assertion in osafamfd ---
> > 
> > #3  0x0000000000440fba in avd_mds_qsd_role_evh (cb=0x6c4d40 <_avd_cb>, evt=0x7f87240080a0) at avd_role.c:575
> > 575	osafassert(0);
> > (gdb) p cb
> > $4 = (AVD_CL_CB *) 0x6c4d40 <_avd_cb>
> > (gdb) p *cb
> > $5 = {avd_mbx = 4291821569, avd_hb_mbx = 0, mds_handle = 0x0, init_state = AVD_APP_STATE, avd_fover_state = false, avail_state_avd = SA_AMF_HA_ACTIVE, vaddr_pwe_hdl = 65537, vaddr_hdl = 1, adest_hdl = 131071,
> >   vaddr = 1, other_avd_adest = 0, local_avnd_adest = 298033400332316, nd_msg_queue_list = {nd_msg_queue = 0x0, tail = 0x0}, evt_queue = {evt_msg_queue = 0x0, tail = 0x0}, mbcsv_hdl = 4293918753,
> >   ckpt_hdl = 4292870177, mbcsv_sel_obj = 13, stby_sync_state = AVD_STBY_IN_SYNC, synced_reo_type = 0, async_updt_cnt = {cb_updt = 61, node_updt = 8101, app_updt = 78, sg_updt = 3980, su_updt = 5980, si_updt = 2420,
> >     sg_su_oprlist_updt = 1210, sg_admin_si_updt = 0, siass_updt = 1437, comp_updt = 6488, csi_updt = 0, compcstype_updt = 0, si_trans_updt = 0}, sync_required = true, async_updt_msgs = {async_updt_queue = 0x0,
> >     tail = 0x0}, edu_hdl = {is_inited = true, tree = {root_node = {bit = -1, left = 0x8dc760, right = 0x6c4e08 <_avd_cb+200>, key_info = 0x8dc650 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0,
> >         node_size = 0}, n_nodes = 22}, to_version = 6}, mds_edu_hdl = {is_inited = true, tree = {root_node = {bit = -1, left = 0x8dbbe0, right = 0x6c4e50 <_avd_cb+272>, key_info = 0x8ca840 ""}, params = {
> >         key_size = 8, info_size = 393, actual_key_size = 0, node_size = 0}, n_nodes = 19}, to_version = 4}, cluster_init_time = 0, node_id_avd = 69391, node_id_avd_other = 69647, node_avd_failed = 0, node_list = {
> >     root_node = {bit = -1, left = 0x6c4ea8 <_avd_cb+360>, right = 0x6c4ea8 <_avd_cb+360>, key_info = 0x8ca820 ""}, params = {key_size = 4, info_size = 0, actual_key_size = 0, node_size = 0}, n_nodes = 0},
> >   amf_init_tmr = {tmr_id = 0x7f8724004020, type = AVD_TMR_CL_INIT, node_id = 0, spons_si_name = {length = 0, value = '\000' <repeats 255 times>}, dep_si_name = {length = 0, value = '\000' <repeats 255 times>},
> >     is_active = false}, heartbeat_tmr = {tmr_id = 0x7f8724006f70, type = AVD_TMR_SND_HB, node_id = 0, spons_si_name = {length = 0, value = '\000' <repeats 255 times>}, dep_si_name = {length = 0,
> >       value = '\000' <repeats 255 times>}, is_active = true}, heartbeat_tmr_period = 10000000000, nodes_exit_cnt = 15, ntfHandle = 4279238657, ext_comp_info = {local_avnd_node = 0x0, ext_comp_hlt_check = 0x0},
> >   peer_msg_fmt_ver = 4, avd_peer_ver = 6, immOiHandle = 77309480719, immOmHandle = 81604448015, imm_sel_obj = 17, is_implementer = true, clmHandle = 4285530113, clm_sel_obj = 15, swap_switch = SA_FALSE,
> >   active_services_exist = true}
> > (gdb) p evt
> > $6 = (AVD_EVT *) 0x7f87240080a0
> > (gdb) p *evt
> > $7 = {next = {next = 0x0}, rcv_evt = AVD_EVT_MDS_QSD_ACK, info = {avnd_msg = 0x0, avd_msg = 0x0, node_id = 0, tmr = {tmr_id = 0x0, type = AVD_TMR_SND_HB, node_id = 0, spons_si_name = {length = 0,
> >         value = '\000' <repeats 255 times>}, dep_si_name = {length = 0, value = '\000' <repeats 255 times>}, is_active = false}}}
> > 
> > 
> > Core was generated by `/usr/lib64/opensaf/osafamfd osafamfd'.
> > Program terminated with signal 6, Aborted.
> > #0  0x0000003e42234bb5 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> > 64	return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> > (gdb) bt
> > #0  0x0000003e42234bb5 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> > #1  0x0000003e42237d13 in __GI_abort () at abort.c:91
> > #2  0x0000003e4361a602 in __osafassert_fail (__file=0x4ac056 "avd_role.c", __line=575, __func=0x4acca0 <__FUNCTION__.12339> "avd_mds_qsd_role_evh", __assertion=0x4ac5e0 "0") at sysf_def.c:301
> > #3  0x0000000000440fba in avd_mds_qsd_role_evh (cb=0x6c4d40 <_avd_cb>, evt=0x7f87240080a0) at avd_role.c:575
> > #4  0x000000000043fd56 in avd_process_event (cb_now=0x6c4d40 <_avd_cb>, evt=0x7f87240080a0) at avd_proc.c:591
> > #5  0x000000000043fab7 in avd_main_proc () at avd_proc.c:507
> > #6  0x0000000000409e79 in main (argc=2, argv=0x7fffec8e6648) at amfd_main.c:47
> > 
> > 
> > 
> > --- Assertion in clm ---
> > 
> > (gdb) p nodeAddress
> > $1 = (SaClmNodeAddressT *) 0x7f916c005b44
> > (gdb) p *nodeAddress
> > $2 = {family = (unknown: 0), length = 19365, value = "\000\000\000\000\000\000\017\001\001\000\001", '\000' <repeats 52 times>}
> > 
> > 
> > #0  0x0000003e42234bb5 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> > 64	return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> > (gdb) bt
> > #0  0x0000003e42234bb5 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> > #1  0x0000003e42237d13 in __GI_abort () at abort.c:91
> > #2  0x0000003e4361a602 in __osafassert_fail (__file=0x425d55 "clms_mds.c", __line=307, __func=0x4263a0 <__FUNCTION__.9929> "encodeNodeAddressT", __assertion=0x425e5a "0") at sysf_def.c:301
> > #3  0x000000000041dd5e in encodeNodeAddressT (uba=0x7f9173ffe6c8, nodeAddress=0x7f916c005b44) at clms_mds.c:307
> > #4  0x000000000041de72 in clms_enc_node_get_msg (uba=0x7f9173ffe6c8, msg=0x7f916c005b40) at clms_mds.c:332
> > #5  0x000000000041e23d in clms_enc_cluster_ntf_buf_msg (uba=0x7f9173ffe6c8, notify_info=0x7f9173ffeb88) at clms_mds.c:418
> > #6  0x000000000041e57b in clms_enc_track_cbk_msg (uba=0x7f9173ffe6c8, msg=0x7f9173ffeb70) at clms_mds.c:533
> > #7  0x000000000041ecf7 in clms_mds_enc (info=0x7f9173ffe700) at clms_mds.c:724
> > #8  0x000000000041f411 in clms_mds_enc_flat (info=0x7f9173ffe700) at clms_mds.c:908
> > #9  0x000000000041fb0f in clms_mds_callback (info=0x7f9173ffe700) at clms_mds.c:1184
> > #10 0x0000003e4364e6b7 in mcm_msg_encode_full_or_flat_and_send (to=2 '\002', to_msg=0x7f9173ffe8c0, to_svc_id=35, svc_cb=0x6307d0, adest=299135479218219, dest_vdest_id=65535, snd_type=0, xch_id=0, pri=MDS_SEND_PRIORITY_MEDIUM) at mds_c_sndrcv.c:1417
> > #11 0x0000003e4364d96f in mds_mcm_send_msg_enc (to=2 '\002', svc_cb=0x6307d0, to_msg=0x7f9173ffe8c0, to_svc_id=35, dest_vdest_id=65535, req=0x7f9173ffe980, xch_id=0, dest=299135479218219, pri=MDS_SEND_PRIORITY_MEDIUM) at mds_c_sndrcv.c:1084
> > #12 0x0000003e4364d6b3 in mcm_pvt_normal_snd_process_common (env_hdl=65552, fr_svc_id=34, to_msg=..., to_dest=299135479218219, to_svc_id=35, req=0x7f9173ffe980, pri=MDS_SEND_PRIORITY_MEDIUM, xch_id=0) at mds_c_sndrcv.c:1033
> > #13 0x0000003e4364d1f8 in mcm_pvt_normal_svc_snd (env_hdl=65552, fr_svc_id=34, msg=0x7f9173ffeb70, to_dest=299135479218219, to_svc_id=35, req=0x7f9173ffe980, pri=MDS_SEND_PRIORITY_MEDIUM) at mds_c_sndrcv.c:890
> > #14 0x0000003e4364cc8b in mds_mcm_send (info=0x7f9173ffeab0) at mds_c_sndrcv.c:675
> > #15 0x0000003e4364c2a6 in mds_send (info=0x7f9173ffeab0) at mds_c_sndrcv.c:384
> > #16 0x0000003e4364bf12 in ncsmds_api (svc_to_mds_info=0x7f9173ffeab0) at mds_papi.c:104
> > #17 0x00000000004201d7 in clms_mds_msg_send (cb=0x62ac40 <_clms_cb>, msg=0x7f9173ffeb70, dest=0x65a668, mds_ctxt=0x0, prio=MDS_SEND_PRIORITY_MEDIUM, svc_id=NCSMDS_SVC_ID_CLMA) at clms_mds.c:1453
> > #18 0x000000000040f499 in clms_prep_and_send_track (cb=0x62ac40 <_clms_cb>, node=0x654ae0, client=0x65a640, step=SA_CLM_CHANGE_COMPLETED, notify=0x7f916c0009a0) at clms_imm.c:1064
> > #19 0x000000000040e8db in clms_send_track (cb=0x62ac40 <_clms_cb>, node=0x654ae0, step=SA_CLM_CHANGE_COMPLETED) at clms_imm.c:835
> > #20 0x0000000000409430 in clms_track_send_node_down (node=0x654ae0) at clms_evt.c:428
> > #21 0x000000000040ca38 in imm_impl_set_node_down_proc (_cb=0x62ac40 <_clms_cb>) at clms_imm.c:93
> > #22 0x0000003e42a07e18 in start_thread (arg=0x7f9173fff700) at pthread_create.c:309
> > #23 0x0000003e422e88bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> > On Feb 21, 2014, at 7:35 AM, Neelakanta Reddy <[email protected]> wrote:
> > 
> > Hi,
> > 
> > Comments inline.
> > 
> > /Neel.
> > On Friday 21 February 2014 05:45 PM, Tony Hart wrote:
> > Hi Neel,
> > Thanks for the analysis. It seems that multiple components tripped on
> > the race condition in this case; I assume from your description that
> > the fix was only applied to CLM? Also, in this case the node didn’t
> > recover despite multiple restarts - does that fit with the scenario in
> > ticket 528?
> > Apply the patch for CLM and test. If it is reproducible, please share
> > the syslogs of both the controllers.
> > Is this reproducible? - not sure yet, this is the first time I’ve seen
> > this particular crash, but we recently started testing on bigger
> > systems and that could be a factor.
> > 
> > We really need a fix for this - should I open a ticket?
> > 
> > thanks
> > —
> > tony
> > 
> > On Feb 21, 2014, at 5:36 AM, Neelakanta Reddy <[email protected]> wrote:
> > 
> > Hi,
> > 
> > The same problem is observed in CLM, and is fixed in
> > http://sourceforge.net/p/opensaf/tickets/528 .
> > It is fixed in changeset 4622 for opensaf-4.3.x.
> > 
> > For other services, the problem is not yet fixed.
> > 
> > Can you please confirm whether it is always reproducible?
> > 
> > /Neel.
> > 
> > 
> > On Friday 21 February 2014 05:30 AM, Tony Hart wrote:
> > 4.3.0
> > 
> > BTW is there a way to tell at runtime what version is installed?
> > 
> > 
> > On Feb 20, 2014, at 4:03 AM, Neelakanta Reddy <[email protected]> wrote:
> > 
> > Hi,
> > 
> > Which version of OpenSAF is being used? It looks like an older release.
> > 
> > /Neel.
> > 
> > On Wednesday 19 February 2014 08:42 PM, Tony Hart wrote:
> > Hi Neel,
> > Thanks for the reply. I’ve attached a fuller log (just the osaf
> > messages) from SCM2; unfortunately the logs from SCM1 are not
> > available.
> > 
> > —
> > tony
> > 
> > 
> > 
> >
> ------------------------------------------------------------------------------
> > Managing the Performance of Cloud-Based Applications
> > Take advantage of what the Cloud has to offer - Avoid Common
> > Pitfalls.
> > Read the Whitepaper.
> >
> http://pubads.g.doubleclick.net/gampad/clk?id=121054471&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Opensaf-users mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/opensaf-users
> 
> ------------------------------------------------------------------------------
> Managing the Performance of Cloud-Based Applications
> Take advantage of what the Cloud has to offer - Avoid Common
> Pitfalls.
> Read the Whitepaper.
> http://pubads.g.doubleclick.net/gampad/clk?id=121054471&iu=/4140/ostg.clktrk
> _______________________________________________
> Opensaf-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-users

