Hi, I have raised a ticket for the CLM assert, https://sourceforge.net/p/opensaf/tickets/800/.
I have also floated a patch for the ticket (applied on top of opensaf-4.3.2rc1), available at https://sourceforge.net/p/opensaf/mailman/opensaf-devel/thread/70f1811553e7f6baf39d.1393633650%40ubuntu/#msg32038530

Please test the patch, since so far the problem has mostly been occurring on your systems.

Thanks,
Mathi.

----- [email protected] wrote:
> Hi,
>
> I would recommend migrating to 4.3.1 at least, and also plan to migrate to
> 4.3.2 or 4.4 once the GA is made available at the beginning of March.
>
> Attached is the patch that Neel had mentioned. You could try applying this
> or migrate to 4.3.1
> (http://sourceforge.net/projects/opensaf/files/releases/opensaf-4.3.1.tar.gz/download)
>
> Also, could you provide the output of "thread apply all bt full" for the
> assertions below in AMF and CLM?
>
> Thanks,
> Mathi.
>
> ----- [email protected] wrote:
> > OK, I'll try the patch (can it be applied to 4.3.0 or do I need to take
> > a snapshot from the dev branch)?
> >
> > Also found the following cores from osafamfd last night. Would these be
> > related to this problem also, or is this something else?
> >
> > thanks
> > --
> > tony
> >
> > --- Assertion in osafamfd ---
> >
> > #3  0x0000000000440fba in avd_mds_qsd_role_evh (cb=0x6c4d40 <_avd_cb>, evt=0x7f87240080a0) at avd_role.c:575
> > 575         osafassert(0);
> > (gdb) p cb
> > $4 = (AVD_CL_CB *) 0x6c4d40 <_avd_cb>
> > (gdb) p *cb
> > $5 = {avd_mbx = 4291821569, avd_hb_mbx = 0, mds_handle = 0x0, init_state = AVD_APP_STATE,
> >   avd_fover_state = false, avail_state_avd = SA_AMF_HA_ACTIVE, vaddr_pwe_hdl = 65537, vaddr_hdl = 1,
> >   adest_hdl = 131071, vaddr = 1, other_avd_adest = 0, local_avnd_adest = 298033400332316,
> >   nd_msg_queue_list = {nd_msg_queue = 0x0, tail = 0x0}, evt_queue = {evt_msg_queue = 0x0, tail = 0x0},
> >   mbcsv_hdl = 4293918753, ckpt_hdl = 4292870177, mbcsv_sel_obj = 13, stby_sync_state = AVD_STBY_IN_SYNC,
> >   synced_reo_type = 0, async_updt_cnt = {cb_updt = 61, node_updt = 8101, app_updt = 78, sg_updt = 3980,
> >     su_updt = 5980, si_updt = 2420, sg_su_oprlist_updt = 1210, sg_admin_si_updt = 0, siass_updt = 1437,
> >     comp_updt = 6488, csi_updt = 0, compcstype_updt = 0, si_trans_updt = 0}, sync_required = true,
> >   async_updt_msgs = {async_updt_queue = 0x0, tail = 0x0},
> >   edu_hdl = {is_inited = true, tree = {root_node = {bit = -1, left = 0x8dc760, right = 0x6c4e08 <_avd_cb+200>,
> >         key_info = 0x8dc650 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0, node_size = 0},
> >       n_nodes = 22}, to_version = 6},
> >   mds_edu_hdl = {is_inited = true, tree = {root_node = {bit = -1, left = 0x8dbbe0, right = 0x6c4e50 <_avd_cb+272>,
> >         key_info = 0x8ca840 ""}, params = {key_size = 8, info_size = 393, actual_key_size = 0, node_size = 0},
> >       n_nodes = 19}, to_version = 4},
> >   cluster_init_time = 0, node_id_avd = 69391, node_id_avd_other = 69647, node_avd_failed = 0,
> >   node_list = {root_node = {bit = -1, left = 0x6c4ea8 <_avd_cb+360>, right = 0x6c4ea8 <_avd_cb+360>,
> >       key_info = 0x8ca820 ""}, params = {key_size = 4, info_size = 0, actual_key_size = 0, node_size = 0},
> >     n_nodes = 0},
> >   amf_init_tmr = {tmr_id = 0x7f8724004020, type = AVD_TMR_CL_INIT, node_id = 0,
> >     spons_si_name = {length = 0, value = '\000' <repeats 255 times>},
> >     dep_si_name = {length = 0, value = '\000' <repeats 255 times>}, is_active = false},
> >   heartbeat_tmr = {tmr_id = 0x7f8724006f70, type = AVD_TMR_SND_HB, node_id = 0,
> >     spons_si_name = {length = 0, value = '\000' <repeats 255 times>},
> >     dep_si_name = {length = 0, value = '\000' <repeats 255 times>}, is_active = true},
> >   heartbeat_tmr_period = 10000000000, nodes_exit_cnt = 15, ntfHandle = 4279238657,
> >   ext_comp_info = {local_avnd_node = 0x0, ext_comp_hlt_check = 0x0},
> >   peer_msg_fmt_ver = 4, avd_peer_ver = 6, immOiHandle = 77309480719, immOmHandle = 81604448015,
> >   imm_sel_obj = 17, is_implementer = true, clmHandle = 4285530113, clm_sel_obj = 15,
> >   swap_switch = SA_FALSE, active_services_exist = true}
> > (gdb) p evt
> > $6 = (AVD_EVT *) 0x7f87240080a0
> > (gdb) p *evt
> > $7 = {next = {next = 0x0}, rcv_evt = AVD_EVT_MDS_QSD_ACK, info = {avnd_msg = 0x0, avd_msg = 0x0, node_id = 0,
> >     tmr = {tmr_id = 0x0, type = AVD_TMR_SND_HB, node_id = 0,
> >       spons_si_name = {length = 0, value = '\000' <repeats 255 times>},
> >       dep_si_name = {length = 0, value = '\000' <repeats 255 times>}, is_active = false}}}
> >
> > Core was generated by `/usr/lib64/opensaf/osafamfd osafamfd'.
> > Program terminated with signal 6, Aborted.
> > #0  0x0000003e42234bb5 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> > 64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> > (gdb) bt
> > #0  0x0000003e42234bb5 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> > #1  0x0000003e42237d13 in __GI_abort () at abort.c:91
> > #2  0x0000003e4361a602 in __osafassert_fail (__file=0x4ac056 "avd_role.c", __line=575,
> >     __func=0x4acca0 <__FUNCTION__.12339> "avd_mds_qsd_role_evh", __assertion=0x4ac5e0 "0") at sysf_def.c:301
> > #3  0x0000000000440fba in avd_mds_qsd_role_evh (cb=0x6c4d40 <_avd_cb>, evt=0x7f87240080a0) at avd_role.c:575
> > #4  0x000000000043fd56 in avd_process_event (cb_now=0x6c4d40 <_avd_cb>, evt=0x7f87240080a0) at avd_proc.c:591
> > #5  0x000000000043fab7 in avd_main_proc () at avd_proc.c:507
> > #6  0x0000000000409e79 in main (argc=2, argv=0x7fffec8e6648) at amfd_main.c:47
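The backtrace above is consistent with a handler that treats the MDS quiesced-state (QSD) acknowledgement as legal only while the director is in the quiesced HA role: here the event is AVD_EVT_MDS_QSD_ACK while the control block still reports SA_AMF_HA_ACTIVE, so execution falls into the catch-all osafassert(0) at avd_role.c:575. What follows is a minimal standalone sketch of that pattern; the type and field names are simplified stand-ins modelled on the names in the dump, not the actual OpenSAF source.

/* Illustrative sketch only: simplified stand-ins for the OpenSAF types,
 * so that it compiles and runs on its own (it aborts, as osafamfd did). */
#include <assert.h>
#include <stdio.h>

typedef enum { SA_AMF_HA_ACTIVE = 1, SA_AMF_HA_STANDBY, SA_AMF_HA_QUIESCED } SaAmfHAStateT;
typedef enum { AVD_EVT_MDS_QSD_ACK = 1 } AvdEvtType;

struct avd_cb  { SaAmfHAStateT avail_state_avd; };
struct avd_evt { AvdEvtType rcv_evt; };

/* A QSD ack only makes sense while the director believes it is quiesced;
 * any other role falls into the catch-all assert. */
static void qsd_ack_handler(const struct avd_cb *cb, const struct avd_evt *evt)
{
    assert(evt->rcv_evt == AVD_EVT_MDS_QSD_ACK);

    if (cb->avail_state_avd != SA_AMF_HA_QUIESCED)
        assert(0 && "MDS QSD ack received outside the quiesced role");

    puts("QSD ack handled in the quiesced role");
}

int main(void)
{
    struct avd_cb cb = { SA_AMF_HA_ACTIVE };      /* role seen in the core */
    struct avd_evt evt = { AVD_EVT_MDS_QSD_ACK }; /* event seen in the core */
    qsd_ack_handler(&cb, &evt);                   /* asserts and aborts here */
    return 0;
}

In other words, the assert itself is only the messenger; the interesting question is how the director came to receive a QSD ack while it still considered itself active.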
> >
> > --- Assertion in CLM ---
> >
> > (gdb) p nodeAddress
> > $1 = (SaClmNodeAddressT *) 0x7f916c005b44
> > (gdb) p *nodeAddress
> > $2 = {family = (unknown: 0), length = 19365,
> >   value = "\000\000\000\000\000\000\017\001\001\000\001", '\000' <repeats 52 times>}
> >
> > #0  0x0000003e42234bb5 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> > 64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> > (gdb) bt
> > #0  0x0000003e42234bb5 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> > #1  0x0000003e42237d13 in __GI_abort () at abort.c:91
> > #2  0x0000003e4361a602 in __osafassert_fail (__file=0x425d55 "clms_mds.c", __line=307,
> >     __func=0x4263a0 <__FUNCTION__.9929> "encodeNodeAddressT", __assertion=0x425e5a "0") at sysf_def.c:301
> > #3  0x000000000041dd5e in encodeNodeAddressT (uba=0x7f9173ffe6c8, nodeAddress=0x7f916c005b44) at clms_mds.c:307
> > #4  0x000000000041de72 in clms_enc_node_get_msg (uba=0x7f9173ffe6c8, msg=0x7f916c005b40) at clms_mds.c:332
> > #5  0x000000000041e23d in clms_enc_cluster_ntf_buf_msg (uba=0x7f9173ffe6c8, notify_info=0x7f9173ffeb88) at clms_mds.c:418
> > #6  0x000000000041e57b in clms_enc_track_cbk_msg (uba=0x7f9173ffe6c8, msg=0x7f9173ffeb70) at clms_mds.c:533
> > #7  0x000000000041ecf7 in clms_mds_enc (info=0x7f9173ffe700) at clms_mds.c:724
> > #8  0x000000000041f411 in clms_mds_enc_flat (info=0x7f9173ffe700) at clms_mds.c:908
> > #9  0x000000000041fb0f in clms_mds_callback (info=0x7f9173ffe700) at clms_mds.c:1184
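Here a cluster-track callback for a node-down event is being encoded, and the SaClmNodeAddressT in the dump has family = 0 (neither SA_CLM_AF_INET nor SA_CLM_AF_INET6) plus a garbage length, which looks like an uninitialized address. Below is a minimal standalone sketch of the pattern that would produce the assert at clms_mds.c:307; the SaClm* definitions are local stand-ins modelled on the SA Forum CLM types, and the real encodeNodeAddressT() may differ in detail.

/* Illustrative sketch only: minimal stand-ins for the SA Forum CLM types. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SA_CLM_MAX_ADDRESS_LENGTH 64
typedef enum { SA_CLM_AF_INET = 1, SA_CLM_AF_INET6 = 2 } SaClmNodeAddressFamilyT;

typedef struct {
    SaClmNodeAddressFamilyT family;
    uint16_t length;
    uint8_t value[SA_CLM_MAX_ADDRESS_LENGTH];
} SaClmNodeAddressT;

/* The encoder only knows the INET/INET6 families; an uninitialized or
 * garbage address (family == 0, as in the core above) hits the assert,
 * mirroring the failure at clms_mds.c:307. */
static void encode_node_address(const SaClmNodeAddressT *addr)
{
    switch (addr->family) {
    case SA_CLM_AF_INET:
    case SA_CLM_AF_INET6:
        printf("encoding %u address bytes\n", (unsigned)addr->length);
        break;
    default:
        assert(0 && "unknown SaClmNodeAddressT family");
    }
}

int main(void)
{
    SaClmNodeAddressT bad = { 0 };  /* family = 0, as in the dump */
    bad.length = 19365;             /* garbage length from the dump */
    encode_node_address(&bad);      /* asserts and aborts here */
    return 0;
}

A more defensive encoder could skip or zero-encode an address with an unknown family instead of asserting, but that would only hide the real question of why the node record carries a garbage address in the first place.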
> >
> > On Feb 21, 2014, at 7:35 AM, Neelakanta Reddy <[email protected]> wrote:
> >
> > Hi,
> >
> > Comments inline.
> >
> > /Neel.
> >
> > On Friday 21 February 2014 05:45 PM, Tony Hart wrote:
> > Hi Neel,
> > Thanks for the analysis. It seems that multiple components tripped on the race condition in this case;
> > I assume from your description that the fix was only applied to CLM? Also, in this case the node didn't
> > recover despite multiple restarts - does that fit with the scenario in ticket 528?
> >
> > Apply the patch for CLM and test. If it is reproducible, please share the syslogs of both the controllers.
> >
> > Is this reproducible? - Not sure yet; this is the first time I've seen this particular crash, but we
> > recently started testing on bigger systems and that could be a factor.
> >
> > We really need a fix for this - should I open a ticket?
> >
> > thanks
> > --
> > tony
> >
> > On Feb 21, 2014, at 5:36 AM, Neelakanta Reddy <[email protected]> wrote:
> >
> > Hi,
> >
> > The same problem is observed in CLM and is fixed in http://sourceforge.net/p/opensaf/tickets/528
> > (changeset 4622 for opensaf-4.3.x).
> >
> > For other services, the problem is not yet fixed.
> >
> > Can you please confirm whether it is always reproducible?
> >
> > /Neel.
> >
> > On Friday 21 February 2014 05:30 AM, Tony Hart wrote:
> > 4.3.0
> >
> > BTW, is there a way to tell at runtime which version is installed?
> >
> > On Feb 20, 2014, at 4:03 AM, Neelakanta Reddy <[email protected]> wrote:
> >
> > Hi,
> >
> > Which version of OpenSAF is used? It looks to be an older release.
> >
> > /Neel.
> >
> > On Wednesday 19 February 2014 08:42 PM, Tony Hart wrote:
> > Hi Neel,
> > Thanks for the reply. I've attached a fuller log (just the osaf messages) from SCM2; unfortunately the
> > logs from SCM1 are not available.
> >
> > --
> > tony
