---
**[tickets:#3217] mbc: agent crash if mds sendto() error**
**Status:** assigned
**Milestone:** 5.20.11
**Created:** Wed Sep 09, 2020 06:18 AM UTC by Thuan Tran
**Last Updated:** Wed Sep 09, 2020 06:18 AM UTC
**Owner:** Thuan Tran
With the #3208 fix applied, ntfd sometimes crashes during cluster shutdown.
The backtrace is as follows:
~~~
Thread 1 (Thread 0x7fc0a9b4a100 (LWP 276)):
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007fc0a80bd8b1 in __GI_abort () at abort.c:79
#2 0x00007fc0a8106907 in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0x7fc0a8233dfa "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007fc0a810d97a in malloc_printerr (str=str@entry=0x7fc0a823206e
"malloc(): memory corruption") at malloc.c:5350
#4 0x00007fc0a8111a04 in _int_malloc (av=av@entry=0x7fc0a8468c40 <main_arena>,
bytes=bytes@entry=59) at malloc.c:3738
#5 0x00007fc0a8117121 in __libc_calloc (n=n@entry=1,
elem_size=elem_size@entry=59) at malloc.c:3436
#6 0x00007fc0a8c9b40c in mds_mdtm_send_tipc (req=0x7ffc9f16ec60) at
src/mds/mds_dt_tipc.c:2736
#7 0x00007fc0a8c88f07 in mcm_msg_encode_full_or_flat_and_send (to=to@entry=2
'\002', to_msg=to_msg@entry=0x7ffc9f16ef50, to_svc_id=to_svc_id@entry=29,
svc_cb=svc_cb@entry=0x5568f4afcea0, adest=adest@entry=564114769357041,
dest_vdest_id=dest_vdest_id@entry=65535, snd_type=4, xch_id=116,
pri=MDS_SEND_PRIORITY_HIGH) at src/mds/mds_c_sndrcv.c:1774
#8 0x00007fc0a8c8a5b7 in mds_mcm_send_msg_enc (to=<optimized out>,
svc_cb=svc_cb@entry=0x5568f4afcea0, to_msg=to_msg@entry=0x7ffc9f16ef50,
to_svc_id=to_svc_id@entry=29, dest_vdest_id=dest_vdest_id@entry=65535,
req=req@entry=0x7ffc9f16eff0, xch_id=116, dest=564114769357041,
pri=MDS_SEND_PRIORITY_HIGH) at src/mds/mds_c_sndrcv.c:1255
#9 0x00007fc0a8c8ac30 in mcm_pvt_red_snd_process_common
(env_hdl=env_hdl@entry=65550, fr_svc_id=fr_svc_id@entry=28, to_msg=...,
to_dest=to_dest@entry=564114769357041, to_svc_id=to_svc_id@entry=29,
req=req@entry=0x7ffc9f16eff0, pri=pri@entry=MDS_SEND_PRIORITY_HIGH, xch_id=116,
anchor=<optimized out>) at src/mds/mds_c_sndrcv.c:2664
#10 0x00007fc0a8c8dba3 in mcm_pvt_normal_svc_snd_rsp
(pri=MDS_SEND_PRIORITY_HIGH, req=0x7ffc9f16eff0, to_svc_id=29,
to_dest=564114769357041, msg=<optimized out>, fr_svc_id=28, env_hdl=65550) at
src/mds/mds_c_sndrcv.c:3699
#11 mds_mcm_send (info=0x1d) at src/mds/mds_c_sndrcv.c:835
#12 mds_send (info=info@entry=0x7ffc9f16f0a0) at src/mds/mds_c_sndrcv.c:458
#13 0x00007fc0a8c9636c in ncsmds_api
(svc_to_mds_info=svc_to_mds_info@entry=0x7ffc9f16f0a0) at src/mds/mds_papi.c:165
#14 0x00005568f2e7598f in ntfs_mds_msg_send (cb=<optimized out>,
msg=msg@entry=0x7ffc9f16f130, dest=dest@entry=0x7ffc9f16f128,
mds_ctxt=mds_ctxt@entry=0x7fc09c01278c, prio=prio@entry=MDS_SEND_PRIORITY_HIGH)
at src/ntf/ntfd/ntfs_mds.c:1310
#15 0x00005568f2e75f68 in notfication_result_lib (error=error@entry=SA_AIS_OK,
notificationId=182, mdsCtxt=0x7fc09c01278c, frDest=<optimized out>) at
src/ntf/ntfd/ntfs_com.c:181
#16 0x00005568f2e809da in NtfClient::confirmNtfNotification
(this=this@entry=0x5568f4afc440, notificationId=<optimized out>,
mdsCtxt=mdsCtxt@entry=0x7fc09c01278c, mdsDest=mdsDest@entry=564114769357041) at
src/ntf/ntfd/NtfClient.cc:341
#17 0x00005568f2e80c47 in NtfClient::notificationReceived (this=0x5568f4afc440,
clientId=clientId@entry=2, notification=std::tr1::shared_ptr<NtfNotification>
(use count 2, weak count 0) = {...}, mdsCtxt=mdsCtxt@entry=0x7fc09c01278c) at
src/ntf/ntfd/NtfClient.cc:146
#18 0x00005568f2e86c32 in NtfAdmin::processNotification
(this=this@entry=0x5568f4afb6a0, clientId=clientId@entry=2,
notificationType=notificationType@entry=SA_NTF_TYPE_STATE_CHANGE,
sendNotInfo=sendNotInfo@entry=0x7fc09c010bb0,
mdsCtxt=mdsCtxt@entry=0x7fc09c01278c, notificationId=<optimized out>) at
src/ntf/ntfd/NtfAdmin.cc:211
#19 0x00005568f2e86ec1 in NtfAdmin::notificationReceived (this=0x5568f4afb6a0,
clientId=2, notificationType=SA_NTF_TYPE_STATE_CHANGE,
sendNotInfo=sendNotInfo@entry=0x7fc09c010bb0, mdsCtxt=0x7fc09c01278c) at
src/ntf/ntfd/NtfAdmin.cc:262
#20 0x00005568f2e86f52 in notificationReceived (clientId=<optimized out>,
notificationType=<optimized out>, sendNotInfo=sendNotInfo@entry=0x7fc09c010bb0,
mdsCtxt=mdsCtxt@entry=0x7fc09c01278c) at src/ntf/ntfd/NtfAdmin.cc:1127
#21 0x00005568f2e7086a in proc_send_not_msg (cb=<optimized out>,
evt=0x7fc09c012780) at src/ntf/ntfd/ntfs_evt.c:474
#22 0x00005568f2e7033e in process_api_evt (evt=0x7fc09c012780) at
src/ntf/ntfd/ntfs_evt.c:673
#23 0x00005568f2e70f19 in ntfs_process_mbx (mbx=<optimized out>) at
src/ntf/ntfd/ntfs_evt.c:708
#24 0x00005568f2e6ebad in main (argc=<optimized out>, argv=<optimized out>) at
src/ntf/ntfd/ntfs_main.c:400
~~~
The problem is that, with the #3208 change, MBC frees a buffer that MDS has already freed (a double free). The MDS log at the time shows the sendto() failure:
~~~
<139>1 2020-09-08T16:16:48.284822+02:00 SC-1 osafntfd 276 mds.log [meta
sequenceId="80"] MDTM: Failed to send message err :No route to host
<139>1 2020-09-08T16:16:48.284842+02:00 SC-1 osafntfd 276 mds.log [meta
sequenceId="81"] MDTM: Unable to send the msg thru TIPC
<139>1 2020-09-08T16:16:48.284866+02:00 SC-1 osafntfd 276 mds.log [meta
sequenceId="82"] MDS_SND_RCV: RED sndrsp message SEND Failed from svc_id =
MBCSV(19), to svc_id = MBCSV(19)
~~~
Part of the #3208 solution needs to be updated to resolve this issue.
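The buffer-ownership rule behind this crash can be illustrated with a small, self-contained C sketch. It is not OpenSAF code: `hypothetical_mds_send()` and `hypothetical_mbc_send()` are invented names. The sketch assumes, following the analysis above, that the MDS send path releases the message buffer even when sendto() fails, so the caller must drop its pointer instead of freeing it again on error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL stand-in for an MDS send call (not the real MDS API).
 * Assumption: the callee takes ownership of `buf` and frees it on BOTH
 * the success path and the failure path (e.g. sendto() returning
 * "No route to host"). */
static int hypothetical_mds_send(char *buf, bool simulate_send_error)
{
    int rc = simulate_send_error ? -1 : 0;
    free(buf); /* callee always consumes the buffer */
    return rc;
}

/* HYPOTHETICAL caller showing the safe pattern: once the buffer is handed
 * to the send call, the caller never frees it again, even when the send
 * reports an error. Returns the send status, or -2 on allocation failure. */
static int hypothetical_mbc_send(const char *payload, bool simulate_send_error)
{
    assert(payload != NULL);

    char *buf = malloc(strlen(payload) + 1);
    if (buf == NULL)
        return -2; /* nothing was handed over, nothing to free */
    strcpy(buf, payload);

    int rc = hypothetical_mds_send(buf, simulate_send_error);
    buf = NULL; /* ownership transferred: the pointer is now stale */

    /* A buggy caller would skip the line above and do
     *     if (rc != 0) free(buf);
     * releasing memory the callee already freed. Such a double free may
     * only surface later, as heap corruption in an unrelated allocation. */
    return rc;
}
```

In the buggy variant the damage is not necessarily reported at the faulty free() itself; the backtrace above shows glibc detecting "malloc(): memory corruption" later, inside an unrelated calloc() in the MDS TIPC send path.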
---