The segfault occurs because of the patch for
https://sourceforge.net/p/opensaf/tickets/712/.
In the failure case, MDS is freeing the buffer somewhere (needs further analysis).
The same segfault is observed for lckd and evtd in our tests.
Planning to revert #712 in this ticket and reopen #712 for further
analysis.
---
** [tickets:#731] mbcsv: seg fault in sysf_free_pkt for immd & amfd**
**Status:** unassigned
**Created:** Sun Jan 19, 2014 01:20 PM UTC by Hans Feldt
**Last Updated:** Sun Jan 19, 2014 01:28 PM UTC
**Owner:** nobody
Changeset: 4811:eb57695a171b
Host: Win7, guest: Virtualbox, cluster: lxc
MDS/TCP (important, more on that later)
First core dump:
(gdb) bt
#0 sysf_free_pkt (ub=0x7f8458f3a740 <main_arena>) at sysf_mem.c:471
#1 0x00007f84595ac0ff in mbcsv_send_ckpt_data_to_all_peers
(msg_to_send=msg_to_send@entry=0x7fff4add23f8,
ckpt_inst=ckpt_inst@entry=0x8a1490,
mbc_inst=mbc_inst@entry=0x890aa0) at mbcsv_util.c:486
#2 0x00007f84595a54f7 in mbcsv_process_snd_ckpt_request (arg=0x7fff4add23f0)
at mbcsv_api.c:820
#3 0x000000000040a970 in immd_mbcsv_sync_update (cb=cb@entry=0x629340
<_immd_cb>, msg=msg@entry=0x7fff4add2450) at immd_mbcsv.c:59
#4 0x00000000004054e5 in immd_evt_proc_fevs_req (cb=cb@entry=0x629340
<_immd_cb>, evt=evt@entry=0x7fff4add26b0, sinfo=sinfo@entry=0x7f84540015c0,
deallocate=deallocate@entry=false) at immd_evt.c:283
#5 0x0000000000406a67 in immd_evt_proc_rt_modify_req (cb=cb@entry=0x629340
<_immd_cb>, evt=evt@entry=0x7f8454001480, sinfo=sinfo@entry=0x7f84540015c0)
at immd_evt.c:2325
#6 0x0000000000407ec3 in immd_process_evt () at immd_evt.c:151
#7 0x0000000000402621 in main (argc=<optimized out>, argv=<optimized out>) at
immd_main.c:291
With export MALLOC_CHECK_=2 in immd.conf I instead get:
(gdb) bt
#0 0x00007fd1924e8f77 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fd1924ec5e8 in __GI_abort () at abort.c:90
#2 0x00007fd192534ed4 in malloc_printerr (ptr=0xc24e00, str=0x7fd192636205
"free(): invalid pointer", action=<optimized out>) at malloc.c:4927
#3 free_check (mem=0xc24e00, caller=<optimized out>) at hooks.c:277
#4 0x00007fd192ece2ac in sysf_free_pkt (ub=0x190) at sysf_mem.c:470
#5 0x00007fd192ee50ff in mbcsv_send_ckpt_data_to_all_peers
(msg_to_send=msg_to_send@entry=0x7fffc4283448,
ckpt_inst=ckpt_inst@entry=0xc12570,
mbc_inst=mbc_inst@entry=0xc12340) at mbcsv_util.c:486
#6 0x00007fd192ede4f7 in mbcsv_process_snd_ckpt_request (arg=0x7fffc4283440)
at mbcsv_api.c:820
#7 0x000000000040a970 in immd_mbcsv_sync_update (cb=cb@entry=0x629340
<_immd_cb>, msg=msg@entry=0x7fffc42834a0) at immd_mbcsv.c:59
#8 0x00000000004054e5 in immd_evt_proc_fevs_req (cb=cb@entry=0x629340
<_immd_cb>, evt=evt@entry=0x7fffc4283700, sinfo=sinfo@entry=0xc24c20,
deallocate=deallocate@entry=false) at immd_evt.c:283
#9 0x0000000000406a67 in immd_evt_proc_rt_modify_req (cb=cb@entry=0x629340
<_immd_cb>, evt=evt@entry=0xc24ae0, sinfo=sinfo@entry=0xc24c20) at
immd_evt.c:2325
#10 0x0000000000407ec3 in immd_process_evt () at immd_evt.c:151
#11 0x0000000000402621 in main (argc=<optimized out>, argv=<optimized out>) at
immd_main.c:291
At mbcsv_util.c:486 the memory is freed after the MDS send has failed. This is
most likely the problem: the memory has already been freed by MDS.
The MDS send fails due to a timeout; why that happens is a dtm/mds issue. More
on that later.
This leads to an active controller reboot and, in the worst case, a cluster restart.
This is probably because the legacy memory manager has been partially removed.
Reference counting of objects no longer works. This area of "base" needs a
major cleanup.
Trace from immd:
Jan 19 10:43:45.442248 osafimmd [400:immd_evt.c:0235] >> immd_evt_proc_fevs_req
Jan 19 10:43:45.442270 osafimmd [400:immd_evt.c:0271] T5 immd_evt_proc_fevs_req
send_count:625 size:111
Jan 19 10:43:45.442304 osafimmd [400:immd_mbcsv.c:0045] >>
immd_mbcsv_sync_update
Jan 19 10:43:45.442326 osafimmd [400:mbcsv_api.c:0773] >>
mbcsv_process_snd_ckpt_request: Sending checkpoint data to all STANDBY peers,
as per the send-type specified
Jan 19 10:43:45.442346 osafimmd [400:mbcsv_api.c:0803] TR svc_id:42,
pwe_hdl:65549
Jan 19 10:43:45.442366 osafimmd [400:mbcsv_util.c:0344] >>
mbcsv_send_ckpt_data_to_all_peers
Jan 19 10:43:45.442385 osafimmd [400:mbcsv_util.c:0388] TR dispatching FSM for
NCSMBCSV_SEND_ASYNC_UPDATE
Jan 19 10:43:45.442404 osafimmd [400:mbcsv_act.c:0101] TR ASYNC update to be
sent. role: 1, svc_id: 42, pwe_hdl: 65549
Jan 19 10:43:45.442424 osafimmd [400:mbcsv_util.c:0400] TR calling encode
callback
Jan 19 10:43:45.442444 osafimmd [400:immd_mbcsv.c:0399] >> immd_mbcsv_callback
Jan 19 10:43:45.442463 osafimmd [400:immd_mbcsv.c:0790] >>
immd_mbcsv_encode_proc
Jan 19 10:43:45.442482 osafimmd [400:immd_mbcsv.c:0798] T5
MBCSV_MSG_ASYNC_UPDATE
Jan 19 10:43:45.442501 osafimmd [400:immd_mbcsv.c:0455] >>
mbcsv_enc_async_update
Jan 19 10:43:45.442519 osafimmd [400:immd_mbcsv.c:0463] T5 ************ENC SYNC
COUNT 300
Jan 19 10:43:45.442539 osafimmd [400:immd_mbcsv.c:0482] T5 ENCODE
IMMD_A2S_MSG_FEVS: send count: 625 handle: 85899477263
Jan 19 10:43:45.442564 osafimmd [400:immd_mbcsv.c:0605] <<
mbcsv_enc_async_update
Jan 19 10:43:45.442583 osafimmd [400:immd_mbcsv.c:0843] <<
immd_mbcsv_encode_proc
Jan 19 10:43:45.442602 osafimmd [400:immd_mbcsv.c:0428] T5 IMMD - MBCSv
Callback Success
Jan 19 10:43:45.442621 osafimmd [400:immd_mbcsv.c:0429] << immd_mbcsv_callback
Jan 19 10:43:45.442639 osafimmd [400:mbcsv_util.c:0439] TR send the encoded
message to any other peer with same s/w version
Jan 19 10:43:45.442658 osafimmd [400:mbcsv_util.c:0442] TR dispatching FSM for
NCSMBCSV_SEND_ASYNC_UPDATE
Jan 19 10:43:45.442676 osafimmd [400:mbcsv_act.c:0101] TR ASYNC update to be
sent. role: 1, svc_id: 42, pwe_hdl: 65549
Jan 19 10:43:45.442697 osafimmd [400:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg:
sending to vdest:d
Jan 19 10:43:45.442717 osafimmd [400:mbcsv_mds.c:0209] TR send type
MDS_SENDTYPE_REDRSP:
Jan 19 10:43:45.442739 osafimmd [400:mds_log.c:0192] TR INFO |MDS_SND_RCV:
creating sync entry with xch_id=289
Jan 19 10:43:45.442978 osafimmd [400:mds_log.c:0192] TR INFO |MDS_SND_RCV:
Msg Destination is on off node or diff process
Jan 19 10:43:45.443076 osafimmd [400:mds_log.c:0192] TR INFO |MDS_SND_RCV:
Sending the data to MDTM layer
Jan 19 10:43:45.443148 osafimmd [400:mds_log.c:0192] TR INFO |MDTM: User
Sending Data lenght=160 Fr_svc=19 to_svc=19
Jan 19 10:43:46.444955 osafimmd [400:mds_log.c:0192] TR ERR |MDS_SND_RCV:
Timeout or Error occured
Jan 19 10:43:46.445272 osafimmd [400:mds_log.c:0192] TR ERR |MDS_SND_RCV:
Timeout occured on red sndrsp message from svc_id=19, to svc_id=19
Jan 19 10:43:46.445360 osafimmd [400:mds_log.c:0192] TR ERR |MDS_SND_RCV:
Adest=<0x00000000,13>
Jan 19 10:43:46.445429 osafimmd [400:mds_log.c:0192] TR ERR |MDS_SND_RCV:
Anchor=<0x0002020f,397>
Jan 19 10:43:46.445501 osafimmd [400:mds_log.c:0192] TR INFO |MDS_SND_RCV:
Await active entry doesnt exists
Jan 19 10:43:46.445568 osafimmd [400:mds_log.c:0192] TR INFO |MDS_SND_RCV:
Deleting the sync send entry with xch_id=289
Jan 19 10:43:46.445634 osafimmd [400:mds_log.c:0192] TR INFO |MDS_SND_RCV:
Successfully Deleted the sync send entry with xch_id=289, fr_svc_id=19