Done! This bug is affecting trunk only.
Regards, Bogdan Bogdan-Andrei Iancu wrote: > Hi Kennard, > > You are perfectly right - there was a change in moving the setT(cell) > upper in the function (we needed it in the callbacks), and this bug > looks like a side effect. > > I will make the fix asap ! > > Thanks and regards, > Bogdan > > [email protected] wrote: > >> Hi Bogdan, >> >> I believe I found the problem. When sip_msg_cloner() within >> build_cell() fails due to out-of-mem, and dangling pointer to the cell >> is left in the global transaction pointer. Later on the post_cb() code >> attempts to clean this up, and "resurrects" the now-free memory, and >> in particular puts it on a wait timer. My guess is that later on this >> memory is allocated into a new transaction which eventually uses the >> same wait timer link, and the two threads fight it out. >> >> Adding a set_t(0) fixes the problem. While looking, I believe I found >> a memory leak in uac creation code that is also triggered by out of >> memory conditions. Please see attached patch. >> >> The "problem" also manifest itself as a core dump with >> insert_timer_unsafe. Once I disabled optimization, I saw multiple >> variations within this function. The most common was that ptr==tl >> after the search. The sequence of events for this is: new cell >> created, stored into global T, free'd, added to wait-list by cleanup >> code from global T, memory re-allocated into new cell, timer link >> zero'd, and then added again to wait list. At least that is my best guess. >> >> Regards, >> Kennard >> >> /(See attached file: opensips-tm-cell.patch)/ >> >> Inactive hide details for Bogdan-Andrei Iancu ---10/22/2010 07:02:36 >> AM---Hi Kennard, I suppose the bt is the same ? do you >> stiBogdan-Andrei Iancu ---10/22/2010 07:02:36 AM---Hi Kennard, I >> suppose the bt is the same ? do you still have the core file ? >> >> From: Bogdan-Andrei Iancu <[email protected]> >> To: OpenSIPS users mailling list <[email protected]> >> Date: 10/22/2010 07:02 AM >> Subject: Re: [OpenSIPS-Users] opensips tm timer core dump >> Sent by: [email protected] >> >> ------------------------------------------------------------------------ >> >> >> >> Hi Kennard, >> >> I suppose the bt is the same ? do you still have the core file ? >> >> Regards, >> Bogdan >> >> [email protected] wrote: >> >>> Hi Bodgen, >>> >>> I replicated the error. Unfortunately the entire insert_timer_unsafe >>> and been in-lined and little is available: >>> >>> Program terminated with signal 11, Segmentation fault. >>> #0 0x00007f8b8356c2c2 in insert_timer_unsafe (new_tl=0x7f8b7a54e310, >>> list_id=WT_TIMER_LIST, ext_timeout=<value optimized out>) at timer.c:731 >>> 731 timer.c: No such file or directory. >>> in timer.c >>> (gdb) print tl >>> $1 = <value optimized out> >>> (gdb) print *tl >>> Cannot access memory at address 0x0 >>> (gdb) print ptr >>> $2 = <value optimized out> >>> (gdb) print *ptr >>> Cannot access memory at address 0x0 >>> (gdb) print *new_tl >>> No symbol "new_tl" in current context. >>> (gdb) up >>> #1 set_1timer (new_tl=0x7f8b7a54e310, list_id=WT_TIMER_LIST, >>> ext_timeout=<value optimized out>) at timer.c:904 >>> 904 in timer.c >>> (gdb) print *new_tl >>> $3 = {next_tl = 0x0, prev_tl = 0x0, ld_tl = 0x0, time_out = 0, >>> timer_list = 0x0, deleted = 0} >>> (gdb) print list >>> $4 = <value optimized out> >>> (gdb) print timeout >>> $5 = 32 >>> (gdb) print new_tl >>> $6 = (struct timer_link *) 0x7f8b7a54e310 >>> >>> I'll keep the core for a while -- please let me know if there is >>> anything else I can try. >>> >>> Thanks, >>> Kennard >>> >>> Inactive hide details for Bogdan-Andrei Iancu ---10/08/2010 04:40:47 >>> AM---Hi Kennard, Ok, keep the core next time :)Bogdan-Andrei Iancu >>> ---10/08/2010 04:40:47 AM---Hi Kennard, Ok, keep the core next time :) >>> >>> From: Bogdan-Andrei Iancu <[email protected]> >>> To: OpenSIPS users mailling list <[email protected]> >>> Date: 10/08/2010 04:40 AM >>> Subject: Re: [OpenSIPS-Users] opensips tm timer core dump >>> Sent by: [email protected] >>> >>> ------------------------------------------------------------------------ >>> >>> >>> >>> Hi Kennard, >>> >>> Ok, keep the core next time :) >>> >>> Regards, >>> Bogdan >>> >>> [email protected] wrote: >>> >>>> Hi Bogden, >>>> >>>> Thanks for explaining the child processes involved -- I misunderstood >>>> what was happening. >>>> >>>> Unfortunately, I don't have the core anymore. My recollection is that >>>> I couldn't print anything useful due to compiler optimization. That >>>> said, this should re-create pretty easily, and I'll get more dumps >>>> next time it happens. >>>> >>>> Regards, >>>> Kennard >>>> >>>> Inactive hide details for Bogdan-Andrei Iancu ---10/05/2010 01:41:38 >>>> AM---Hi Kennard, The core was generated by process 22255:Bogdan-Andrei >>>> Iancu ---10/05/2010 01:41:38 AM---Hi Kennard, The core was generated >>>> by process 22255: >>>> >>>> From: Bogdan-Andrei Iancu <[email protected]> >>>> To: OpenSIPS users mailling list <[email protected]> >>>> Date: 10/05/2010 01:41 AM >>>> Subject: Re: [OpenSIPS-Users] opensips tm timer core dump >>>> Sent by: [email protected] >>>> >>>> >>>> >> ------------------------------------------------------------------------ >> >>>> >>>> Hi Kennard, >>>> >>>> The core was generated by process 22255: >>>> [22238]: INFO:core:handle_sigs: child process 22255 exited by a >>>> signal 11 >>>> >>>> and this process also reported mem problems: >>>> [22255]: ERROR:tm:new_t: out of mem >>>> >>>> Can you print the "tl" or "ptr" variables in frame 0? >>>> >>>> Regards, >>>> Bogdan >>>> >>>> [email protected] wrote: >>>> >>>>> Running against opensips HEAD, I got a segfault in the tm timer >>>>> >> code. >> >>>>> I believe this is triggered by running out of shared memory. >>>>> >>>>> >>>>> The stack trace: >>>>> >>>>> (gdb) where >>>>> #0 0x00007fe8f8d96212 in insert_timer_unsafe (new_tl=0x7fe8f66337b0, >>>>> list_id=WT_TIMER_LIST, ext_timeout=<value optimized out>) at >>>>> >>> timer.c:731 >>> >>>>> #1 set_1timer (new_tl=0x7fe8f66337b0, list_id=WT_TIMER_LIST, >>>>> ext_timeout=<value optimized out>) at timer.c:904 >>>>> #2 0x00007fe8f8d78ac8 in t_release_transaction >>>>> >> (trans=0x7fe8f6633730) >> >>>>> at t_funcs.c:122 >>>>> #3 0x00007fe8f8d808e5 in t_unref (p_msg=<value optimized out>) >>>>> at t_lookup.c:1152 >>>>> #4 0x0000000000483ae5 in exec_post_req_cb () >>>>> #5 0x000000000046c1e4 in receive_msg () >>>>> #6 0x00000000004bc77c in udp_rcv_loop () >>>>> #7 0x000000000042de9c in main () >>>>> >>>>> The offending code (I believe): >>>>> if (tl->time_out==ptr->time_out) { >>>>> tl->ld_tl = ptr->ld_tl >>>>> ptr->ld_tl = 0; >>>>> tl->ld_tl->ld_tl = tl; <-- SEG FAULT HERE (according to trace) >>>>> } else { >>>>> tl->ld_tl = tl; >>>>> } >>>>> >>>>> Unfortunately, due to optimization I cannot dump anything >>>>> >> useful, and >> >>>>> I'm not convinced the actual fault is on the line indicated. >>>>> >> Note that >> >>>>> the core dump is not one of the processes that reported out of >>>>> >> memory. >> >>>>> Maybe one of the other processes left the timer list in a corrupt >>>>> >>> state? >>> >>>>> The log file: >>>>> Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]: >>>>> ERROR:tm:sip_msg_cloner: no more share memory >>>>> Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]: >>>>> ERROR:tm:new_t: out of mem >>>>> Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]: >>>>> ERROR:tm:t_newtran: new_t failed >>>>> Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]: >>>>> WARNING:core:fm_malloc: Not enough free memory, will atempt >>>>> >>>> defragmenation >>>> >>>>> Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]: >>>>> ERROR:tm:sip_msg_cloner: no more share memory >>>>> Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]: >>>>> ERROR:tm:new_t: out of mem >>>>> Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]: >>>>> ERROR:tm:t_newtran: new_t failed >>>>> Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]: >>>>> INFO:core:handle_sigs: child process 22255 exited by a signal 11 >>>>> Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]: >>>>> INFO:core:handle_sigs: core was generated >>>>> Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]: >>>>> INFO:core:handle_sigs: terminating due to SIGCHLD >>>>> Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22256]: >>>>> INFO:core:sig_usr: signal 15 received >>>>> >>>>> >>>>> >>> ------------------------------------------------------------------------ >>> >>>>> _______________________________________________ >>>>> Users mailing list >>>>> [email protected] >>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >>>>> >>>>> >>>> -- >>>> Bogdan-Andrei Iancu >>>> OpenSIPS Bootcamp >>>> 15 - 19 November 2010, Edison, New Jersey, USA >>>> www.voice-system.ro >>>> >>>> >>>> _______________________________________________ >>>> Users mailing list >>>> [email protected] >>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >>>> >>>> >>>> >> ------------------------------------------------------------------------ >> >>>> _______________________________________________ >>>> Users mailing list >>>> [email protected] >>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >>>> >>>> >>> -- >>> Bogdan-Andrei Iancu >>> OpenSIPS Bootcamp >>> 15 - 19 November 2010, Edison, New Jersey, USA >>> www.voice-system.ro >>> >>> >>> _______________________________________________ >>> Users mailing list >>> [email protected] >>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >>> >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> Users mailing list >>> [email protected] >>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >>> >>> >> -- >> Bogdan-Andrei Iancu >> OpenSIPS Bootcamp >> 15 - 19 November 2010, Edison, New Jersey, USA >> www.voice-system.ro >> >> >> _______________________________________________ >> Users mailing list >> [email protected] >> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Users mailing list >> [email protected] >> http://lists.opensips.org/cgi-bin/mailman/listinfo/users >> >> > > > -- Bogdan-Andrei Iancu OpenSIPS Bootcamp 15 - 19 November 2010, Edison, New Jersey, USA www.voice-system.ro _______________________________________________ Users mailing list [email protected] http://lists.opensips.org/cgi-bin/mailman/listinfo/users
