Re: [SR-Users] tm.so --> segfault at 3135352e36 ip 00007f761bb57ed1 sp 00007fff9db8b1c0 error 4 in tm.so

Richard Fuchs Mon, 25 Feb 2019 10:07:05 -0800

On 25/02/2019 12.34, Daniel-Constantin Mierla wrote:

Hello,
that's strange, but a while ago someone else reported an issue with the same backtrace.
So the crash happens at the last line in the next snippet from the reply_received() function in the tm module:
    uac=&t->uac[branch];
    LM_DBG("org. status uas=%d, uac[%d]=%d local=%d is_invite=%d)\n",
        t->uas.status, branch, uac->last_received,
        is_local(t), is_invite(t));
    last_uac_status=uac->last_received;
The backtrace and info locals say that uac is null (0x0). According to my knowledge, the address of a field in a structure cannot be null and uac is set to &t->uac[branch]. Moreover, uac->last_received is printed in the LM_DBG() above the line of crash; if uac was 0x0, the crash should have happened there.

t->uac is a pointer to an array, not a static array contained in the struct. So, if t->uac was null, then &t->uac[branch] would also yield null if branch was zero. (For a non-zero branch, it would yield a pointer to somewhere just past null. &t->uac[branch] is the same as t->uac + branch.)
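
To illustrate the pointer arithmetic above, here is a minimal sketch. The two structs are hypothetical, heavily simplified stand-ins and not the real tm definitions. The null case is computed on integers, because doing arithmetic on a null pointer directly is not something portable C allows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the tm structures (assumption, not tm's code). */
struct ua_client {
    int last_received;
};

struct cell {
    struct ua_client *uac; /* pointer to an array allocated elsewhere */
};

/* Address that &t->uac[branch] names, computed as an integer:
 * base pointer plus branch times the element size. */
uintptr_t uac_addr(const struct cell *t, size_t branch)
{
    return (uintptr_t)t->uac + branch * sizeof(struct ua_client);
}

/* Returns 1 if &t->uac[branch] and t->uac + branch are the same pointer.
 * Only call this with a valid, non-null array. */
int same_element(struct cell *t, size_t branch)
{
    return &t->uac[branch] == t->uac + branch;
}
```

With t->uac == NULL, uac_addr(t, 0) is 0 on common platforms, and uac_addr(t, 2) is 2 * sizeof(struct ua_client), i.e. an address just past null.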

As for LM_DBG, I'm not too familiar with the logging macros, but if they're defined in such a way to check the log level first and then skip calling the actual logging function if the log level is too low, then the LM_DBG arguments would never be evaluated and so no null dereference would occur there.
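
A sketch of what such a level-gated macro could look like. MY_DBG, log_level and read_status() are made up for illustration; this is not Kamailio's actual LM_DBG definition, only the general pattern described above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical log levels and macro (assumption, not Kamailio's LM_DBG). */
enum { LOG_LEVEL_ERR = 0, LOG_LEVEL_DBG = 3 };

int log_level = LOG_LEVEL_ERR;
int evaluations = 0;

/* The arguments sit inside the if-body, so they are only evaluated
 * when the level check passes. */
#define MY_DBG(...) \
    do { \
        if (log_level >= LOG_LEVEL_DBG) \
            fprintf(stderr, __VA_ARGS__); \
    } while (0)

/* Counts its own calls so argument evaluation can be observed. */
int read_status(void)
{
    evaluations++;
    return 200;
}

/* Returns how many times the macro argument was evaluated at this level. */
int evaluations_at(int level)
{
    log_level = level;
    evaluations = 0;
    MY_DBG("status=%d\n", read_status());
    return evaluations;
}
```

If the real macro follows this pattern, then with debug logging disabled an expression like uac->last_received in the argument list is never executed, and the first real dereference is the assignment on the following line.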

I was debugging a similar core dump just the other day, although in a different location. That one was in t_should_relay_response(), line 1282, and also had Trans->uac == null. The strange part about this one was that according to gdb, Trans->uac was valid:

#0 0x00007f3f11d5b5e8 in t_should_relay_response (Trans=Trans@entry=0x7f3e14a551f8,
    new_code=new_code@entry=200, branch=branch@entry=0,
    should_store=should_store@entry=0x7fffb0353408,
    should_relay=should_relay@entry=0x7fffb0353404,
    cancel_data=cancel_data@entry=0x7fffb0353670, reply=0x7f3f160aa6e8)
    at t_reply.c:1282

1282 in t_reply.c
(gdb) p Trans->uac[branch].last_received
$11 = 0

even though the asm instruction definitely was a null dereference into ->uac:


   0x00007f3f11d5b5de <+718>: add 0x170(%rbx),%r8
=> 0x00007f3f11d5b5e8 <+728>: mov 0x190(%r8),%eax
(gdb) p $r8
$2 = 0

%rbx had Trans and so %r8 had Trans->uac. At this point, %r8 == Trans->uac == null, even though:


(gdb) p (long int) Trans->uac
$18 = 139904611079176

Investigating further, we found that Trans resided in shared memory and so we (tentatively) concluded that this looks to be a race condition with another process overwriting the Trans shm. First Trans->uac was null and got assigned to %r8, then another process changed it to something valid in shm, then the segfault happened through %r8. We didn't have a chance to investigate further and I can't say for sure if these two crashes are related.
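
The suspected sequence can be sketched in a single process. Everything here is hypothetical: the structs are simplified stand-ins, the writer callback stands in for "another process", and returning -1 stands in for the segfault. It only shows why the value inspected afterwards can be valid while the copy taken earlier is still null.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the tm structures (assumption, not tm's code). */
struct ua_client { int last_received; };
struct cell { struct ua_client *uac; };

/* Stands in for another process writing to the shared cell. */
typedef void (*writer_fn)(struct cell *);

struct ua_client late_uac = { 7 };

/* The "other process": publishes a valid uac array into the cell. */
void publish_uac(struct cell *t)
{
    t->uac = &late_uac;
}

/* Loads t->uac once (like the load into %r8), lets the writer run,
 * then uses the earlier copy. Returns -1 where the real code would
 * have dereferenced null. */
int read_last_received(struct cell *t, writer_fn other)
{
    struct ua_client *snapshot = t->uac; /* copy taken before the write */
    if (other)
        other(t);                        /* update lands after the load */
    if (snapshot == NULL)
        return -1;                       /* real code: segfault here */
    return snapshot->last_received;
}
```

Starting from a cell with uac == NULL, read_last_received(&t, publish_uac) returns -1 although t.uac is non-null right afterwards, which matches what gdb showed in the core dump.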


Cheers


_______________________________________________
Kamailio (SER) - Users Mailing List
[email protected]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
