Hi Pekka and all,

We are having an issue with sofia-sip in a multi-threaded environment. I am not sure whether this is an error in sofia-sip or in our multi-threading model, but either way I would like to know your opinion and see whether my suggestion for solving it makes sense.

As background, we receive all SIP messages in a "listener" thread A, but the messages get processed by different "worker" threads (B, C, ...). Of course, we try to make sure that the root object of the NTA is properly acquired/released when we operate on the nta_agent_t.

The issue is the following:

[t]    We create an outgoing INVITE transaction and send it to the UAS.

[t+3s] We receive 100 Trying in thread A.

[t+4s] We receive 200 OK in thread A, and the callback set in orq->orq_callback is called. We start processing the 200 OK reply in a DIFFERENT thread B (not the one running the nta_agent).

[t+7s] We receive a retransmitted 200 OK in thread A. Thread B has not yet finished processing the first 200 OK, so orq->orq_completed is still not TRUE. Thus, outgoing_duplicate() is never called in outgoing_recv(), as this is not yet treated as a retransmission.

       The problem is that, because this is not treated as a retransmission, the previous response msg is destroyed with msg_destroy() before orq->orq_callback() is called again:

           /* Previous orq response is destroyed */
           if (orq->orq_response)
             msg_destroy(orq->orq_response);
           /* New orq response is set */
           orq->orq_response = msg;
           /* Call callback */
           orq->orq_callback(orq->orq_magic, orq, sip);

[t+8s] We finish processing the first 200 OK in thread B, and we want to generate the ACK. BUT the sip_t we received in the callback is NO LONGER valid, as it was generated from the first msg_t (which was destroyed just after the second 200 OK arrived). Thus, we end up with lots of invalid reads reported by valgrind, and potentially a segfault.

Of course, this would not happen in a single-threaded application, as the retransmitted 200 OK would always be received after the first 200 OK had been fully processed.

To avoid this, the idea is to make sure that at least one reference to the msg_t is held while we process the reply in thread B. This could be done if, inside our callback stored in orq->orq_callback, we call msg_ref_create(), and then call msg_destroy() ourselves when we no longer need the associated sip_t. The steps would be:

* We receive the SIP response in the listener thread A.
  Sofia-SIP calls our nta_response_f callback stored in orq->orq_callback.

* Our nta_response_f makes sure a new reference to the msg_t is obtained. We don't have a pointer to the msg_t, but we have the orq and can get the msg_t from it:

      /* get msg_t from the orq */
      msg_t *msg = nta_outgoing_getresponse (orq);
      /* new reference in the msg_t */
      msg_ref_create (msg);
      /* Now, we 'forward' the reply to one of the workers as before... */

* In the worker thread, when we have finished processing the response, we make sure our reference is released:

      /* get msg_t from the orq */
      msg_t *msg = nta_outgoing_getresponse (orq);
      /* destroy our reference */
      msg_destroy(msg);

Of course, this solution only avoids the invalid reads in valgrind. Retransmissions will still arrive at the nta_agent_t without being treated as retransmissions, because thread B has not yet 'completed' the outgoing request when thread A receives the new message. Is there any way of avoiding this?

Sorry for the long email, btw.

Cheers,
-Aleksander

_______________________________________________
Sofia-sip-devel mailing list
Sofia-sip-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sofia-sip-devel
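
To make the proposal above concrete, here is a minimal, self-contained sketch of the reference-holding idea. It does not use Sofia-SIP at all: mock_msg_t, mock_orq_t, work_item_t and every function below are hypothetical stand-ins invented for illustration, and the "threads" are replayed sequentially in the order of the timeline (first 200 OK, retransmitted 200 OK, then the worker finishing). One detail the sketch assumes, which differs slightly from the snippets above: the worker releases the pointer it was handed, rather than fetching it from the orq again, since by then the orq may already point to the newer response.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for msg_t: an id plus a reference count.
 * The counter is NOT atomic; real threads would need an atomic
 * counter or a lock around it. */
typedef struct mock_msg {
  int refs;
  int id;
} mock_msg_t;

int live_msgs = 0; /* messages allocated and not yet freed */

mock_msg_t *mock_msg_create(int id)
{
  mock_msg_t *m = malloc(sizeof *m);
  if (m == NULL)
    abort();
  m->refs = 1; /* the creator owns the first reference */
  m->id = id;
  live_msgs++;
  return m;
}

/* Take one more reference (the role msg_ref_create() plays in the proposal). */
mock_msg_t *mock_msg_ref(mock_msg_t *m)
{
  assert(m != NULL && m->refs > 0);
  m->refs++;
  return m;
}

/* Drop one reference; free on the last one (the role of msg_destroy()). */
void mock_msg_unref(mock_msg_t *m)
{
  if (m == NULL)
    return;
  assert(m->refs > 0);
  if (--m->refs == 0) {
    live_msgs--;
    free(m);
  }
}

/* Hypothetical stand-in for the outgoing transaction. */
typedef struct mock_orq {
  mock_msg_t *response;
} mock_orq_t;

/* What the listener hands to a worker: its own reference to the message. */
typedef struct work_item {
  mock_msg_t *msg;
} work_item_t;

/* Listener side: replace the stored response as in the quoted code,
 * then let the "callback" take its own reference for the worker. */
work_item_t mock_orq_recv(mock_orq_t *orq, mock_msg_t *msg)
{
  work_item_t item;
  if (orq->response)
    mock_msg_unref(orq->response); /* previous response is dropped */
  orq->response = msg;
  item.msg = mock_msg_ref(msg);    /* reference owned by the worker */
  return item;
}

/* Worker side: read the message, then release the worker's reference. */
int worker_finish(work_item_t *item)
{
  int id = item->msg->id; /* still valid: we hold a reference */
  mock_msg_unref(item->msg);
  item->msg = NULL;
  return id;
}

/* Replays the timeline; returns the id seen by the first worker,
 * or -1 if the bookkeeping does not balance. */
int replay_timeline(void)
{
  mock_orq_t orq = { NULL };
  work_item_t first = mock_orq_recv(&orq, mock_msg_create(1));
  work_item_t second = mock_orq_recv(&orq, mock_msg_create(2));
  int id;

  if (live_msgs != 2) /* first message must survive the retransmission */
    return -1;
  id = worker_finish(&first);
  worker_finish(&second);
  mock_msg_unref(orq.response); /* transaction torn down */
  orq.response = NULL;
  return live_msgs == 0 ? id : -1;
}
```

In this mock, the first message stays alive after the second one replaces it in the transaction, because the work item holds its own reference. It says nothing about how Sofia-SIP counts references internally. In particular, it would be worth checking whether nta_outgoing_getresponse() already hands back a new reference, in which case the extra msg_ref_create() in the callback would need a matching extra msg_destroy().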