abalashov created an issue (kamailio/kamailio#4603)

The documentation for `evapi_relay()` says: 

> The function is passing the task to evapi dispatcher process, therefore the 
> SIP worker process is not blocked

However, I am not sure this is actually true. The client sockets on the EVAPI
dispatcher are never set to nonblocking, and I have encountered instances in
production where an EVAPI consumer does not read fast enough and we get a total
SIP worker stall. While it is true that the TCP sending is done by the EVAPI
dispatcher process, a slow consumer eventually creates backpressure on the
`socketpair` pipe from the SIP worker process to the EVAPI dispatcher and, in
due time, appears to stall the SIP worker.

As far as I can tell, the main problem, as I mentioned, is that client sockets 
connected to the dispatcher aren't set `O_NONBLOCK` -- only the server 
listening socket is. Accordingly, `write()`s to client sockets in 
`evapi_dispatch_notify()` are, in fact, blocking. Is that right?

The secondary problem is that the notify `socketpair` between the worker and
the dispatcher process is also blocking, and it naturally has a finite send
buffer (roughly 200 kB by default on Linux). Once that buffer is full, if the
dispatcher is stalled waiting on a blocking `write()` to a client, then the
worker's `write()` to the dispatcher end of the pipe (i.e.
`evapi_notify_sockets[1]`) will block, too.

If my understanding is correct, then the claim made above, that "the SIP worker
process is not blocked", describes a happy path through the code where the
dispatcher is healthy and consumes the pipe from the SIP workers fast enough;
given a slow client, it does not hold. The fix, I suppose, would be to make
both the dispatcher pipe and the client socket writes nonblocking, though I am
not a competent judge of how to harness that into `libev` and so forth. Client
writes would need the usual `EAGAIN` handling and per-client output queues, or
some policy of dropping messages on overflow, and I'm not sure how best to
design that.

This issue is admittedly somewhat complex to reproduce, and I don't have a
backtrace or other artifacts handy. It does not seem confined to a historical
version of Kamailio; I analysed it against 6.1.

