On 15/05/2026 08.47, xf han via sr-users wrote:
> 1.  Since `tcpdump` sees the packets but Kamailio doesn't, is this a classic
> case of OS-level UDP receive buffer (`rmem`) overflow

That is very likely the case, but you can confirm this by inspecting the RX queue length of the affected sockets during peak times, e.g. with `ss` or `netstat`. I believe there are also system-wide counters for packet drops due to RX queue overflows.

Note that these conditions can be very short-lived and might only be visible if you look at the queue length at exactly the right moment.
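
To illustrate the kind of check I mean, here is a minimal Python sketch that parses the contents of `/proc/net/snmp` and `/proc/net/udp` on Linux. The function names are made up for this example. I'm assuming the usual kernel layout (a `RcvbufErrors` counter in the `Udp:` rows, and `rx_queue` and `drops` columns per socket), so verify that against your own system before relying on it.

```python
def parse_udp_snmp(text):
    """Return the system-wide UDP counters from /proc/net/snmp content as a dict."""
    # The file holds two "Udp:" rows: one with counter names, one with values.
    rows = [line.split() for line in text.splitlines() if line.startswith("Udp:")]
    if len(rows) < 2:
        return {}
    header, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def parse_udp_sockets(text):
    """Return (port, rx_queue_bytes, drops) per socket from /proc/net/udp content."""
    sockets = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 13:
            continue
        port = int(fields[1].rsplit(":", 1)[1], 16)   # local port, hex
        rx_queue = int(fields[4].split(":")[1], 16)   # "tx_queue:rx_queue", hex
        drops = int(fields[12])                       # last column, decimal
        sockets.append((port, rx_queue, drops))
    return sockets
```

Reading both files once per second during peak time and comparing consecutive samples should reveal even short bursts, because the drop counters only ever increase even when the queue itself has already drained again.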

> possibly exacerbated by CPU context switching between Kamailio and RTPEngine?

Context switches are just another kind of CPU load, so they would show up in the regular CPU stats.

> 2.  With `children=40` for Kamailio and RTPEngine also running on the same
> 56-core machine, is this configuration leading to significant CPU contention or
> an imbalanced distribution of workload, especially at 3000 CC? Should
> Kamailio's `children` count be adjusted (e.g., increased closer to 56, or
> perhaps fewer to leave more for RTPEngine and OS network processing)?

CPU load distribution is certainly one aspect, but another aspect to keep in mind is that each worker process/thread can only do one thing at a time. That one thing might be crunching numbers on the CPU, but it can also be just waiting for something. If every worker ends up waiting for something, then the CPU would be idle, but no new request can be processed.

A processing time of 18 ms per request means each worker can handle about 55 requests per second at most, and that's assuming the load is evenly spread out without any peaks. How does that compare to the CPS you're seeing (keeping in mind that not every request is a new call)?
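
The arithmetic can be spelled out in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes every request occupies one worker for the full 18 ms and that all 40 workers share the same load. The 2500 requests per second figure is a hypothetical number for illustration, not something taken from this thread.

```python
def worker_capacity(service_time_s, workers):
    """Upper bound on requests/s if each request occupies one worker for service_time_s."""
    per_worker = 1.0 / service_time_s
    return per_worker, per_worker * workers


def workers_needed(request_rate, service_time_s):
    """Average number of busy workers at a given request rate (Little's law)."""
    return request_rate * service_time_s


per_worker, total = worker_capacity(0.018, 40)
print(f"{per_worker:.1f} req/s per worker, {total:.0f} req/s for 40 workers")

# Hypothetical load for illustration only: 2500 requests per second.
print(f"{workers_needed(2500, 0.018):.0f} workers busy on average")
```

With these assumptions, 40 workers top out at roughly 2200 requests per second, and any burst above that has to sit in the socket receive buffer until a worker becomes free.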

> 3.  Does this scenario typically imply that the host CPU is
> exhausted/interrupt-bound, or is there a specific Kamailio/OS tuning I am
> missing?

CPU exhaustion would be quite obvious in the system stats. IRQ issues can probably be discounted if the packets show up in tcpdump.

There are lots of other things that can block a process. I/O is a usual suspect. Swapping due to memory pressure can be deadly, and logging is another likely culprit. Depending on how logging is done, if the logging system can't keep up, then processes might end up having to wait to write their logs, blocking everything else in the meantime.

Communication with an external service (a DB, for example) is another possible cause.
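
One way to get a hint about whether workers are waiting rather than computing is to look at their entries under `/proc`. The Python sketch below parses the contents of `/proc/<pid>/status` on Linux; the function names are made up for this example, and how to interpret the numbers is my assumption rather than a hard rule. A state of `D` usually points at disk I/O or swapping, whereas a worker waiting on a DB or a log socket would more likely show up as `S` with a fast-growing voluntary context switch count.

```python
def parse_proc_status(text):
    """Extract state and context-switch counters from /proc/<pid>/status content."""
    wanted = {"Name", "State", "voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"}
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in wanted:
            info[key] = value.strip()
    # The two counters are plain decimal numbers.
    for key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
        if key in info:
            info[key] = int(info[key])
    return info


def looks_blocked(info):
    """True if the process is in uninterruptible sleep ('D'), usually waiting on I/O."""
    return info.get("State", "").startswith("D")
```

Sampling this for each Kamailio worker PID at short intervals during peak time would show whether many of them sit in a waiting state at the same moment.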

Cheers