On 15/05/2026 08.47, xf han via sr-users wrote:
> 1. Since `tcpdump` sees the packets but Kamailio doesn't, is this a classic
> case of OS-level UDP receive buffer (`rmem`) overflow
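That is very likely the case, but you can confirm this by inspecting the
RX queue length (the Recv-Q column) of the affected sockets during peak
times, e.g. with `ss -uln` or `netstat -uln`. There are also system-wide
counters for packet drops due to RX queue overflows, such as
`RcvbufErrors` in the `Udp:` line of `/proc/net/snmp` (shown as "receive
buffer errors" by `netstat -su`).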
Note that these conditions can be very short-lived and might only be
apparent if you look at the queue length at exactly the right moment.
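Because a one-off look with `ss` can easily miss a short burst, it may help to sample the queue repeatedly. Below is a minimal sketch that reads `/proc/net/udp` (IPv4 sockets only) and reports the RX queue size and the per-socket drop counter. The function names, the 0.1 s interval and port 5060 are assumptions for illustration, not anything taken from your setup.

```python
import time
from dataclasses import dataclass


@dataclass
class UdpSocketStat:
    port: int       # local port of the socket
    rx_queue: int   # bytes currently waiting in the receive queue
    drops: int      # datagrams dropped on this socket so far


def parse_proc_net_udp(text: str) -> list[UdpSocketStat]:
    """Parse the contents of /proc/net/udp into per-socket stats."""
    stats = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 13:
            continue  # not a complete socket line
        port = int(fields[1].rsplit(":", 1)[1], 16)   # local_address is HEXIP:HEXPORT
        rx_queue = int(fields[4].split(":")[1], 16)   # field is tx_queue:rx_queue in hex
        drops = int(fields[12])                       # last column, decimal
        stats.append(UdpSocketStat(port, rx_queue, drops))
    return stats


def watch(port: int = 5060, interval: float = 0.1, samples: int = 100) -> None:
    """Print RX queue and drop counter for one port, `samples` times."""
    for _ in range(samples):
        with open("/proc/net/udp") as f:
            for s in parse_proc_net_udp(f.read()):
                if s.port == port:
                    print(f"port={s.port} rx_queue={s.rx_queue} drops={s.drops}")
        time.sleep(interval)
```

Calling `watch(5060)` during a peak should show whether `rx_queue` approaches the socket's receive buffer size and whether `drops` is increasing. Even at this interval a very short spike can still be missed, so a rising drop counter is the more reliable signal.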
> possibly exacerbated by CPU context switching between Kamailio and RTPEngine?
Context switches are just another kind of CPU load, so they would show up
in the regular CPU stats.
> 2. With `children=40` for Kamailio and RTPEngine also running on the same
> 56-core machine, is this configuration leading to significant CPU contention or
> an imbalanced distribution of workload, especially at 3000 CC? Should
> Kamailio's `children` count be adjusted (e.g., increased closer to 56, or
> perhaps fewer to leave more for RTPEngine and OS network processing)?
CPU load distribution is certainly one aspect, but another aspect to
keep in mind is that each worker process/thread can only do one thing at
a time. That one thing might be crunching numbers on the CPU, but it can
also be just waiting for something. If every worker ends up waiting for
something, then the CPU would be idle, but no new request can be processed.
At 18 ms of processing time per request, each worker can handle at most
about 55 requests per second (1000 ms / 18 ms), and that's assuming the
load is evenly spread out without any peaks. How does that compare to the
CPS you're seeing (keeping in mind that not every request is a new call)?
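To put numbers on this, here is a back-of-the-envelope sketch. It assumes that all 40 children serve the affected socket and that 18 ms is a steady average, which real traffic won't quite match. The function names and the 1000 requests per second in the usage note are made up for illustration.

```python
def max_requests_per_second(workers: int, service_time_ms: float) -> float:
    """Upper bound on throughput if every worker is busy all the time."""
    return workers * 1000.0 / service_time_ms


def busy_workers(request_rate: float, service_time_ms: float) -> float:
    """Average number of workers occupied at a given request rate
    (Little's law: occupancy = arrival rate * time spent per request)."""
    return request_rate * service_time_ms / 1000.0


if __name__ == "__main__":
    print(max_requests_per_second(1, 18))    # one worker
    print(max_requests_per_second(40, 18))   # 40 workers, the ceiling
    print(busy_workers(1000, 18))            # workers busy at 1000 req/s
```

Under these assumptions the ceiling is roughly 2,200 requests per second in total. At 1000 requests per second, about 18 of the 40 workers would be busy on average, which leaves limited headroom for bursts, and packets queue up in the socket buffer whenever all workers are occupied.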
> 3. Does this scenario typically imply that the host CPU is
> exhausted/interrupt-bound, or is there a specific Kamailio/OS tuning I am
> missing?
CPU exhaustion would be quite obvious in the system stats. IRQ issues
can probably be discounted if the packets show up in tcpdump.
There are lots of other things that can block a process. I/O is a usual
suspect. Swapping due to memory pressure can be deadly, and logging is
another likely culprit. Depending on how logging is done, if the logging
system can't keep up, then processes might end up having to wait to
write their logs, blocking everything else in the meantime.
Communication with an external service (a database, for instance) is
another possible cause.
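One cheap way to get a first hint at what the workers are doing is to sample their scheduler state from `/proc/<pid>/stat`. The sketch below is only a rough aid. It assumes you collect the worker PIDs yourself (for instance from `pgrep kamailio`), and the function names are hypothetical.

```python
from collections import Counter


def parse_state(stat_text: str) -> str:
    """Return the one-letter process state from /proc/<pid>/stat contents."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so take whatever follows the last ')'.
    return stat_text.rsplit(")", 1)[1].split()[0]


def count_states(pids: list[int]) -> Counter:
    """Count how many of the given processes are in each state."""
    states = Counter()
    for pid in pids:
        try:
            with open(f"/proc/{pid}/stat") as f:
                states[parse_state(f.read())] += 1
        except OSError:
            continue  # process went away or is not readable
    return states
```

Workers in state `D` (uninterruptible sleep) during a peak would point towards disk I/O or swapping. State `S` is ambiguous, since a worker idly waiting for packets and a worker blocked on a database or log socket both show up that way, so telling those apart needs something like `strace` on an individual worker.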
Cheers