Hi everyone,
this is just for the record: here is how I solved this problem.

To try to understand the problems I described in the message below, I
ran two simultaneous sipp processes to see whether the behavior
changed, and I also downgraded the system and the kernel to check that
the problem was not specific to the versions I was using (I had read
about something related to ksoftirqd in CentOS 7).

The behavior was similar to what I described in all tests: when the
number of calls reached about 300-400, ksoftirqd went to 100% on the
CPU dedicated to the NIC IRQs (maybe a coincidence) on both machines
(sender and receiver), and then sipp calls started to fail, with
timeouts and retransmissions.
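
A minimal sketch of how the per-CPU softirq counters could be compared
between two samples, assuming the usual Linux /proc/softirqs layout (a
header row of CPU names followed by one "NAME: count count ..." row per
softirq type). The function names and the sample numbers are
hypothetical and are not taken from the test machines.

```python
def parse_softirqs(text):
    """Parse /proc/softirqs-style text into {softirq_name: [count_per_cpu]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        name, _, rest = ln.partition(":")
        table[name.strip()] = [int(x) for x in rest.split()[:n_cpus]]
    return table


def busiest_cpu(before, after, name="NET_RX"):
    """Return (cpu_index, delta) for the CPU whose counter grew the most."""
    deltas = [b - a for a, b in zip(before[name], after[name])]
    cpu = max(range(len(deltas)), key=deltas.__getitem__)
    return cpu, deltas[cpu]


# Hypothetical samples standing in for two reads of /proc/softirqs.
SAMPLE_T0 = """\
                    CPU0       CPU1       CPU2       CPU3
          HI:          0          0          0          0
       TIMER:       1000       1100        900        950
      NET_TX:         10         12          9         11
      NET_RX:       5000       5200       4800       5100
"""
SAMPLE_T1 = """\
                    CPU0       CPU1       CPU2       CPU3
          HI:          0          0          0          0
       TIMER:       1500       1600       1400       1450
      NET_TX:         15         18         14         16
      NET_RX:       5100     905200       4900       5200
"""

if __name__ == "__main__":
    cpu, delta = busiest_cpu(parse_softirqs(SAMPLE_T0),
                             parse_softirqs(SAMPLE_T1))
    print(f"NET_RX grew most on CPU{cpu}: +{delta}")
```

On a real machine the two samples would come from reading
/proc/softirqs a few seconds apart while the calls ramp up; a single
CPU taking nearly all of the NET_RX growth would match the ksoftirqd
symptom described above.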

After that I tested sending RTP on only one call leg, commenting out
the pcap file playback on the other party. That worked; it only failed
when both sides sent RTP at the same time. So I suspected the problem
was related to using pcap files, and maybe to the particular pcap files
I was using.

Another issue I spotted when using the same pcap file on both sender
and receiver was that the sequence numbers and the SSRC in the RTP
datagrams were the same on both sides, because this method simply
replays the values present in the pcap file, and Wireshark then
discards one call leg because of that. I'm not sure whether the RFC
specifies this behavior, but I have heard that it does. To solve this
minor issue I was forced to use two different pcap files, one for the
sender and another one for the receiver.
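
To make the collision concrete, here is a minimal sketch that decodes
the 12-byte fixed RTP header defined in RFC 3550 (version, payload
type, sequence number, timestamp, SSRC) and checks whether two packets
carry the same SSRC and sequence number. The helper names are
hypothetical, and the packets are built by hand for illustration rather
than read from a capture.

```python
import struct

RTP_HEADER_LEN = 12  # fixed RTP header without CSRC entries or extensions


def build_rtp_header(seq, timestamp, ssrc, payload_type=8):
    """Build a fixed RTP header (version 2, no padding/extension/CSRC).

    Payload type 8 is PCMA in the static RTP/AVP assignments.
    """
    return struct.pack("!BBHII", 0x80, payload_type & 0x7F,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet):
    """Return the fixed-header fields of an RTP packet as a dict."""
    if len(packet) < RTP_HEADER_LEN:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:RTP_HEADER_LEN])
    return {
        "version": b0 >> 6,
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }


def same_stream_identity(pkt_a, pkt_b):
    """True when both packets carry identical SSRC and sequence number."""
    a, b = parse_rtp_header(pkt_a), parse_rtp_header(pkt_b)
    return (a["ssrc"], a["sequence"]) == (b["ssrc"], b["sequence"])


if __name__ == "__main__":
    payload = bytes(160)  # 20 ms of 8 kHz audio, silence placeholder
    leg_a = build_rtp_header(1, 160, 0x12345678) + payload
    leg_b = build_rtp_header(1, 160, 0x12345678) + payload  # same pcap replayed
    leg_c = build_rtp_header(1, 160, 0x0BADCAFE) + payload  # different SSRC
    print(same_stream_identity(leg_a, leg_b))
    print(same_stream_identity(leg_a, leg_c))
```

Two legs replayed from the same capture look like leg_a and leg_b here,
which is consistent with an analyzer treating them as one stream.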

I also tested the receiver in echo mode, and that worked, but because
echo mode simply mirrors the RTP back to the sender, I had the
duplicated SSRC problem there too.

Since I wasn't able to get my test environment working with pcap
files, I changed the method and tested with PCMA wav files using
rtp_stream. The CPU problem disappeared, and I noticed two minor
issues. I'm not sure whether they are in line with the RFC, because I
did not have time to read it. The first issue is that both sides
(sender and receiver) generate RTP datagrams with the same SSRC. The
second is that neither sender nor receiver uses the port announced in
the SDP as the source port for its RTP datagrams: both start with
source port 8192 and then increase it with each new call.

The first issue, as I mentioned above, causes Wireshark to discard one
call leg. To solve it I changed the code (rtpstream.cpp) on one side
(the receiver, for example) so that this side starts with a different
SSRC. Now Wireshark treats both RTP legs as belonging to the same
call. It is important to point out that I didn't capture all calls; I
only captured two or three example calls for debugging, and that is how
I noticed these two issues.
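
As an illustration only (this is not the change made in rtpstream.cpp,
which is C++ and is not shown here), a small Python sketch of the idea:
each side picks a random 32-bit SSRC, avoiding any value already in use
in the session, and stamps it into bytes 8-11 of the RTP header, where
RFC 3550 places the SSRC field. The function names are hypothetical.

```python
import secrets
import struct


def pick_ssrc(in_use=()):
    """Pick a random 32-bit SSRC that is not already in use."""
    taken = set(in_use)
    while True:
        candidate = secrets.randbits(32)
        if candidate not in taken:
            return candidate


def with_ssrc(packet, ssrc):
    """Return a copy of an RTP packet with its SSRC field replaced."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    return packet[:8] + struct.pack("!I", ssrc & 0xFFFFFFFF) + packet[12:]


if __name__ == "__main__":
    sender_ssrc = pick_ssrc()
    receiver_ssrc = pick_ssrc(in_use=[sender_ssrc])
    # Header bytes: V=2, PT=8 (PCMA), seq=1, timestamp=160, SSRC=0
    packet = struct.pack("!BBHII", 0x80, 8, 1, 160, 0) + bytes(160)
    restamped = with_ssrc(packet, receiver_ssrc)
    print(hex(sender_ssrc), hex(receiver_ssrc), len(restamped))
```

Choosing the value at random on each side, instead of starting both
sides from the same constant, is what keeps the two legs
distinguishable.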
 
For the second one, I didn't find the right place in the code to
change it, and because I didn't spot any real problems caused by this
issue, I left it as it was. It might be good future work to fix this.
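
For reference, the general technique for making the RTP source port
match the announced one is to bind the UDP socket to that port before
sending. The sketch below shows this in Python on the loopback
interface; it is not sipp code, the function names are hypothetical,
and it lets the OS pick free ports so it can run anywhere.

```python
import socket


def open_rtp_socket(local_ip, local_port):
    """Open a UDP socket bound to the address that would go into the SDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((local_ip, local_port))
    return sock


def demo():
    """Send one datagram and return (announced_port, observed_source_port)."""
    receiver = open_rtp_socket("127.0.0.1", 0)  # port 0: OS picks a free port
    receiver.settimeout(2.0)
    sender = open_rtp_socket("127.0.0.1", 0)
    # In a real call this is the port that would be written in the SDP m= line.
    announced_port = sender.getsockname()[1]
    try:
        # Minimal RTP-like datagram: V=2, PT=8, remaining header bytes zero.
        sender.sendto(b"\x80\x08" + bytes(10), receiver.getsockname())
        _, (_, source_port) = receiver.recvfrom(2048)
    finally:
        sender.close()
        receiver.close()
    return announced_port, source_port


if __name__ == "__main__":
    announced, observed = demo()
    print(announced, observed)
```

Because the sending socket is bound first, the receiver sees the
datagram arriving from the announced port rather than from an unrelated
one.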

In the tests I made using play_pcap_audio, I reached a maximum of
250-400 calls before starting to get errors (timeouts and dead calls).
As I mentioned, this behavior may be related to the pcap files I'm
using or to something in the code itself. Using the same machines with
rtp_stream, I got 2500 calls with some retransmissions but without
errors using a CSV file in sequential mode, and 2200 using the same
file in random mode. Beyond that I start getting errors, but this is
more than enough for my tests.

Regards,
PF

On Thu, 2016-03-17 at 18:09 +0000, Paulo Ferreira wrote:
> Hi everyone,
> I'm trying to build a testbed to test concurrent calls on several
> different iPBXs. The testbed consists of a UAC and a UAS that try to
> set up a maximum of 500 calls through an iPBX (the UAC sends INVITE
> and, after the provisional messages and the OK, pauses for 9 minutes
> while both sides exchange RTP with each other; after the pause, the
> UAC starts ending the open calls with BYE).
> 
> The problem is that I can't reach 500 concurrent calls even with the
> UAC and UAS directly connected, without an iPBX in the middle.
> 
> Everything goes fine when the calls start between UAC and UAS, but
> when the number of calls reaches around 400, a process called
> ksoftirqd (which runs when the machine is under heavy soft-interrupt
> load) starts climbing on one CPU, on both UAC and UAS, until it
> reaches 100% of that CPU. This causes the UAC and the UAS to start
> retransmitting, followed by timeouts and unexpected messages (I
> attach an error trace from the UAC).
> 
> After checking the interrupts, I noticed that the biggest requester
> is the NIC, so I instructed the kernel to distribute that IRQ's
> requests over more than one CPU to try to reduce the effect. When I
> start the test after this change, everything goes as expected and I
> see three ksoftirqd processes on the CPUs I defined, but at about 400
> calls one of the three ksoftirqd processes, at random, reaches 100%
> and the problems start again.
> 
> Machine load stays low because this behavior only occurs on the
> defined CPUs out of the 8 the machines have, and they have plenty of
> free memory.
> 
> Before I noticed the real problem I tried changing the timeout and
> retransmission timers, and other settings, without any success.
> 
> I have already tested this behavior on machines with different specs
> (the first two were weaker than the last ones) with CentOS Linux
> release 7.2.1511 and the 3.10.0-327.10.1.el7.x86_64 kernel.
> 
> I tested this in sipp versions 3.5.0 and 3.5.1.
> 
> Has anyone already faced this behavior?
> 
> Regards,
> Paulo
> _______________________________________________
> Sipp-users mailing list
> Sipp-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/sipp-users
-- 
-------------------------------------------
Paulo Ferreira
VoIP@RCTS - Área de Infraestruturas Aplicacionais
FCCN
http://www.fccn.pt/
Av. do Brasil, n.º 101
1700-066 Lisboa - Portugal
Telefone|Phone: +351 218440100; Fax: +351 218472167

 