Hi everyone, this is just for the record: how I solved this problem. To understand the problems I described in the message below, I ran two simultaneous sipp processes to see whether the behavior changed. I also downgraded the system and the kernel, just to rule out something specific to the versions I was using (I had read something about ksoftirqd issues in CentOS 7).
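ksoftirqd is the kernel thread that runs deferred soft-interrupt work, and network receive processing shows up as the NET_RX soft-interrupt. A minimal sketch, assuming a Linux host and the standard /proc/softirqs layout (a "CPU0 CPU1 ..." header row followed by one counter row per soft-interrupt type): take two snapshots and see which CPU's NET_RX counter grows fastest during a run. The helper names (parse_softirqs, busiest_cpu, sample) are hypothetical and are not part of sipp.

```python
from __future__ import annotations

import time
from pathlib import Path


def parse_softirqs(text: str) -> dict[str, list[int]]:
    """Parse /proc/softirqs-style text into {softirq_name: [count_per_cpu, ...]}."""
    lines = text.strip().splitlines()
    n_cpus = len(lines[0].split())  # header row looks like "CPU0 CPU1 CPU2 ..."
    table: dict[str, list[int]] = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        table[name.strip()] = [int(x) for x in rest.split()[:n_cpus]]
    return table


def busiest_cpu(before: str, after: str, kind: str = "NET_RX") -> tuple[int, int]:
    """Return (cpu_index, delta) for the CPU whose `kind` counter grew the most."""
    old = parse_softirqs(before)[kind]
    new = parse_softirqs(after)[kind]
    deltas = [n - o for n, o in zip(new, old)]
    cpu = max(range(len(deltas)), key=deltas.__getitem__)
    return cpu, deltas[cpu]


def sample(path: str = "/proc/softirqs", interval: float = 1.0) -> tuple[int, int]:
    """Read the counters twice, `interval` seconds apart (Linux only)."""
    first = Path(path).read_text()
    time.sleep(interval)
    return busiest_cpu(first, Path(path).read_text())
```

Calling `print(sample())` on the UAC and on the UAS while the call count climbs should show whether NET_RX work concentrates on a single CPU, which is what a single ksoftirqd pinned at 100% would suggest.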
The behavior was the same as the one I described in every test: when the number of calls reached about 300-400, ksoftirqd went to 100% on the CPU dedicated to the NIC IRQs (maybe a coincidence) on both machines (sender and receiver), and then sipp calls started to fail, with timeouts and retransmissions. After that I tested sending RTP on only one call leg, commenting out the pcap playback on the other party, and it worked; it only failed when both sides sent RTP at the same time. So I suspected the problem was related to pcap playback, and maybe to the specific pcap files I was using.

Another issue I spotted when using the same pcap file on both sender and receiver: the sequence numbers and SSRC in the RTP datagrams were identical on both legs, because this method simply mirrors the ones present in the pcap file, and Wireshark then discards one call leg because of that. I'm not sure whether the RFC requires distinct SSRCs, but I heard it does. To solve this minor issue, I was forced to use two different pcap files, one for the sender and another for the receiver. I also tested the receiver in echo mode, and that worked, but because echo mode simply mirrors RTP back to the sender, I had the duplicated SSRC problem again.

Since I wasn't able to get my test environment working with pcap files, I changed the method and tested with PCMA WAV files using rtp_stream. The CPU problem disappeared, and I noticed two minor issues; I'm not sure whether they are RFC-compliant, because I had no time to read it. The first is that both sides (sender and receiver) generate RTP datagrams with the same SSRC. The second is that neither the sender nor the receiver uses the port it announces in its SDP as the source port of its RTP datagrams: both start with source port 8192 and increment it for each new call. The first issue, as I mentioned above, causes Wireshark to discard one call leg, so to solve it I changed the code (rtpstream.cpp) on one side (the receiver, for example) so that this side starts with a different SSRC.
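For reference, RFC 3550 says the SSRC SHOULD be chosen randomly (so that no two sources in the same RTP session share one) and that the initial sequence number SHOULD be random. As an illustration only, and not sipp's code or the rtpstream.cpp change described above, the sketch below shows how the fixed 12-byte RTP header (V/P/X/CC byte, M/PT byte, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC, all big-endian) could be rewritten so that two legs replayed from the same capture no longer collide. The helpers random_ssrc and rewrite_rtp are hypothetical; reading and writing the packets of an actual pcap file would need a separate pcap library.

```python
import secrets
import struct

# Fixed RTP header: V/P/X/CC byte, M/PT byte, sequence (16 bit),
# timestamp (32 bit), SSRC (32 bit), in network byte order.
RTP_HEADER = struct.Struct("!BBHII")


def random_ssrc() -> int:
    """Pick a random 32-bit SSRC, in the spirit of RFC 3550's recommendation."""
    return secrets.randbits(32)


def rewrite_rtp(packet: bytes, new_ssrc: int, seq_offset: int = 0) -> bytes:
    """Return a copy of an RTP packet with a new SSRC and a shifted sequence number.

    Any CSRC list, extension and payload after the fixed header are copied unchanged.
    """
    if len(packet) < RTP_HEADER.size:
        raise ValueError("too short to be an RTP packet")
    b0, b1, seq, ts, _old_ssrc = RTP_HEADER.unpack_from(packet)
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    header = RTP_HEADER.pack(b0, b1, (seq + seq_offset) & 0xFFFF, ts, new_ssrc & 0xFFFFFFFF)
    return header + packet[RTP_HEADER.size:]
```

Applying rewrite_rtp with a different random_ssrc() value on each side would give Wireshark two distinct streams, which is the same effect the author got by giving the receiver a different starting SSRC.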
Now Wireshark correctly considers both RTP legs as belonging to the same call. It is important to point out that I didn't capture every call; I only captured two or three example calls for debugging, and that is how I noticed these two issues. For the second one, I didn't find the right place in the code to change it, and because I didn't spot any real problems caused by it, I left it as it was, but fixing it may be good future work.

In my tests, using play_pcap_audio I reached a maximum of 250-400 calls before I started getting errors (timeouts and dead calls). As I mentioned, this behavior may be related to the pcap files I'm using or to something in the code itself. On the same machines with rtp_stream, I reached 2500 calls (with some retransmissions but no errors) using a CSV file in sequential mode, and 2200 calls using the same file in random mode. Beyond that I start getting errors, but this is more than enough for my tests.

Regards,
PF

On Thu, 2016-03-17 at 18:09 +0000, Paulo Ferreira wrote:
> Hi everyone,
>
> I'm trying to create a testbed to test concurrent calls on several
> different iPBXs. The testbed consists of a UAC and a UAS that try to
> start a maximum of 500 calls through an iPBX (the UAC sends INVITE and,
> after the provisional responses and the 200 OK, pauses for 9 minutes
> while both sides exchange RTP with each other; after the pause, the UAC
> starts ending the open calls with BYE).
>
> The problem is that I can't reach 500 concurrent calls even with the
> UAC and UAS directly connected, without an iPBX in the middle.
>
> Everything goes OK when starting the calls between UAC and UAS, but
> when the number of calls reaches around 400, a process called
> ksoftirqd (which runs when the machine is under heavy soft-interrupt
> load) starts increasing, on both UAC and UAS, on one CPU until it
> reaches 100% of that CPU. This causes the UAC and the UAS to start
> retransmissions, then come the timeouts and unexpected messages (I
> attached an error trace from the UAC).
>
> After I checked the interrupts, I noticed the biggest requester is the
> NIC, so I instructed the kernel to distribute its IRQ requests across
> more than one CPU, trying to reduce the effect. When I start the test
> after this change, everything goes as expected and I see three
> ksoftirqd processes on the CPUs I defined, but when reaching about 400
> calls, one random ksoftirqd of the three reaches 100% and the problems
> start again.
>
> Machine load stays low because this behavior only occurs on the
> defined CPUs, out of the 8 the machines have, and they have plenty of
> free memory.
>
> Before I noticed the real problem, I tried changing the timeout and
> retransmission timers, and other settings, without any success.
>
> I have already tested this behavior on machines with different specs
> (the first two were weaker than the last ones) running CentOS Linux
> release 7.2.1511 with the 3.10.0-327.10.1.el7.x86_64 kernel.
>
> I tested this with sipp versions 3.5.0 and 3.5.1.
>
> Has anyone already faced such behavior?
>
> Regards,
> Paulo
> _______________________________________________
> Sipp-users mailing list
> Sipp-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/sipp-users

--
Paulo Ferreira
VoIP@RCTS - Application Infrastructure Area
FCCN
http://www.fccn.pt/
Av. do Brasil, n.º 101
1700-066 Lisboa - Portugal
Phone: +351 218440100; Fax: +351 218472167

Disclaimer: This message is intended exclusively for its addressee. It may contain CONFIDENTIAL information protected by law. If this message has been received in error, please notify us via e-mail or by telephone +351 218440100 and delete it immediately.