Re: [Project Clearwater] Performance limit measurement
Hi Michael,

It's worth noting that Project Clearwater is designed to scale horizontally rather than vertically, so we would expect multiple less powerful Sprout nodes to out-perform a single powerful Sprout node. However, that doesn't mean that your Sprout node isn't capable of handling the load you're hitting it with.

We do expose latency measurements over SNMP – see http://clearwater.readthedocs.io/en/stable/Clearwater_SNMP_Statistics.html for more details. In particular, under Sprout statistics we have various latency statistics, including latency for SIP requests and latency for requests to Homestead. There are a couple of other statistics that might be useful for determining when exactly your requests are failing – if the number of initial registration failures and/or the number of authentication failures is non-zero, this would indicate that the bottleneck is actually at Homestead.

It does sound as though Sprout is reporting itself as overloaded even though it could handle more requests. As I mentioned previously, Sprout will tweak its overload controls to rectify this, but it won't be immediate, which might explain the failures. I know you've already tried tweaking the token controls, but it might be worth looking at them again. Over the course of the minute of your test, I think we expect to receive 60,000 REGISTERs (2 per subscriber), and they should be evenly distributed, so we're expecting 1,000 requests per second. Have you tried setting init_token_rate to 1,000? You'll want to make sure this change is picked up on both Sprout and Homestead – you can do this by editing /etc/clearwater/shared_config on a single node and running /usr/share/clearwater/clearwater-config-manager/scripts/upload_shared_config. After a few minutes the change will have propagated around the deployment.

Thanks,
Graeme

From: Clearwater [mailto:clearwater-boun...@lists.projectclearwater.org] On Behalf Of Michael Katsoulis
Sent: 19 September 2016 08:42
To: clearwater@lists.projectclearwater.org
Subject: Re: [Project Clearwater] Performance limit measurement
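[Editor's sketch] The shared_config change Graeme suggests above can be sketched roughly as follows. This is an illustrative shell sketch only: it operates on a local stand-in file, and it assumes shared_config uses simple key=value lines. The starting values in the stand-in are hypothetical; only the file path, the upload_shared_config script path, and the init_token_rate=1,000 target come from the thread.

```shell
# Illustrative sketch; shared_config is assumed to be simple key=value
# lines. We edit a local stand-in rather than the real
# /etc/clearwater/shared_config.
CONF=shared_config.demo
printf 'init_token_rate=250\nmax_tokens=1000\n' > "$CONF"  # hypothetical starting values

# Raise init_token_rate to match the expected steady-state load:
# 60,000 REGISTERs spread evenly over 60s = 1,000 requests/second.
sed -i 's/^init_token_rate=.*/init_token_rate=1000/' "$CONF"
cat "$CONF"

# On a real node you would edit /etc/clearwater/shared_config itself and
# then propagate the change around the deployment (per the thread) with:
#   /usr/share/clearwater/clearwater-config-manager/scripts/upload_shared_config
```

Per Graeme's note, the change needs to be picked up by both Sprout and Homestead, which the upload step handles after a few minutes.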
Re: [Project Clearwater] Performance limit measurement
Hi Graeme,

I created a simpler scenario compared to what the SIP stress testing uses. In each scenario two subscribers just try to register to the IMS and do not make any calls to each other. I ran this scenario for 15000 pairs of subscribers (30000 subscribers). The register requests are distributed over 1 minute. It seems that the Sprout node is the bottleneck. The return code of most of the failed messages is 503 (Service Unavailable), and of some of them 408 (Request Timeout). I have added resources to Sprout (4 CPUs and 8 GB memory), so I don't believe that resources are the issue.

Does Sprout somehow expose the latency measurements that lead to the throttling? We would like to take a look at them.

Here is the xml file.

Best Regards,
Michael Katsoulis

2016-09-16 21:25 GMT+03:00 Graeme Robertson (projectclearwater.org) <gra...@projectclearwater.org>:
Re: [Project Clearwater] Performance limit measurement
Hi Michael,

Can you tell me more about your scenario? It sounds like you're not using the clearwater-sip-stress package, or at least not in exactly the form we package up. If you're not using the clearwater-sip-stress package then please can you send details of your stress scenario?

Depending on how powerful your Sprout node is, I would expect 15000 calls per second to be towards the upper limit of its performance. However, if the CPU usage is not particularly high then that would suggest that Sprout's throttling controls might require further tuning. Do you know what return code the "unexpected messages" have? 503s indicate that there is overload somewhere. Sprout does adjust its throttling controls to match the load it's able to process, but that process is not immediate, and we recommend building stress up gradually rather than immediately firing 15000 calls per second into the system – for more information on that, see http://www.projectclearwater.org/clearwater-performance-and-our-load-monitor/.

One final thought I had was that the node you're running stress on might be overloaded. If the stress node is not responding to messages in a timely fashion then that will generate timeouts and unexpected messages.

Thanks,
Graeme

From: Clearwater [mailto:clearwater-boun...@lists.projectclearwater.org] On Behalf Of Michael Katsoulis
Sent: 16 September 2016 15:16
To: clearwater@lists.projectclearwater.org
Subject: Re: [Project Clearwater] Performance limit measurement
Re: [Project Clearwater] Performance limit measurement
Hi Graeme,

Thanks a lot for your response.

In our scenario we are using the Stress node to generate 15000 calls in 60 seconds. The number of unsuccessful calls varies from ~500 to ~5000, even in subsequent repetitions of the same scenario. According to Wireshark, the failures happen because Sprout does not send the correct responses in time, and so we get "timeouts" and "unexpected messages" on the Stress node. The Sprout node has sufficient CPU and memory resources.

What could be the reason for this instability in our deployment?

Thank you in advance,
Michael Katsoulis

2016-09-16 16:14 GMT+03:00 Graeme Robertson (projectclearwater.org) <gra...@projectclearwater.org>:
Re: [Project Clearwater] Performance limit measurement
Hi Michael,

How many successes and failures are you seeing? We primarily use the clearwater-sip-stress package to check we haven't introduced crashes under load, and to check we haven't significantly regressed the performance of Project Clearwater. Unfortunately clearwater-sip-stress is not reliable enough to generate completely accurate performance numbers for Project Clearwater (and we don't accurately measure Project Clearwater performance or provide numbers). We tend to see around 1% failures when running clearwater-sip-stress. If your failure numbers are fluctuating at around 1% then this is probably down to the test scripts not being completely reliable, and you won't have actually hit the deployment's limit until you start seeing more failures than this.

Thanks,
Graeme

From: Clearwater [mailto:clearwater-boun...@lists.projectclearwater.org] On Behalf Of Michael Katsoulis
Sent: 16 September 2016 10:17
To: Clearwater@lists.projectclearwater.org
Subject: [Project Clearwater] Performance limit measurement
[Project Clearwater] Performance limit measurement
Hi all,

We are running stress tests against our Clearwater deployment using the SIP Stress node. We have noticed that the results are not consistent, as the number of successful calls changes during repetitions of the same test scenario.

We have tried to increase the values of max_tokens, init_token_rate, min_token_rate and target_latency_us, but we did not observe any difference.

What is the proposed way to discover the deployment's limit on how many requests per second can be served?

Thanks in advance,
Michael Katsoulis

___
Clearwater mailing list
Clearwater@lists.projectclearwater.org
http://lists.projectclearwater.org/mailman/listinfo/clearwater_lists.projectclearwater.org
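[Editor's sketch] The four throttling knobs discussed in this thread (max_tokens, init_token_rate, min_token_rate, target_latency_us) are plain key=value entries in /etc/clearwater/shared_config. A hypothetical fragment, purely to show the shape – the values below are illustrative assumptions, not recommended settings:

```shell
# Hypothetical /etc/clearwater/shared_config fragment (illustrative values only).
target_latency_us=100000   # latency goal the overload controls steer towards
max_tokens=1000            # depth of the token bucket (burst allowance)
init_token_rate=250        # initial admitted request rate, adapted under load
min_token_rate=10.0        # floor below which the admitted rate will not drop
```

As described in Graeme's replies, changes here should be made on one node and propagated with the upload_shared_config script so Sprout and Homestead both pick them up.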