Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
Great question. Some use cases, such as guest agents, would like to see something around ~20 ms if the agent needs to respond to requests from a control surface/panel while a user clicks around. I spoke with a social media company that was also interested in low latency, simply because they have a large volume of messages to slog through in a timely manner or they will get behind (long-polling or websocket support was something they would like to see). Other use cases should be fine with, say, 100 ms. I want to say Heat’s needs probably fall into that latter category, but I’m only speculating.

Some other feedback we got a while back was that people would like a knob to tweak queue attributes, e.g., the tradeoff between durability and performance. That led to work on queue “flavors”, which Flavio has been working on this past cycle, so I’ll let him chime in on that.

From: Joe Gordon <joe.gord...@gmail.com>
Reply-To: OpenStack Dev <openstack-dev@lists.openstack.org>
Date: Wednesday, September 17, 2014 at 2:32 PM
To: OpenStack Dev <openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)

> Can you further quantify what you would consider too slow? Is 100 ms too slow?

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On Tue, Sep 16, 2014 at 8:02 AM, Kurt Griffiths <kurt.griffi...@rackspace.com> wrote:

> Right, graphing those sorts of variables has always been part of our
> test plan. What I’ve done so far was just some pilot tests, and I realize
> now that I wasn’t very clear on that point. I wanted to get a rough idea of
> where the Redis driver sat in case there were any obvious bug fixes that
> needed to be taken care of before performing more extensive testing. As it
> turns out, I did find one bug that has since been fixed.
>
> Regarding latency, saying that it “is not important” is an exaggeration;
> it is definitely important, just not the *only* thing that is important.
> I have spoken with a lot of prospective Zaqar users since the inception of
> the project, and one of the common threads was that latency needed to be
> reasonable. For the use cases where they see Zaqar delivering a lot of
> value, requests don't need to be as fast as, say, ZMQ, but they do need
> something that isn’t horribly *slow*, either. They also want HTTP,
> multi-tenant, auth, durability, etc. The goal is to find a reasonable
> amount of latency given our constraints and also, obviously, be able to
> deliver all that at scale.

Can you further quantify what you would consider too slow? Is 100 ms too slow?

> In any case, I’ve continued working through the test plan and will be
> publishing further test results shortly.
>
> > graph latency versus number of concurrent active tenants
>
> By tenants do you mean in the sense of OpenStack Tenants/Project-IDs or
> in the sense of “clients/workers”? For the latter case, the pilot tests
> I’ve done so far used multiple clients (though not graphed), but in the
> former case only one “project” was used.
multiple Tenant/Project-IDs

> From: Joe Gordon
> Reply-To: OpenStack Dev
> Date: Friday, September 12, 2014 at 1:45 PM
> To: OpenStack Dev
> Subject: Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
>
> If Zaqar is like Amazon SQS, then the latency for a single message and
> the throughput for a single tenant is not important. I wouldn't expect
> anyone who has latency-sensitive workloads or needs massive throughput to
> use Zaqar, as these people wouldn't use SQS either. The consistency of the
> latency (it shouldn't change under load) and Zaqar's ability to scale
> horizontally matter much more. What would be great to see is some other
> things benchmarked instead:
>
> * graph latency versus number of concurrent active tenants
> * graph latency versus message size
> * how throughput scales as you scale up the number of assorted Zaqar
>   components; if one of the benefits of Zaqar is its horizontal
>   scalability, let's see it
> * how does this change with message batching?
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
Right, graphing those sorts of variables has always been part of our test plan. What I’ve done so far was just some pilot tests, and I realize now that I wasn’t very clear on that point. I wanted to get a rough idea of where the Redis driver sat in case there were any obvious bug fixes that needed to be taken care of before performing more extensive testing. As it turns out, I did find one bug that has since been fixed.

Regarding latency, saying that it “is not important” is an exaggeration; it is definitely important, just not the only thing that is important. I have spoken with a lot of prospective Zaqar users since the inception of the project, and one of the common threads was that latency needed to be reasonable. For the use cases where they see Zaqar delivering a lot of value, requests don't need to be as fast as, say, ZMQ, but they do need something that isn’t horribly slow, either. They also want HTTP, multi-tenant, auth, durability, etc. The goal is to find a reasonable amount of latency given our constraints and also, obviously, be able to deliver all that at scale.

In any case, I’ve continued working through the test plan and will be publishing further test results shortly.

> graph latency versus number of concurrent active tenants

By tenants do you mean in the sense of OpenStack Tenants/Project-IDs or in the sense of “clients/workers”? For the latter case, the pilot tests I’ve done so far used multiple clients (though not graphed), but in the former case only one “project” was used.

From: Joe Gordon <joe.gord...@gmail.com>
Reply-To: OpenStack Dev <openstack-dev@lists.openstack.org>
Date: Friday, September 12, 2014 at 1:45 PM
To: OpenStack Dev <openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)

If Zaqar is like Amazon SQS, then the latency for a single message and the throughput for a single tenant is not important.
I wouldn't expect anyone who has latency-sensitive workloads or needs massive throughput to use Zaqar, as these people wouldn't use SQS either. The consistency of the latency (it shouldn't change under load) and Zaqar's ability to scale horizontally matter much more. What would be great to see is some other things benchmarked instead:

* graph latency versus number of concurrent active tenants
* graph latency versus message size
* how throughput scales as you scale up the number of assorted Zaqar components; if one of the benefits of Zaqar is its horizontal scalability, let's see it
* how does this change with message batching?
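The second item on that wishlist (latency versus message size) is straightforward to sketch. The snippet below is illustrative only and not part of zaqar-bench; `fake_post` is a stand-in for a real HTTP POST to the service under test, so the absolute timings are meaningless until a real client is swapped in.

```python
import json
import statistics
import time

def measure_latency_ms(post_fn, size_bytes, samples=50):
    """Time repeated posts of a payload of the given size; return the mean in ms."""
    payload = {"ttl": 300, "body": "x" * size_bytes}
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        post_fn(payload)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)

def fake_post(message):
    # Stand-in for the real network call: just serialize the message.
    json.dumps([message]).encode("utf-8")

for size in (1024, 16384, 262144):
    print("%6d bytes -> %.4f ms" % (size, measure_latency_ms(fake_post, size)))
```

Repeating the sweep at several concurrency levels would cover the first wishlist item as well.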
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On Tue, Sep 9, 2014 at 12:19 PM, Kurt Griffiths <kurt.griffi...@rackspace.com> wrote:

> Hi folks,
>
> In this second round of performance testing, I benchmarked the new Redis
> driver. I used the same setup and tests as in Round 1 to make it easier to
> compare the two drivers. I did not test Redis in master-slave mode, but
> that likely would not make a significant difference in the results since
> Redis replication is asynchronous[1].
>
> As always, the usual benchmarking disclaimers apply (i.e., take these
> numbers with a grain of salt; they are only intended to provide a ballpark
> reference; you should perform your own tests, simulating your specific
> scenarios and using your own hardware; etc.).
>
> ## Setup ##
>
> Rather than VMs, I provisioned some Rackspace OnMetal[3] servers to
> mitigate noisy-neighbor effects when running the performance tests:
>
> * 1x Load Generator
>   * Hardware
>     * 1x Intel Xeon E5-2680 v2 2.8 GHz
>     * 32 GB RAM
>     * 10 Gbps NIC
>     * 32 GB SATADOM
>   * Software
>     * Debian Wheezy
>     * Python 2.7.3
>     * zaqar-bench
> * 1x Web Head
>   * Hardware
>     * 1x Intel Xeon E5-2680 v2 2.8 GHz
>     * 32 GB RAM
>     * 10 Gbps NIC
>     * 32 GB SATADOM
>   * Software
>     * Debian Wheezy
>     * Python 2.7.3
>     * zaqar server
>       * storage=mongodb
>       * partitions=4
>       * MongoDB URI configured with w=majority
>     * uWSGI + gevent
>       * config: http://paste.openstack.org/show/100592/
>       * app.py: http://paste.openstack.org/show/100593/
> * 3x MongoDB Nodes
>   * Hardware
>     * 2x Intel Xeon E5-2680 v2 2.8 GHz
>     * 128 GB RAM
>     * 10 Gbps NIC
>     * 2x LSI Nytro WarpDrive BLP4-1600[2]
>   * Software
>     * Debian Wheezy
>     * mongod 2.6.4
>       * Default config, except setting replSet and enabling periodic
>         logging of CPU and I/O
>       * Journaling enabled
>       * Profiling on message DBs enabled for requests over 10 ms
> * 1x Redis Node
>   * Hardware
>     * 2x Intel Xeon E5-2680 v2 2.8 GHz
>     * 128 GB RAM
>     * 10 Gbps NIC
>     * 2x LSI Nytro WarpDrive BLP4-1600[2]
>   * Software
>     * Debian Wheezy
>     * Redis 2.4.14
>     * Default config (snapshotting and AOF enabled)
>     * One process
>
> As in Round 1, Keystone auth is disabled and requests go over HTTP, not
> HTTPS. The latency introduced by enabling these is outside the control of
> Zaqar, but should be quite minimal (speaking anecdotally, I would expect
> an additional 1-3 ms for cached tokens, assuming an optimized TLS
> termination setup).
>
> For generating the load, I again used the zaqar-bench tool. I would like
> to see the team complete a large-scale Tsung test as well (including a
> full HA deployment with Keystone and HTTPS enabled), but decided not to
> wait for that before publishing the results for the Redis driver using
> zaqar-bench.
>
> CPU usage on the Redis node peaked at around 75% for the one process. To
> better utilize the hardware, a production deployment would need to run
> multiple Redis processes and use Zaqar's backend pooling feature to
> distribute queues across the various instances.
>
> Several different messaging patterns were tested, taking inspiration
> from: https://wiki.openstack.org/wiki/Use_Cases_(Zaqar)
>
> Each test was executed three times and the best time recorded.
>
> A ~1K sample message (1398 bytes) was used for all tests.
>
> ## Results ##
>
> ### Event Broadcasting (Read-Heavy) ###
>
> OK, so let's say you have a somewhat low-volume source, but tons of event
> observers. In this case, the observers easily outpace the producer, making
> this a read-heavy workload.
>
> Options
> * 1 producer process with 5 gevent workers
>   * 1 message posted per request
> * 2 observer processes with 25 gevent workers each
>   * 5 messages listed per request by the observers
> * Load distributed across 4[6] queues
> * 10-second duration

10 seconds is way too short.

> Results
> * Redis
>   * Producer: 1.7 ms/req, 585 req/sec
>   * Observer: 1.5 ms/req, 1254 req/sec
> * Mongo
>   * Producer: 2.2 ms/req, 454 req/sec
>   * Observer: 1.5 ms/req, 1224 req/sec

If Zaqar is like Amazon SQS, then the latency for a single message and the throughput for a single tenant is not important. I wouldn't expect anyone who has latency-sensitive workloads or needs massive throughput to use Zaqar, as these people wouldn't use SQS either. The consistency of the latency (it shouldn't change under load) and Zaqar's ability to scale horizontally matter much more. What would be great to see is some other things benchmarked instead:

* graph latency versus number of concurrent active tenants
* graph latency versus message size
* how throughput scales as you scale up the number of assorted Zaqar components; if one of the benefits of Zaqar is its horizontal scalability, let's see it
* how does this change with message batching?
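Kurt mentions above that a production deployment would run multiple Redis processes and rely on Zaqar's backend pooling to spread queues across them. The sketch below only illustrates the general idea of deterministically mapping a queue to one of several pools; the pool URIs and the hash-based scheme are assumptions made for the example, not Zaqar's actual catalog logic.

```python
import hashlib

# Hypothetical pool URIs; a real deployment would list its own
# Redis instances here.
POOLS = [
    "redis://192.0.2.10:6379",
    "redis://192.0.2.10:6380",
    "redis://192.0.2.10:6381",
]

def pool_for_queue(project_id, queue_name):
    """Deterministically map a (project, queue) pair to one backend pool."""
    key = ("%s/%s" % (project_id, queue_name)).encode("utf-8")
    digest = hashlib.md5(key).hexdigest()
    return POOLS[int(digest, 16) % len(POOLS)]

print(pool_for_queue("project-a", "events"))
```

Because the mapping is deterministic, every web head routes requests for a given queue to the same Redis instance without any shared routing state.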
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On 09/12/2014 01:36 AM, Boris Pavlovic wrote:
> Kurt,
>
> > Speaking generally, I’d like to see the project bake this in over time as
> > part of the CI process. It’s definitely useful information not just for
> > the developers but also for operators in terms of capacity planning. We’ve
> > talked as a team about doing this with Rally (and in fact, some work has
> > been started there), but it may be useful to also run a large-scale test
> > on a regular basis (at least per milestone).
>
> I believe we will be able to generate distributed load of at least
> 20k rps in the K cycle. We've done a lot of work during J in this
> direction, but there is still a lot left to do.
>
> So you'll be able to use the same tool for gates, local usage and
> large-scale tests. Let's talk about it :)

Would it be possible to get an update from you at the summit (or on the mailing list)? I'm interested to know where you guys are with this, what is missing and, most importantly, how we can help.

Thanks Boris,
Flavio

--
@flaper87
Flavio Percoco
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
Kurt,

> Speaking generally, I’d like to see the project bake this in over time as
> part of the CI process. It’s definitely useful information not just for
> the developers but also for operators in terms of capacity planning. We’ve
> talked as a team about doing this with Rally (and in fact, some work has
> been started there), but it may be useful to also run a large-scale test
> on a regular basis (at least per milestone).

I believe we will be able to generate distributed load of at least 20k rps in the K cycle. We've done a lot of work during J in this direction, but there is still a lot left to do.

So you'll be able to use the same tool for gates, local usage and large-scale tests.

Best regards,
Boris Pavlovic

On Fri, Sep 12, 2014 at 3:17 AM, Kurt Griffiths <kurt.griffi...@rackspace.com> wrote:

> On 9/11/14, 2:11 PM, "Devananda van der Veen" wrote:
>
> > OK - those resource usages sound better. At least you generated enough
> > load to saturate the uWSGI process CPU, which is a good point at which
> > to look at the performance of the system.
> >
> > At that peak, what was the:
> > - average msgs/sec
> > - min/max/avg/stdev time to [post|get|delete] a message
>
> To be honest, it was a quick test and I didn’t note the exact metrics
> other than eyeballing them to see that they were similar to the results
> that I published for the scenarios that used the same load options (e.g.,
> I just re-ran some of the same test scenarios).
>
> Some of the metrics you mention aren’t currently reported by zaqar-bench,
> but could be added easily enough. In any case, I think zaqar-bench is
> going to end up being mostly useful for tracking relative performance
> gains or losses on a patch-by-patch basis, and also as an easy way to
> smoke-test both python-marconiclient and the service. For large-scale
> testing and detailed metrics, other tools (e.g., Tsung, JMeter) are
> better for the job, so I’ve been considering using them in future rounds.
>
> > Is that 2,181 msg/sec total, or per-producer?
>
> That metric was a combined average rate for all producers.
>
> > I'd really like to see the total throughput and latency graphed as the
> > # of clients increases. Or if graphing isn't your thing, even just post
> > a .csv of the raw numbers and I will be happy to graph it.
> >
> > It would also be great to see how that scales as you add more Redis
> > instances until all the available CPU cores on your Redis host are in
> > use.
>
> Yep, I’ve got a long list of things like this that I’d like to see in
> future rounds of performance testing (and I welcome anyone in the
> community with an interest to join in), but I have to balance that effort
> with a lot of other things that are on my plate right now.
>
> Speaking generally, I’d like to see the project bake this in over time as
> part of the CI process. It’s definitely useful information not just for
> the developers but also for operators in terms of capacity planning. We’ve
> talked as a team about doing this with Rally (and in fact, some work has
> been started there), but it may be useful to also run a large-scale test
> on a regular basis (at least per milestone). Regardless, I think it would
> be great for the Zaqar team to connect with other projects (at the
> summit?) who are working on perf testing to swap ideas, collaborate on
> code/tools, etc.
>
> --KG
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On 9/11/14, 2:11 PM, "Devananda van der Veen" wrote:

> OK - those resource usages sound better. At least you generated enough
> load to saturate the uWSGI process CPU, which is a good point at which to
> look at the performance of the system.
>
> At that peak, what was the:
> - average msgs/sec
> - min/max/avg/stdev time to [post|get|delete] a message

To be honest, it was a quick test and I didn’t note the exact metrics other than eyeballing them to see that they were similar to the results that I published for the scenarios that used the same load options (e.g., I just re-ran some of the same test scenarios).

Some of the metrics you mention aren’t currently reported by zaqar-bench, but could be added easily enough. In any case, I think zaqar-bench is going to end up being mostly useful for tracking relative performance gains or losses on a patch-by-patch basis, and also as an easy way to smoke-test both python-marconiclient and the service. For large-scale testing and detailed metrics, other tools (e.g., Tsung, JMeter) are better for the job, so I’ve been considering using them in future rounds.

> Is that 2,181 msg/sec total, or per-producer?

That metric was a combined average rate for all producers.

> I'd really like to see the total throughput and latency graphed as the #
> of clients increases. Or if graphing isn't your thing, even just post
> a .csv of the raw numbers and I will be happy to graph it.
>
> It would also be great to see how that scales as you add more Redis
> instances until all the available CPU cores on your Redis host are in
> use.

Yep, I’ve got a long list of things like this that I’d like to see in future rounds of performance testing (and I welcome anyone in the community with an interest to join in), but I have to balance that effort with a lot of other things that are on my plate right now.

Speaking generally, I’d like to see the project bake this in over time as part of the CI process. It’s definitely useful information not just for the developers but also for operators in terms of capacity planning. We’ve talked as a team about doing this with Rally (and in fact, some work has been started there), but it may be useful to also run a large-scale test on a regular basis (at least per milestone). Regardless, I think it would be great for the Zaqar team to connect with other projects (at the summit?) who are working on perf testing to swap ideas, collaborate on code/tools, etc.

--KG
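For what it's worth, the per-operation stats asked about above (min/max/avg/stdev per request type) are cheap to derive once the load tool records raw timings. A runnable sketch; the sample values are invented for illustration:

```python
import statistics

# Hypothetical raw request timings in milliseconds, grouped by
# operation type, as a load tool could record them.
timings_ms = {
    "post":   [1.7, 1.9, 1.6, 2.4, 1.8],
    "get":    [1.5, 1.4, 1.6, 1.5, 2.0],
    "delete": [1.6, 1.8, 1.7, 2.2, 1.9],
}

for op, samples in timings_ms.items():
    print("%-6s min=%.1f max=%.1f avg=%.2f stdev=%.2f (ms)" % (
        op, min(samples), max(samples),
        statistics.mean(samples), statistics.stdev(samples)))
```

Keeping the raw samples around (rather than only the averages) also makes it trivial to produce the .csv Devananda offers to graph.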
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On Wed, Sep 10, 2014 at 6:09 PM, Kurt Griffiths wrote:

> On 9/10/14, 3:58 PM, "Devananda van der Veen" wrote:
>
> > I'm going to assume that, for these benchmarks, you configured all the
> > services optimally.
>
> Sorry for any confusion; I am not trying to hide anything about the setup.
> I thought I was pretty transparent about the way uWSGI, MongoDB, and Redis
> were configured. I tried to stick to mostly default settings to keep
> things simple, making it easier for others to reproduce/verify the results.
>
> Is there further information about the setup that you were curious about
> that I could provide? Was there a particular optimization that you didn’t
> see that you would recommend?

Nope.

> > I'm not going to question why you didn't run tests
> > with tens or hundreds of concurrent clients,
>
> If you review the different tests, you will note that a couple of them
> used at least 100 workers. That being said, I think we ought to try higher
> loads in future rounds of testing.

Perhaps I misunderstand what "2 processes with 25 gevent workers" means - I think this means you have two _processes_ which are using greenthreads and eventlet, and so each of those two Python processes is swapping between 25 coroutines. From a load-generation standpoint, this is not the same as having 100 concurrent client _processes_.

> > or why you only ran the
> > tests for 10 seconds.
>
> In Round 1 I did mention that I wanted to do a follow-up with a longer
> duration. However, as I alluded to in the preamble for Round 2, I kept
> things the same for the Redis tests to compare with the Mongo ones done
> previously.
>
> We’ll increase the duration in the next round of testing.

Sure - consistency between tests is good. But I don't believe that a 10-second benchmark is ever enough to suss out service performance. Lots of things only appear after high load has been applied for a period of time as, e.g., caches fill up, though this leads to my next point below...

> > Instead, I'm actually going to question how it is that, even with
> > relatively beefy dedicated hardware (128 GB RAM in your storage
> > nodes), Zaqar peaked at around 1,200 messages per second.
>
> I went back and ran some of the tests and never saw memory go over ~20M
> (as observed with redis-top), so these same results should be obtainable
> on a box with a lot less RAM.

Whoa. So, that's a *really* important piece of information which was, afaict, missing from your previous email(s). I hope you can understand how, with the information you provided ("the Redis server has 128 GB RAM"), I was shocked at the low performance.

> Furthermore, the tests only used 1 CPU on the
> Redis host, so again, similar results should be achievable on a much more
> modest box.

You described fairly beefy hardware but didn't utilize it fully -- I was expecting your performance test to attempt to stress the various components of a Zaqar installation and, at least in some way, attempt to demonstrate what the capacity of a Zaqar deployment might be on the hardware you have available. Thus my surprise at the low numbers. If that wasn't your intent (and given the CPU/RAM usage your tests achieved, it's not what you achieved), then my disappointment in those performance numbers is unfounded. But I hope you can understand: if I'm looking at a service benchmark to gauge how well that service might perform in production, seeing expensive hardware perform disappointingly slowly is not a good sign.

> FWIW, I went back and ran a couple of scenarios to get some more data
> points. First, I did one with 50 producers and 50 observers. In that case,
> the single CPU on which the OS scheduled the Redis process peaked at 30%.
> The second test I did was with 50 producers + 5 observers + 50 consumers
> (which claim messages and delete them rather than simply paging through
> them). This time, Redis used 78% of its CPU. I suppose this should not be
> surprising because the consumers do a lot more work than the observers.
> Meanwhile, load on the web head was fairly high; around 80% for all 20
> CPUs. This tells me that Python and/or uWSGI are working pretty hard to
> serve these requests, and there may be some opportunities to optimize that
> layer. I suspect there are also some opportunities to reduce the number of
> Redis operations and round trips required to claim a batch of messages.

OK - those resource usages sound better. At least you generated enough load to saturate the uWSGI process CPU, which is a good point at which to look at the performance of the system.

At that peak, what was the:
- average msgs/sec
- min/max/avg/stdev time to [post|get|delete] a message

> The other thing to consider is that in these first two rounds I did not
> test increasing amounts of load (number of clients performing concurrent
> requests) and graph that against latency and throughput. Out of curiosity,
> I just now did a quick test to compare the messages enqueued with 50
> producers + 5 observers + 50 consumers vs. adding another 50 producer
> clients and found that the producers were able to post 2,181 messages per
> second while giving up only 0.3 ms.
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On 9/10/14, 3:58 PM, "Devananda van der Veen" wrote:

> I'm going to assume that, for these benchmarks, you configured all the
> services optimally.

Sorry for any confusion; I am not trying to hide anything about the setup. I thought I was pretty transparent about the way uWSGI, MongoDB, and Redis were configured. I tried to stick to mostly default settings to keep things simple, making it easier for others to reproduce/verify the results.

Is there further information about the setup that you were curious about that I could provide? Was there a particular optimization that you didn’t see that you would recommend?

> I'm not going to question why you didn't run tests
> with tens or hundreds of concurrent clients,

If you review the different tests, you will note that a couple of them used at least 100 workers. That being said, I think we ought to try higher loads in future rounds of testing.

> or why you only ran the
> tests for 10 seconds.

In Round 1 I did mention that I wanted to do a follow-up with a longer duration. However, as I alluded to in the preamble for Round 2, I kept things the same for the Redis tests to compare with the Mongo ones done previously.

We’ll increase the duration in the next round of testing.

> Instead, I'm actually going to question how it is that, even with
> relatively beefy dedicated hardware (128 GB RAM in your storage
> nodes), Zaqar peaked at around 1,200 messages per second.

I went back and ran some of the tests and never saw memory go over ~20M (as observed with redis-top), so these same results should be obtainable on a box with a lot less RAM. Furthermore, the tests only used 1 CPU on the Redis host, so again, similar results should be achievable on a much more modest box.

FWIW, I went back and ran a couple of scenarios to get some more data points. First, I did one with 50 producers and 50 observers. In that case, the single CPU on which the OS scheduled the Redis process peaked at 30%. The second test I did was with 50 producers + 5 observers + 50 consumers (which claim messages and delete them rather than simply paging through them). This time, Redis used 78% of its CPU. I suppose this should not be surprising because the consumers do a lot more work than the observers. Meanwhile, load on the web head was fairly high; around 80% for all 20 CPUs. This tells me that Python and/or uWSGI are working pretty hard to serve these requests, and there may be some opportunities to optimize that layer. I suspect there are also some opportunities to reduce the number of Redis operations and round trips required to claim a batch of messages.

The other thing to consider is that in these first two rounds I did not test increasing amounts of load (number of clients performing concurrent requests) and graph that against latency and throughput. Out of curiosity, I just now did a quick test to compare the messages enqueued with 50 producers + 5 observers + 50 consumers vs. adding another 50 producer clients and found that the producers were able to post 2,181 messages per second while giving up only 0.3 ms.

--KG
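As a rough illustration of how a combined figure like "2,181 messages per second" falls out of a run: total messages posted by all producers, divided by wall-clock time. The runnable sketch below uses threads in place of gevent greenlets and a no-op `post_message` stand-in for the HTTP request, so it demonstrates the bookkeeping, not real throughput.

```python
import threading
import time

def post_message(payload):
    pass  # stand-in for the real HTTP POST to the queue service

def producer(count, results, idx):
    # Each worker posts `count` messages and records how many it sent.
    for _ in range(count):
        post_message({"body": "x" * 1024, "ttl": 300})
    results[idx] = count

def run(workers=50, msgs_per_worker=200):
    results = [0] * workers
    threads = [
        threading.Thread(target=producer, args=(msgs_per_worker, results, i))
        for i in range(workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    total = sum(results)
    return total, total / elapsed  # messages posted, combined msg/sec

total, rate = run()
print("posted %d messages at %.0f msg/sec (combined)" % (total, rate))
```

Sweeping `workers` upward and recording the rate at each step is exactly the latency-vs-concurrency graph discussed in this thread.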
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On Tue, Sep 9, 2014 at 12:19 PM, Kurt Griffiths wrote:

> Hi folks,
>
> In this second round of performance testing, I benchmarked the new Redis
> driver. I used the same setup and tests as in Round 1 to make it easier to
> compare the two drivers. I did not test Redis in master-slave mode, but
> that likely would not make a significant difference in the results since
> Redis replication is asynchronous[1].
>
> As always, the usual benchmarking disclaimers apply (i.e., take these
> numbers with a grain of salt; they are only intended to provide a ballpark
> reference; you should perform your own tests, simulating your specific
> scenarios and using your own hardware; etc.).
>
> ## Setup ##
>
> Rather than VMs, I provisioned some Rackspace OnMetal[3] servers to
> mitigate noisy-neighbor effects when running the performance tests:
>
> * 1x Load Generator
>   * Hardware
>     * 1x Intel Xeon E5-2680 v2 2.8 GHz
>     * 32 GB RAM
>     * 10 Gbps NIC
>     * 32 GB SATADOM
>   * Software
>     * Debian Wheezy
>     * Python 2.7.3
>     * zaqar-bench
> * 1x Web Head
>   * Hardware
>     * 1x Intel Xeon E5-2680 v2 2.8 GHz
>     * 32 GB RAM
>     * 10 Gbps NIC
>     * 32 GB SATADOM
>   * Software
>     * Debian Wheezy
>     * Python 2.7.3
>     * zaqar server
>       * storage=mongodb
>       * partitions=4
>       * MongoDB URI configured with w=majority
>     * uWSGI + gevent
>       * config: http://paste.openstack.org/show/100592/
>       * app.py: http://paste.openstack.org/show/100593/
> * 3x MongoDB Nodes
>   * Hardware
>     * 2x Intel Xeon E5-2680 v2 2.8 GHz
>     * 128 GB RAM
>     * 10 Gbps NIC
>     * 2x LSI Nytro WarpDrive BLP4-1600[2]
>   * Software
>     * Debian Wheezy
>     * mongod 2.6.4
>       * Default config, except setting replSet and enabling periodic
>         logging of CPU and I/O
>       * Journaling enabled
>       * Profiling on message DBs enabled for requests over 10 ms
> * 1x Redis Node
>   * Hardware
>     * 2x Intel Xeon E5-2680 v2 2.8 GHz
>     * 128 GB RAM
>     * 10 Gbps NIC
>     * 2x LSI Nytro WarpDrive BLP4-1600[2]
>   * Software
>     * Debian Wheezy
>     * Redis 2.4.14
>     * Default config (snapshotting and AOF enabled)
>     * One process
>
> As in Round 1, Keystone auth is disabled and requests go over HTTP, not
> HTTPS. The latency introduced by enabling these is outside the control of
> Zaqar, but should be quite minimal (speaking anecdotally, I would expect
> an additional 1-3 ms for cached tokens, assuming an optimized TLS
> termination setup).
>
> For generating the load, I again used the zaqar-bench tool. I would like
> to see the team complete a large-scale Tsung test as well (including a
> full HA deployment with Keystone and HTTPS enabled), but decided not to
> wait for that before publishing the results for the Redis driver using
> zaqar-bench.
>
> CPU usage on the Redis node peaked at around 75% for the one process. To
> better utilize the hardware, a production deployment would need to run
> multiple Redis processes and use Zaqar's backend pooling feature to
> distribute queues across the various instances.
>
> Several different messaging patterns were tested, taking inspiration
> from: https://wiki.openstack.org/wiki/Use_Cases_(Zaqar)
>
> Each test was executed three times and the best time recorded.
>
> A ~1K sample message (1398 bytes) was used for all tests.
>
> ## Results ##
>
> ### Event Broadcasting (Read-Heavy) ###
>
> OK, so let's say you have a somewhat low-volume source, but tons of event
> observers. In this case, the observers easily outpace the producer, making
> this a read-heavy workload.
>
> Options
> * 1 producer process with 5 gevent workers
>   * 1 message posted per request
> * 2 observer processes with 25 gevent workers each
>   * 5 messages listed per request by the observers
> * Load distributed across 4[6] queues
> * 10-second duration
>
> Results
> * Redis
>   * Producer: 1.7 ms/req, 585 req/sec
>   * Observer: 1.5 ms/req, 1254 req/sec
> * Mongo
>   * Producer: 2.2 ms/req, 454 req/sec
>   * Observer: 1.5 ms/req, 1224 req/sec
>
> ### Event Broadcasting (Balanced) ###
>
> This test uses the same number of producers and consumers, but note that
> the observers are still listing (up to) 5 messages at a time[4], so they
> still outpace the producers, but not as quickly as before.
>
> Options
> * 2 producer processes with 25 gevent workers each
>   * 1 message posted per request
> * 2 observer processes with 25 gevent workers each
>   * 5 messages listed per request by the observers
> * Load distributed across 4 queues
> * 10-second duration
>
> Results
> * Redis
>   * Producer: 1.4 ms/req, 1374 req/sec
>   * Observer: 1.6 ms/req, 1178 req/sec
> * Mongo
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
Thanks! Looks good. Only thing I noticed was that footnotes were still
referenced, but did not appear at the bottom of the page.

On 9/10/14, 6:16 AM, "Flavio Percoco" wrote:
>I've collected the information from both performance tests and put it in
>the project's wiki[0]
>
>Please, double check :D
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On 09/09/2014 09:19 PM, Kurt Griffiths wrote:
> Hi folks,
>
> In this second round of performance testing, I benchmarked the new Redis
> driver. I used the same setup and tests as in Round 1 to make it easier
> to compare the two drivers. I did not test Redis in master-slave mode,
> but that likely would not make a significant difference in the results,
> since Redis replication is asynchronous[1].
>
> As always, the usual benchmarking disclaimers apply (i.e., take these
> numbers with a grain of salt; they are only intended to provide a
> ballpark reference; you should perform your own tests, simulating your
> specific scenarios and using your own hardware; etc.).
>
> ## Setup ##
>
> Rather than VMs, I provisioned some Rackspace OnMetal[3] servers to
> mitigate noisy-neighbor effects when running the performance tests:
>
> * 1x Load Generator
>     * Hardware
>         * 1x Intel Xeon E5-2680 v2 2.8 GHz
>         * 32 GB RAM
>         * 10 Gbps NIC
>         * 32 GB SATADOM
>     * Software
>         * Debian Wheezy
>         * Python 2.7.3
>         * zaqar-bench
> * 1x Web Head
>     * Hardware
>         * 1x Intel Xeon E5-2680 v2 2.8 GHz
>         * 32 GB RAM
>         * 10 Gbps NIC
>         * 32 GB SATADOM
>     * Software
>         * Debian Wheezy
>         * Python 2.7.3
>         * zaqar server
>             * storage=mongodb
>             * partitions=4
>             * MongoDB URI configured with w=majority
>         * uWSGI + gevent
>             * config: http://paste.openstack.org/show/100592/
>             * app.py: http://paste.openstack.org/show/100593/
> * 3x MongoDB Nodes
>     * Hardware
>         * 2x Intel Xeon E5-2680 v2 2.8 GHz
>         * 128 GB RAM
>         * 10 Gbps NIC
>         * 2x LSI Nytro WarpDrive BLP4-1600[2]
>     * Software
>         * Debian Wheezy
>         * mongod 2.6.4
>             * Default config, except setting replSet and enabling
>               periodic logging of CPU and I/O
>             * Journaling enabled
>             * Profiling on message DBs enabled for requests over 10 ms
> * 1x Redis Node
>     * Hardware
>         * 2x Intel Xeon E5-2680 v2 2.8 GHz
>         * 128 GB RAM
>         * 10 Gbps NIC
>         * 2x LSI Nytro WarpDrive BLP4-1600[2]
>     * Software
>         * Debian Wheezy
>         * Redis 2.4.14
>         * Default config (snapshotting and AOF enabled)
>         * One process
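A single Redis process can only drive one core, so the way to put the rest of a box like the one above to work is to run several Redis processes and map each queue deterministically onto one of them, which is the idea behind Zaqar's backend pooling. Here is a rough, hypothetical sketch of such a mapping; the URIs and the helper function are illustrative only, not Zaqar's actual implementation:

```python
import hashlib

# Hypothetical shard list: several Redis processes on the same box,
# each bound to its own port. These URIs are illustrative only.
REDIS_POOLS = [
    "redis://127.0.0.1:6379",
    "redis://127.0.0.1:6380",
    "redis://127.0.0.1:6381",
    "redis://127.0.0.1:6382",
]

def pool_for_queue(project, queue):
    """Map a (project, queue) pair onto one pool via a stable hash.

    Hashing the fully qualified queue name keeps the mapping
    deterministic, so every web head routes a given queue to the
    same Redis process.
    """
    key = ("%s/%s" % (project, queue)).encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()
    return REDIS_POOLS[int(digest, 16) % len(REDIS_POOLS)]

print(pool_for_queue("project-1", "events"))
```

A real deployment would want something closer to consistent hashing (or Zaqar's own pool catalog) so that adding a shard doesn't remap every existing queue; the fixed modulo here just keeps the sketch short.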
> As in Round 1, Keystone auth is disabled and requests go over HTTP, not
> HTTPS. The latency introduced by enabling these is outside the control
> of Zaqar, but should be quite minimal (speaking anecdotally, I would
> expect an additional 1-3 ms for cached tokens, assuming an optimized
> TLS termination setup).
>
> For generating the load, I again used the zaqar-bench tool. I would
> like to see the team complete a large-scale Tsung test as well
> (including a full HA deployment with Keystone and HTTPS enabled), but
> decided not to wait for that before publishing the results for the
> Redis driver using zaqar-bench.
>
> CPU usage on the Redis node peaked at around 75% for the one process.
> To better utilize the hardware, a production deployment would need to
> run multiple Redis processes and use Zaqar's backend pooling feature to
> distribute queues across the various instances.
>
> Several different messaging patterns were tested, taking inspiration
> from: https://wiki.openstack.org/wiki/Use_Cases_(Zaqar)
>
> Each test was executed three times and the best result recorded.
>
> A ~1K sample message (1398 bytes) was used for all tests.
>
> ## Results ##
>
> ### Event Broadcasting (Read-Heavy) ###
>
> OK, so let's say you have a somewhat low-volume source, but tons of
> event observers. In this case, the observers easily outpace the
> producer, making this a read-heavy workload.
> > Options > * 1 producer process with 5 gevent workers > * 1 message posted per request > * 2 observer processes with 25 gevent workers each > * 5 messages listed per request by the observers > * Load distributed across 4[6] queues > * 10-second duration > > Results > * Redis > * Producer: 1.7 ms/req, 585 req/sec > * Observer: 1.5 ms/req, 1254 req/sec > * Mongo > * Producer: 2.2 ms/req, 454 req/sec > * Observer: 1.5 ms/req, 1224 req/sec > > ### Event Broadcasting (Balanced) ### > > This test uses the same number of producers and consumers, but note that > the observers are still listing (up to) 5 messages at a time[4], so they > still outpace the producers, but not as quickly as before. > > Options > * 2 producer processes with 25 gevent workers each > * 1 message posted per request > * 2 observer processes with 25 gevent workers each > * 5 messages listed per request by the observers > * Load distributed across 4 queues > * 10-second duration > > Results > * Redis > * Producer: 1.4 ms/req, 1374 req/sec > * Observer: 1.6 ms/req, 1178 req/sec >