Re: [openstack-dev] [zaqar] [marconi] Juno Performance Testing (Round 1)
Sure thing, I’ll add that to my list of things to try in “Round 2” (coming later this week).

On 8/28/14, 9:05 AM, Jay Pipes jaypi...@gmail.com wrote:

> On 08/26/2014 05:41 PM, Kurt Griffiths wrote:
> > * uWSGI + gevent
> >   * config: http://paste.openstack.org/show/100592/
> >   * app.py: http://paste.openstack.org/show/100593/
>
> Hi Kurt!
>
> Thanks for posting the benchmark configuration and results. Good stuff :)
>
> I'm curious about what effect removing http-keepalive from the uWSGI config would have. AIUI, for systems that need to support lots and lots of random reads/writes from lots of tenants, using keepalive sessions would cause congestion for incoming new connections, and may not be appropriate for such systems.
>
> Totally not a big deal; really, just curious if you'd run one or more of the benchmarks with keepalive turned off and what results you saw.
>
> Best,
> -jay
Re: [openstack-dev] [zaqar] [marconi] Juno Performance Testing (Round 1)
On 08/26/2014 05:41 PM, Kurt Griffiths wrote:

> * uWSGI + gevent
>   * config: http://paste.openstack.org/show/100592/
>   * app.py: http://paste.openstack.org/show/100593/

Hi Kurt!

Thanks for posting the benchmark configuration and results. Good stuff :)

I'm curious about what effect removing http-keepalive from the uWSGI config would have. AIUI, for systems that need to support lots and lots of random reads/writes from lots of tenants, using keepalive sessions would cause congestion for incoming new connections, and may not be appropriate for such systems.

Totally not a big deal; really, just curious if you'd run one or more of the benchmarks with keepalive turned off and what results you saw.

Best,
-jay
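[Editor's note: to get a rough feel for the keepalive effect on the client side before a full benchmark run, one quick sanity check is to hit the same endpoint over a persistent connection versus forcing a new connection per request. The sketch below uses the requests library; the endpoint URL, headers, and request count are placeholders, not values from the benchmark configuration.]

    # Rough sketch: compare average per-request latency with and without
    # HTTP keepalive. URL and headers are placeholders (assumed local Zaqar).
    import time
    import requests

    URL = 'http://localhost:8888/v1/queues/demo/messages'
    HEADERS = {'Client-ID': 'keepalive-test', 'X-Project-Id': 'demo'}

    def time_requests(session, n=100, extra_headers=None):
        headers = dict(HEADERS, **(extra_headers or {}))
        start = time.time()
        for _ in range(n):
            session.get(URL, headers=headers)
        return (time.time() - start) / n * 1000.0  # avg ms per request

    s = requests.Session()
    # Persistent connection: the session reuses one TCP connection.
    keepalive_ms = time_requests(s)
    s.close()

    s = requests.Session()
    # Ask the server to close the connection after every response, forcing a
    # new TCP handshake per request (a crude stand-in for keepalive off).
    no_keepalive_ms = time_requests(s, extra_headers={'Connection': 'close'})
    s.close()

    print('keepalive: %.2f ms/req, no keepalive: %.2f ms/req'
          % (keepalive_ms, no_keepalive_ms))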
[openstack-dev] [zaqar] [marconi] Juno Performance Testing (Round 1)
Hi folks,

I ran some rough benchmarks to get an idea of where Zaqar currently stands re latency and throughput for Juno. These results are by no means conclusive, but I wanted to publish what I had so far for the sake of discussion.

Note that these tests do not include results for our new Redis driver, but I hope to make those available soon. As always, the usual disclaimers apply (i.e., benchmarks mostly amount to lies; these numbers are only intended to provide a ballpark reference; you should perform your own tests, simulating your specific scenarios and using your own hardware; etc.).

## Setup ##

Rather than VMs, I provisioned some Rackspace OnMetal[8] servers to mitigate noisy-neighbor effects when running the performance tests:

* 1x Load Generator
    * Hardware
        * 1x Intel Xeon E5-2680 v2 2.8 GHz
        * 32 GB RAM
        * 10 Gbps NIC
        * 32 GB SATADOM
    * Software
        * Debian Wheezy
        * Python 2.7.3
        * zaqar-bench from trunk with some extra patches[1]
* 1x Web Head
    * Hardware
        * 1x Intel Xeon E5-2680 v2 2.8 GHz
        * 32 GB RAM
        * 10 Gbps NIC
        * 32 GB SATADOM
    * Software
        * Debian Wheezy
        * Python 2.7.3
        * zaqar server from trunk @47e07cad
            * storage=mongodb
            * partitions=4
            * MongoDB URI configured with w=majority
        * uWSGI + gevent
            * config: http://paste.openstack.org/show/100592/
            * app.py: http://paste.openstack.org/show/100593/
* 3x MongoDB Nodes
    * Hardware
        * 2x Intel Xeon E5-2680 v2 2.8 GHz
        * 128 GB RAM
        * 10 Gbps NIC
        * 2x LSI Nytro WarpDrive BLP4-1600[2]
    * Software
        * Debian Wheezy
        * mongod 2.6.4
            * Default config, except setting replSet and enabling periodic logging of CPU and I/O
            * Journaling enabled
            * Profiling on message DBs enabled for requests over 10 ms

For generating the load, I used the zaqar-bench tool we created during Juno as a stepping stone toward integration with Rally. Although the tool is still fairly rough, I thought it good enough to provide some useful data[3]. The tool uses the python-zaqarclient library.

Note that I didn’t push the servers particularly hard for these tests; web head CPUs averaged around 20%, while the mongod primary’s CPU usage peaked at around 10%, with DB locking peaking at 5%.

Several different messaging patterns were tested, taking inspiration from: https://wiki.openstack.org/wiki/Use_Cases_(Zaqar)

Each test was executed three times and the best time recorded. A ~1K sample message (1398 bytes) was used for all tests.

## Results ##

### Event Broadcasting (Read-Heavy) ###

OK, so let's say you have a somewhat low-volume source, but tons of event observers. In this case, the observers easily outpace the producer, making this a read-heavy workload.

Options
* 1 producer process with 5 gevent workers
    * 1 message posted per request
* 2 observer processes with 25 gevent workers each
    * 5 messages listed per request by the observers
* Load distributed across 4[7] queues
* 10-second duration[4]

Results
* Producer: 2.2 ms/req, 454 req/sec
* Observer: 1.5 ms/req, 1224 req/sec
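[Editor's note: for reference, the producer and observer roles in these broadcast tests boil down to posting messages and listing them. The sketch below shows that post/list pattern using python-zaqarclient; it is not the zaqar-bench code itself, the endpoint, queue name, and auth options are placeholders, and the v1 client API may differ slightly from what is shown.]

    # Minimal sketch of the producer/observer (post + list) pattern, assuming
    # the python-zaqarclient v1 API. Endpoint, queue name, and auth options
    # are placeholders, not the actual benchmark configuration.
    from zaqarclient.queues.v1 import client

    cli = client.Client('http://localhost:8888', version=1,
                        conf={'auth_opts': {'backend': 'noauth'}})  # assumed local setup
    queue = cli.queue('benchmark-queue-0')

    # Producer: one message posted per request (~1K body, as in the tests).
    queue.post({'ttl': 300, 'body': {'event': 'example', 'padding': 'x' * 1024}})

    # Observer: list up to 5 messages per request and read their bodies.
    for msg in queue.messages(limit=5):
        print(msg.body)

In zaqar-bench, many such clients run concurrently (multiple OS processes, each with a pool of gevent workers), which is what the "N processes with M gevent workers" options above refer to.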
### Event Broadcasting (Balanced) ###

This test uses the same number of producers and consumers, but note that the observers are still listing (up to) 5 messages at a time[5], so they still outpace the producers, but not as quickly as before.

Options
* 2 producer processes with 10 gevent workers each
    * 1 message posted per request
* 2 observer processes with 25 gevent workers each
    * 5 messages listed per request by the observers
* Load distributed across 4 queues
* 10-second duration

Results
* Producer: 2.2 ms/req, 883 req/sec
* Observer: 2.8 ms/req, 348 req/sec

### Point-to-Point Messaging ###

In this scenario I simulated one client sending messages directly to a different client. Only one queue is required in this case[6].

Note the higher latency. While running the test there were 1-2 message posts that skewed the average by taking much longer (~100 ms) than the others to complete. Such outliers are probably present in the other tests as well, and further investigation is needed to discover the root cause.

Options
* 1 producer process with 1 gevent worker
    * 1 message posted per request
* 1 observer process with 1 gevent worker
    * 1 message listed per request
* All load sent to a single queue
* 10-second duration

Results
* Producer: 5.5 ms/req, 179 req/sec
* Observer: 3.5 ms/req, 278 req/sec

### Task Distribution ###

This test uses several producers and consumers in order to simulate distributing tasks to a worker pool. In contrast to the observer worker type, consumers claim and delete messages in such a way that each message is processed once and only once (a rough sketch of this claim/delete pattern appears after this message).

Options
* 2 producer processes with 25 gevent
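[Editor's note: the message is truncated above. For context on what "claim and delete" means at the API level, here is a minimal consumer-loop sketch assuming the python-zaqarclient v1 claim API; the endpoint, queue name, and TTL/grace values are placeholders rather than the benchmark's actual settings.]

    # Minimal sketch of the consumer (claim + delete) pattern, assuming the
    # python-zaqarclient v1 API. Endpoint, queue name, and TTL/grace values
    # are placeholders, not the benchmark configuration.
    from zaqarclient.queues.v1 import client

    cli = client.Client('http://localhost:8888', version=1,
                        conf={'auth_opts': {'backend': 'noauth'}})  # assumed local setup
    queue = cli.queue('benchmark-queue-0')

    # Claim a batch of messages so no other consumer sees them while we work.
    claim = queue.claim(ttl=60, grace=60, limit=5)

    for msg in claim:
        print(msg.body)  # stand-in for real task processing
        msg.delete()     # deleting after processing ensures each task is handled once

The observer role in the broadcast tests, by contrast, only lists messages and never claims or deletes them, which is why multiple observers can all see the same events.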
Re: [openstack-dev] [zaqar] [marconi] Juno Performance Testing (Round 1)
Correction: there were 25 workers per producer process, not 10.

On 8/26/14, 4:41 PM, Kurt Griffiths kurt.griffi...@rackspace.com wrote:

> ### Event Broadcasting (Balanced) ###
>
> This test uses the same number of producers and consumers, but note that the observers are still listing (up to) 5 messages at a time[5], so they still outpace the producers, but not as quickly as before.
>
> Options
> * 2 producer processes with 10 gevent workers each
>     * 1 message posted per request
> * 2 observer processes with 25 gevent workers each
>     * 5 messages listed per request by the observers
> * Load distributed across 4 queues
> * 10-second duration
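[Editor's note: since the processes-versus-workers distinction is exactly what this correction is about, here is a rough sketch of how an "N processes with M gevent workers each" load generator is typically structured. It illustrates the general pattern only, not the actual zaqar-bench implementation, and the worker function is a placeholder.]

    # Rough sketch of the "N processes x M gevent workers" fan-out pattern.
    # General structure only; the worker body is a placeholder.
    import multiprocessing

    import gevent
    from gevent import monkey

    def producer_worker(worker_id):
        # Placeholder for the real work: post one message per request in a
        # loop until the test duration elapses.
        pass

    def run_process(num_workers):
        # Each OS process runs its own pool of cooperative gevent workers.
        monkey.patch_all()
        greenlets = [gevent.spawn(producer_worker, i) for i in range(num_workers)]
        gevent.joinall(greenlets)

    if __name__ == '__main__':
        num_processes, workers_per_process = 2, 25  # e.g., the corrected producer options
        procs = [multiprocessing.Process(target=run_process,
                                         args=(workers_per_process,))
                 for _ in range(num_processes)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

Concurrency is then roughly processes x workers (each gevent worker keeps one request outstanding at a time), which is why the correction matters when interpreting the req/sec figures above.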