Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
Great question. Some use cases, such as guest agents, would like to see something around ~20 ms if the agent needs to respond to requests from a control surface/panel while a user clicks around. I spoke with a social media company that was also interested in low latency, simply because they have a large volume of messages to slog through in a timely manner or they will get behind (long-polling or websocket support was something they would like to see). Other use cases should be fine with, say, 100 ms. I want to say Heat’s needs probably fall into that latter category, but I’m only speculating.

Some other feedback we got a while back was that people would like a knob to tweak queue attributes, e.g., the tradeoff between durability and performance. That led to work on queue “flavors”, which Flavio has been working on this past cycle, so I’ll let him chime in on that.

From: Joe Gordon <joe.gord...@gmail.com>
Reply-To: OpenStack Dev <openstack-dev@lists.openstack.org>
Date: Wednesday, September 17, 2014 at 2:32 PM
To: OpenStack Dev <openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)

> Can you further quantify what you would consider too slow? Is 100 ms too slow?

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On Tue, Sep 16, 2014 at 8:02 AM, Kurt Griffiths <kurt.griffi...@rackspace.com> wrote:

> Right, graphing those sorts of variables has always been part of our
> test plan. What I’ve done so far was just some pilot tests, and I realize
> now that I wasn’t very clear on that point. I wanted to get a rough idea of
> where the Redis driver sat in case there were any obvious bug fixes that
> needed to be taken care of before performing more extensive testing. As it
> turns out, I did find one bug that has since been fixed.
>
> Regarding latency, saying that it “is not important” is an exaggeration;
> it is definitely important, just not the *only* thing that is important.
> I have spoken with a lot of prospective Zaqar users since the inception of
> the project, and one of the common threads was that latency needed to be
> reasonable. For the use cases where they see Zaqar delivering a lot of
> value, requests don't need to be as fast as, say, ZMQ, but they do need
> something that isn’t horribly *slow*, either. They also want HTTP,
> multi-tenant, auth, durability, etc. The goal is to find a reasonable
> amount of latency given our constraints and also, obviously, be able to
> deliver all that at scale.

Can you further quantify what you would consider too slow? Is 100 ms too slow?

> In any case, I’ve continued working through the test plan and will be
> publishing further test results shortly.
>
> > graph latency versus number of concurrent active tenants
>
> By tenants do you mean in the sense of OpenStack Tenants/Project-IDs or
> in the sense of “clients/workers”? For the latter case, the pilot tests
> I’ve done so far used multiple clients (though not graphed), but in the
> former case only one “project” was used.
multiple Tenant/Project-IDs

> From: Joe Gordon
> Reply-To: OpenStack Dev
> Date: Friday, September 12, 2014 at 1:45 PM
> To: OpenStack Dev
> Subject: Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
>
> If Zaqar is like Amazon SQS, then the latency for a single message and
> the throughput for a single tenant is not important. I wouldn't expect
> anyone who has latency-sensitive workloads or needs massive throughput to
> use Zaqar, as these people wouldn't use SQS either. The consistency of the
> latency (it shouldn't change under load) and Zaqar's ability to scale
> horizontally matter much more. What would be great to see is some other
> things benchmarked instead:
>
> * graph latency versus number of concurrent active tenants
> * graph latency versus message size
> * how throughput scales as you scale up the number of assorted Zaqar
>   components; if one of the benefits of Zaqar is its horizontal
>   scalability, let's see it
> * how does this change with message batching?
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
Right, graphing those sorts of variables has always been part of our test plan. What I’ve done so far was just some pilot tests, and I realize now that I wasn’t very clear on that point. I wanted to get a rough idea of where the Redis driver sat in case there were any obvious bug fixes that needed to be taken care of before performing more extensive testing. As it turns out, I did find one bug that has since been fixed.

Regarding latency, saying that it “is not important” is an exaggeration; it is definitely important, just not the only thing that is important. I have spoken with a lot of prospective Zaqar users since the inception of the project, and one of the common threads was that latency needed to be reasonable. For the use cases where they see Zaqar delivering a lot of value, requests don't need to be as fast as, say, ZMQ, but they do need something that isn’t horribly slow, either. They also want HTTP, multi-tenant, auth, durability, etc. The goal is to find a reasonable amount of latency given our constraints and also, obviously, be able to deliver all that at scale.

In any case, I’ve continued working through the test plan and will be publishing further test results shortly.

> graph latency versus number of concurrent active tenants

By tenants do you mean in the sense of OpenStack Tenants/Project-IDs or in the sense of “clients/workers”? For the latter case, the pilot tests I’ve done so far used multiple clients (though not graphed), but in the former case only one “project” was used.

From: Joe Gordon <joe.gord...@gmail.com>
Reply-To: OpenStack Dev <openstack-dev@lists.openstack.org>
Date: Friday, September 12, 2014 at 1:45 PM
To: OpenStack Dev <openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)

If Zaqar is like Amazon SQS, then the latency for a single message and the throughput for a single tenant is not important.
I wouldn't expect anyone who has latency-sensitive workloads or needs massive throughput to use Zaqar, as these people wouldn't use SQS either. The consistency of the latency (it shouldn't change under load) and Zaqar's ability to scale horizontally matter much more. What would be great to see is some other things benchmarked instead:

* graph latency versus number of concurrent active tenants
* graph latency versus message size
* how throughput scales as you scale up the number of assorted Zaqar components; if one of the benefits of Zaqar is its horizontal scalability, let's see it
* how does this change with message batching?
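The second item on that wishlist (latency versus message size) is straightforward to sketch. The snippet below is illustrative only and not part of zaqar-bench; `fake_post` is a stand-in for a real HTTP POST to the service under test, so the absolute timings are meaningless until a real client is swapped in.

```python
import json
import statistics
import time

def measure_latency_ms(post_fn, size_bytes, samples=50):
    """Time repeated posts of a payload of the given size; return the mean in ms."""
    payload = {"ttl": 300, "body": "x" * size_bytes}
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        post_fn(payload)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)

def fake_post(message):
    # Stand-in for the real network call: just serialize the message.
    json.dumps([message]).encode("utf-8")

for size in (1024, 16384, 262144):
    print("%6d bytes -> %.4f ms" % (size, measure_latency_ms(fake_post, size)))
```

Repeating the sweep at several concurrency levels would cover the first wishlist item as well.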
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On Tue, Sep 9, 2014 at 12:19 PM, Kurt Griffiths <kurt.griffi...@rackspace.com> wrote:

> Hi folks,
>
> In this second round of performance testing, I benchmarked the new Redis
> driver. I used the same setup and tests as in Round 1 to make it easier to
> compare the two drivers. I did not test Redis in master-slave mode, but
> that likely would not make a significant difference in the results since
> Redis replication is asynchronous[1].
>
> As always, the usual benchmarking disclaimers apply (i.e., take these
> numbers with a grain of salt; they are only intended to provide a ballpark
> reference; you should perform your own tests, simulating your specific
> scenarios and using your own hardware; etc.).
>
> ## Setup ##
>
> Rather than VMs, I provisioned some Rackspace OnMetal[3] servers to
> mitigate noisy-neighbor effects when running the performance tests:
>
> * 1x Load Generator
>   * Hardware
>     * 1x Intel Xeon E5-2680 v2 2.8 GHz
>     * 32 GB RAM
>     * 10 Gbps NIC
>     * 32 GB SATADOM
>   * Software
>     * Debian Wheezy
>     * Python 2.7.3
>     * zaqar-bench
> * 1x Web Head
>   * Hardware
>     * 1x Intel Xeon E5-2680 v2 2.8 GHz
>     * 32 GB RAM
>     * 10 Gbps NIC
>     * 32 GB SATADOM
>   * Software
>     * Debian Wheezy
>     * Python 2.7.3
>     * zaqar server
>       * storage=mongodb
>       * partitions=4
>       * MongoDB URI configured with w=majority
>     * uWSGI + gevent
>       * config: http://paste.openstack.org/show/100592/
>       * app.py: http://paste.openstack.org/show/100593/
> * 3x MongoDB Nodes
>   * Hardware
>     * 2x Intel Xeon E5-2680 v2 2.8 GHz
>     * 128 GB RAM
>     * 10 Gbps NIC
>     * 2x LSI Nytro WarpDrive BLP4-1600[2]
>   * Software
>     * Debian Wheezy
>     * mongod 2.6.4
>       * Default config, except setting replSet and enabling periodic
>         logging of CPU and I/O
>       * Journaling enabled
>       * Profiling on message DBs enabled for requests over 10 ms
> * 1x Redis Node
>   * Hardware
>     * 2x Intel Xeon E5-2680 v2 2.8 GHz
>     * 128 GB RAM
>     * 10 Gbps NIC
>     * 2x LSI Nytro WarpDrive BLP4-1600[2]
>   * Software
>     * Debian Wheezy
>     * Redis 2.4.14
>     * Default config (snapshotting and AOF enabled)
>     * One process
>
> As in Round 1, Keystone auth is disabled and requests go over HTTP, not
> HTTPS. The latency introduced by enabling these is outside the control of
> Zaqar, but should be quite minimal (speaking anecdotally, I would expect
> an additional 1-3 ms for cached tokens, assuming an optimized TLS
> termination setup).
>
> For generating the load, I again used the zaqar-bench tool. I would like
> to see the team complete a large-scale Tsung test as well (including a
> full HA deployment with Keystone and HTTPS enabled), but decided not to
> wait for that before publishing the results for the Redis driver using
> zaqar-bench.
>
> CPU usage on the Redis node peaked at around 75% for the one process. To
> better utilize the hardware, a production deployment would need to run
> multiple Redis processes and use Zaqar's backend pooling feature to
> distribute queues across the various instances.
>
> Several different messaging patterns were tested, taking inspiration
> from: https://wiki.openstack.org/wiki/Use_Cases_(Zaqar)
>
> Each test was executed three times and the best time recorded.
>
> A ~1K sample message (1398 bytes) was used for all tests.
>
> ## Results ##
>
> ### Event Broadcasting (Read-Heavy) ###
>
> OK, so let's say you have a somewhat low-volume source, but tons of event
> observers. In this case, the observers easily outpace the producer, making
> this a read-heavy workload.
>
> Options
> * 1 producer process with 5 gevent workers
>   * 1 message posted per request
> * 2 observer processes with 25 gevent workers each
>   * 5 messages listed per request by the observers
> * Load distributed across 4[6] queues
> * 10-second duration

10 seconds is way too short.

> Results
> * Redis
>   * Producer: 1.7 ms/req, 585 req/sec
>   * Observer: 1.5 ms/req, 1254 req/sec
> * Mongo
>   * Producer: 2.2 ms/req, 454 req/sec
>   * Observer: 1.5 ms/req, 1224 req/sec

If Zaqar is like Amazon SQS, then the latency for a single message and the throughput for a single tenant is not important. I wouldn't expect anyone who has latency-sensitive workloads or needs massive throughput to use Zaqar, as these people wouldn't use SQS either. The consistency of the latency (it shouldn't change under load) and Zaqar's ability to scale horizontally matter much more. What would be great to see is some other things benchmarked instead:

* graph latency versus number of concurrent active tenants
* graph latency versus message size
* how throughput scales as you scale up the number of assorted Zaqar components; if one of the benefits of Zaqar is its horizontal scalability, let's see it
* how does this change with message batching?
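Kurt mentions above that a production deployment would run multiple Redis processes and rely on Zaqar's backend pooling to spread queues across them. The sketch below only illustrates the general idea of deterministically mapping a queue to one of several pools; the pool URIs and the hash-based scheme are assumptions made for the example, not Zaqar's actual catalog logic.

```python
import hashlib

# Hypothetical pool URIs; a real deployment would list its own
# Redis instances here.
POOLS = [
    "redis://192.0.2.10:6379",
    "redis://192.0.2.10:6380",
    "redis://192.0.2.10:6381",
]

def pool_for_queue(project_id, queue_name):
    """Deterministically map a (project, queue) pair to one backend pool."""
    key = ("%s/%s" % (project_id, queue_name)).encode("utf-8")
    digest = hashlib.md5(key).hexdigest()
    return POOLS[int(digest, 16) % len(POOLS)]

print(pool_for_queue("project-a", "events"))
```

Because the mapping is deterministic, every web head routes requests for a given queue to the same Redis instance without any shared routing state.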
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On 09/12/2014 01:36 AM, Boris Pavlovic wrote:
> Kurt,
>
> > Speaking generally, I’d like to see the project bake this in over time as
> > part of the CI process. It’s definitely useful information not just for
> > the developers but also for operators in terms of capacity planning. We’ve
> > talked as a team about doing this with Rally (and in fact, some work has
> > been started there), but it may be useful to also run a large-scale test
> > on a regular basis (at least per milestone).
>
> I believe we will be able to generate distributed load of at least
> 20k rps in the K cycle. We've done a lot of work during J in this
> direction, but there is still a lot left to do.
>
> So you'll be able to use the same tool for gates, local usage and
> large-scale tests. Let's talk about it :)

Would it be possible to get an update from you at the summit (or on the mailing list)? I'm interested to know where you guys are with this, what is missing and, most importantly, how we can help.

Thanks Boris,
Flavio

--
@flaper87
Flavio Percoco
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
Kurt,

> Speaking generally, I’d like to see the project bake this in over time as
> part of the CI process. It’s definitely useful information not just for
> the developers but also for operators in terms of capacity planning. We’ve
> talked as a team about doing this with Rally (and in fact, some work has
> been started there), but it may be useful to also run a large-scale test
> on a regular basis (at least per milestone).

I believe we will be able to generate distributed load of at least 20k rps in the K cycle. We've done a lot of work during J in this direction, but there is still a lot left to do.

So you'll be able to use the same tool for gates, local usage and large-scale tests.

Best regards,
Boris Pavlovic

On Fri, Sep 12, 2014 at 3:17 AM, Kurt Griffiths <kurt.griffi...@rackspace.com> wrote:

> On 9/11/14, 2:11 PM, "Devananda van der Veen" wrote:
>
> > OK - those resource usages sound better. At least you generated enough
> > load to saturate the uWSGI process CPU, which is a good point at which
> > to look at the performance of the system.
> >
> > At that peak, what was the:
> > - average msgs/sec
> > - min/max/avg/stdev time to [post|get|delete] a message
>
> To be honest, it was a quick test and I didn’t note the exact metrics
> other than eyeballing them to see that they were similar to the results
> that I published for the scenarios that used the same load options (e.g.,
> I just re-ran some of the same test scenarios).
>
> Some of the metrics you mention aren’t currently reported by zaqar-bench,
> but could be added easily enough. In any case, I think zaqar-bench is
> going to end up being mostly useful for tracking relative performance
> gains or losses on a patch-by-patch basis, and also as an easy way to
> smoke-test both python-marconiclient and the service. For large-scale
> testing and detailed metrics, other tools (e.g., Tsung, JMeter) are
> better for the job, so I’ve been considering using them in future rounds.
>
> > Is that 2,181 msg/sec total, or per-producer?
>
> That metric was a combined average rate for all producers.
>
> > I'd really like to see the total throughput and latency graphed as the
> > # of clients increases. Or if graphing isn't your thing, even just post
> > a .csv of the raw numbers and I will be happy to graph it.
> >
> > It would also be great to see how that scales as you add more Redis
> > instances until all the available CPU cores on your Redis host are in
> > use.
>
> Yep, I’ve got a long list of things like this that I’d like to see in
> future rounds of performance testing (and I welcome anyone in the
> community with an interest to join in), but I have to balance that effort
> with a lot of other things that are on my plate right now.
>
> Speaking generally, I’d like to see the project bake this in over time as
> part of the CI process. It’s definitely useful information not just for
> the developers but also for operators in terms of capacity planning. We’ve
> talked as a team about doing this with Rally (and in fact, some work has
> been started there), but it may be useful to also run a large-scale test
> on a regular basis (at least per milestone). Regardless, I think it would
> be great for the Zaqar team to connect with other projects (at the
> summit?) who are working on perf testing to swap ideas, collaborate on
> code/tools, etc.
>
> --KG
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On 9/11/14, 2:11 PM, "Devananda van der Veen" wrote:

> OK - those resource usages sound better. At least you generated enough
> load to saturate the uWSGI process CPU, which is a good point at which to
> look at the performance of the system.
>
> At that peak, what was the:
> - average msgs/sec
> - min/max/avg/stdev time to [post|get|delete] a message

To be honest, it was a quick test and I didn’t note the exact metrics other than eyeballing them to see that they were similar to the results that I published for the scenarios that used the same load options (e.g., I just re-ran some of the same test scenarios).

Some of the metrics you mention aren’t currently reported by zaqar-bench, but could be added easily enough. In any case, I think zaqar-bench is going to end up being mostly useful for tracking relative performance gains or losses on a patch-by-patch basis, and also as an easy way to smoke-test both python-marconiclient and the service. For large-scale testing and detailed metrics, other tools (e.g., Tsung, JMeter) are better for the job, so I’ve been considering using them in future rounds.

> Is that 2,181 msg/sec total, or per-producer?

That metric was a combined average rate for all producers.

> I'd really like to see the total throughput and latency graphed as the #
> of clients increases. Or if graphing isn't your thing, even just post
> a .csv of the raw numbers and I will be happy to graph it.
>
> It would also be great to see how that scales as you add more Redis
> instances until all the available CPU cores on your Redis host are in
> use.

Yep, I’ve got a long list of things like this that I’d like to see in future rounds of performance testing (and I welcome anyone in the community with an interest to join in), but I have to balance that effort with a lot of other things that are on my plate right now.

Speaking generally, I’d like to see the project bake this in over time as part of the CI process. It’s definitely useful information not just for the developers but also for operators in terms of capacity planning. We’ve talked as a team about doing this with Rally (and in fact, some work has been started there), but it may be useful to also run a large-scale test on a regular basis (at least per milestone). Regardless, I think it would be great for the Zaqar team to connect with other projects (at the summit?) who are working on perf testing to swap ideas, collaborate on code/tools, etc.

--KG
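For what it's worth, the per-operation stats asked about above (min/max/avg/stdev per request type) are cheap to derive once the load tool records raw timings. A runnable sketch; the sample values are invented for illustration:

```python
import statistics

# Hypothetical raw request timings in milliseconds, grouped by
# operation type, as a load tool could record them.
timings_ms = {
    "post":   [1.7, 1.9, 1.6, 2.4, 1.8],
    "get":    [1.5, 1.4, 1.6, 1.5, 2.0],
    "delete": [1.6, 1.8, 1.7, 2.2, 1.9],
}

for op, samples in timings_ms.items():
    print("%-6s min=%.1f max=%.1f avg=%.2f stdev=%.2f (ms)" % (
        op, min(samples), max(samples),
        statistics.mean(samples), statistics.stdev(samples)))
```

Keeping the raw samples around (rather than only the averages) also makes it trivial to produce the .csv Devananda offers to graph.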
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On Wed, Sep 10, 2014 at 6:09 PM, Kurt Griffiths wrote:

> On 9/10/14, 3:58 PM, "Devananda van der Veen" wrote:
>
> > I'm going to assume that, for these benchmarks, you configured all the
> > services optimally.
>
> Sorry for any confusion; I am not trying to hide anything about the setup.
> I thought I was pretty transparent about the way uWSGI, MongoDB, and Redis
> were configured. I tried to stick to mostly default settings to keep
> things simple, making it easier for others to reproduce/verify the results.
>
> Is there further information about the setup that you were curious about
> that I could provide? Was there a particular optimization that you didn’t
> see that you would recommend?

Nope.

> > I'm not going to question why you didn't run tests
> > with tens or hundreds of concurrent clients,
>
> If you review the different tests, you will note that a couple of them
> used at least 100 workers. That being said, I think we ought to try higher
> loads in future rounds of testing.

Perhaps I misunderstand what "2 processes with 25 gevent workers" means - I think this means you have two _processes_ which are using greenthreads and eventlet, and so each of those two Python processes is swapping between 25 coroutines. From a load-generation standpoint, this is not the same as having 100 concurrent client _processes_.

> > or why you only ran the
> > tests for 10 seconds.
>
> In Round 1 I did mention that I wanted to do a follow-up with a longer
> duration. However, as I alluded to in the preamble for Round 2, I kept
> things the same for the Redis tests to compare with the Mongo ones done
> previously.
>
> We’ll increase the duration in the next round of testing.

Sure - consistency between tests is good. But I don't believe that a 10-second benchmark is ever enough to suss out service performance. Lots of things only appear after high load has been applied for a period of time as, e.g., caches fill up, though this leads to my next point below...

> > Instead, I'm actually going to question how it is that, even with
> > relatively beefy dedicated hardware (128 GB RAM in your storage
> > nodes), Zaqar peaked at around 1,200 messages per second.
>
> I went back and ran some of the tests and never saw memory go over ~20M
> (as observed with redis-top), so these same results should be obtainable
> on a box with a lot less RAM.

Whoa. So, that's a *really* important piece of information which was, afaict, missing from your previous email(s). I hope you can understand how, with the information you provided ("the Redis server has 128 GB RAM"), I was shocked at the low performance.

> Furthermore, the tests only used 1 CPU on the
> Redis host, so again, similar results should be achievable on a much more
> modest box.

You described fairly beefy hardware but didn't utilize it fully -- I was expecting your performance test to attempt to stress the various components of a Zaqar installation and, at least in some way, attempt to demonstrate what the capacity of a Zaqar deployment might be on the hardware you have available. Thus my surprise at the low numbers. If that wasn't your intent (and given the CPU/RAM usage your tests achieved, it's not what you achieved), then my disappointment in those performance numbers is unfounded. But I hope you can understand: if I'm looking at a service benchmark to gauge how well that service might perform in production, seeing expensive hardware perform disappointingly slowly is not a good sign.

> FWIW, I went back and ran a couple of scenarios to get some more data
> points. First, I did one with 50 producers and 50 observers. In that case,
> the single CPU on which the OS scheduled the Redis process peaked at 30%.
> The second test I did was with 50 producers + 5 observers + 50 consumers
> (which claim messages and delete them rather than simply paging through
> them). This time, Redis used 78% of its CPU. I suppose this should not be
> surprising because the consumers do a lot more work than the observers.
> Meanwhile, load on the web head was fairly high; around 80% for all 20
> CPUs. This tells me that Python and/or uWSGI are working pretty hard to
> serve these requests, and there may be some opportunities to optimize that
> layer. I suspect there are also some opportunities to reduce the number of
> Redis operations and round trips required to claim a batch of messages.

OK - those resource usages sound better. At least you generated enough load to saturate the uWSGI process CPU, which is a good point at which to look at the performance of the system.

At that peak, what was the:
- average msgs/sec
- min/max/avg/stdev time to [post|get|delete] a message

> The other thing to consider is that in these first two rounds I did not
> test increasing amounts of load (number of clients performing concurrent
> requests) and graph that against latency and throughput. Out of curiosity,
> I just now did a quick test to compare the messages enqueued with 50
> producers + 5 observers + 50 consumers vs. adding another 50 producer
> clients and found that the producers were able to post 2,181 messages per
> second while giving up only 0.3 ms.
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On 9/10/14, 3:58 PM, "Devananda van der Veen" wrote:

> I'm going to assume that, for these benchmarks, you configured all the
> services optimally.

Sorry for any confusion; I am not trying to hide anything about the setup. I thought I was pretty transparent about the way uWSGI, MongoDB, and Redis were configured. I tried to stick to mostly default settings to keep things simple, making it easier for others to reproduce/verify the results.

Is there further information about the setup that you were curious about that I could provide? Was there a particular optimization that you didn’t see that you would recommend?

> I'm not going to question why you didn't run tests
> with tens or hundreds of concurrent clients,

If you review the different tests, you will note that a couple of them used at least 100 workers. That being said, I think we ought to try higher loads in future rounds of testing.

> or why you only ran the
> tests for 10 seconds.

In Round 1 I did mention that I wanted to do a follow-up with a longer duration. However, as I alluded to in the preamble for Round 2, I kept things the same for the Redis tests to compare with the Mongo ones done previously.

We’ll increase the duration in the next round of testing.

> Instead, I'm actually going to question how it is that, even with
> relatively beefy dedicated hardware (128 GB RAM in your storage
> nodes), Zaqar peaked at around 1,200 messages per second.

I went back and ran some of the tests and never saw memory go over ~20M (as observed with redis-top), so these same results should be obtainable on a box with a lot less RAM. Furthermore, the tests only used 1 CPU on the Redis host, so again, similar results should be achievable on a much more modest box.

FWIW, I went back and ran a couple of scenarios to get some more data points. First, I did one with 50 producers and 50 observers. In that case, the single CPU on which the OS scheduled the Redis process peaked at 30%. The second test I did was with 50 producers + 5 observers + 50 consumers (which claim messages and delete them rather than simply paging through them). This time, Redis used 78% of its CPU. I suppose this should not be surprising because the consumers do a lot more work than the observers. Meanwhile, load on the web head was fairly high; around 80% for all 20 CPUs. This tells me that Python and/or uWSGI are working pretty hard to serve these requests, and there may be some opportunities to optimize that layer. I suspect there are also some opportunities to reduce the number of Redis operations and round trips required to claim a batch of messages.

The other thing to consider is that in these first two rounds I did not test increasing amounts of load (number of clients performing concurrent requests) and graph that against latency and throughput. Out of curiosity, I just now did a quick test to compare the messages enqueued with 50 producers + 5 observers + 50 consumers vs. adding another 50 producer clients and found that the producers were able to post 2,181 messages per second while giving up only 0.3 ms.

--KG
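As a rough illustration of how a combined figure like "2,181 messages per second" falls out of a run: total messages posted by all producers, divided by wall-clock time. The runnable sketch below uses threads in place of gevent greenlets and a no-op `post_message` stand-in for the HTTP request, so it demonstrates the bookkeeping, not real throughput.

```python
import threading
import time

def post_message(payload):
    pass  # stand-in for the real HTTP POST to the queue service

def producer(count, results, idx):
    # Each worker posts `count` messages and records how many it sent.
    for _ in range(count):
        post_message({"body": "x" * 1024, "ttl": 300})
    results[idx] = count

def run(workers=50, msgs_per_worker=200):
    results = [0] * workers
    threads = [
        threading.Thread(target=producer, args=(msgs_per_worker, results, i))
        for i in range(workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    total = sum(results)
    return total, total / elapsed  # messages posted, combined msg/sec

total, rate = run()
print("posted %d messages at %.0f msg/sec (combined)" % (total, rate))
```

Sweeping `workers` upward and recording the rate at each step is exactly the latency-vs-concurrency graph discussed in this thread.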
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On Tue, Sep 9, 2014 at 12:19 PM, Kurt Griffiths wrote:

> Hi folks,
>
> In this second round of performance testing, I benchmarked the new Redis
> driver. I used the same setup and tests as in Round 1 to make it easier to
> compare the two drivers. I did not test Redis in master-slave mode, but
> that likely would not make a significant difference in the results since
> Redis replication is asynchronous[1].
>
> As always, the usual benchmarking disclaimers apply (i.e., take these
> numbers with a grain of salt; they are only intended to provide a ballpark
> reference; you should perform your own tests, simulating your specific
> scenarios and using your own hardware; etc.).
>
> ## Setup ##
>
> Rather than VMs, I provisioned some Rackspace OnMetal[3] servers to
> mitigate noisy-neighbor effects when running the performance tests:
>
> * 1x Load Generator
>   * Hardware
>     * 1x Intel Xeon E5-2680 v2 2.8 GHz
>     * 32 GB RAM
>     * 10 Gbps NIC
>     * 32 GB SATADOM
>   * Software
>     * Debian Wheezy
>     * Python 2.7.3
>     * zaqar-bench
> * 1x Web Head
>   * Hardware
>     * 1x Intel Xeon E5-2680 v2 2.8 GHz
>     * 32 GB RAM
>     * 10 Gbps NIC
>     * 32 GB SATADOM
>   * Software
>     * Debian Wheezy
>     * Python 2.7.3
>     * zaqar server
>       * storage=mongodb
>       * partitions=4
>       * MongoDB URI configured with w=majority
>     * uWSGI + gevent
>       * config: http://paste.openstack.org/show/100592/
>       * app.py: http://paste.openstack.org/show/100593/
> * 3x MongoDB Nodes
>   * Hardware
>     * 2x Intel Xeon E5-2680 v2 2.8 GHz
>     * 128 GB RAM
>     * 10 Gbps NIC
>     * 2x LSI Nytro WarpDrive BLP4-1600[2]
>   * Software
>     * Debian Wheezy
>     * mongod 2.6.4
>       * Default config, except setting replSet and enabling periodic
>         logging of CPU and I/O
>       * Journaling enabled
>       * Profiling on message DBs enabled for requests over 10 ms
> * 1x Redis Node
>   * Hardware
>     * 2x Intel Xeon E5-2680 v2 2.8 GHz
>     * 128 GB RAM
>     * 10 Gbps NIC
>     * 2x LSI Nytro WarpDrive BLP4-1600[2]
>   * Software
>     * Debian Wheezy
>     * Redis 2.4.14
>     * Default config (snapshotting and AOF enabled)
>     * One process
>
> As in Round 1, Keystone auth is disabled and requests go over HTTP, not
> HTTPS. The latency introduced by enabling these is outside the control of
> Zaqar, but should be quite minimal (speaking anecdotally, I would expect
> an additional 1-3 ms for cached tokens, assuming an optimized TLS
> termination setup).
>
> For generating the load, I again used the zaqar-bench tool. I would like
> to see the team complete a large-scale Tsung test as well (including a
> full HA deployment with Keystone and HTTPS enabled), but decided not to
> wait for that before publishing the results for the Redis driver using
> zaqar-bench.
>
> CPU usage on the Redis node peaked at around 75% for the one process. To
> better utilize the hardware, a production deployment would need to run
> multiple Redis processes and use Zaqar's backend pooling feature to
> distribute queues across the various instances.
>
> Several different messaging patterns were tested, taking inspiration
> from: https://wiki.openstack.org/wiki/Use_Cases_(Zaqar)
>
> Each test was executed three times and the best time recorded.
>
> A ~1K sample message (1398 bytes) was used for all tests.
>
> ## Results ##
>
> ### Event Broadcasting (Read-Heavy) ###
>
> OK, so let's say you have a somewhat low-volume source, but tons of event
> observers. In this case, the observers easily outpace the producer, making
> this a read-heavy workload.
>
> Options
> * 1 producer process with 5 gevent workers
>   * 1 message posted per request
> * 2 observer processes with 25 gevent workers each
>   * 5 messages listed per request by the observers
> * Load distributed across 4[6] queues
> * 10-second duration
>
> Results
> * Redis
>   * Producer: 1.7 ms/req, 585 req/sec
>   * Observer: 1.5 ms/req, 1254 req/sec
> * Mongo
>   * Producer: 2.2 ms/req, 454 req/sec
>   * Observer: 1.5 ms/req, 1224 req/sec
>
> ### Event Broadcasting (Balanced) ###
>
> This test uses the same number of producers and consumers, but note that
> the observers are still listing (up to) 5 messages at a time[4], so they
> still outpace the producers, but not as quickly as before.
>
> Options
> * 2 producer processes with 25 gevent workers each
>   * 1 message posted per request
> * 2 observer processes with 25 gevent workers each
>   * 5 messages listed per request by the observers
> * Load distributed across 4 queues
> * 10-second duration
>
> Results
> * Redis
>   * Producer: 1.4 ms/req, 1374 req/sec
>   * Observer: 1.6 ms/req, 1178 req/sec
> * Mongo
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
Thanks! Looks good. Only thing I noticed was that footnotes were still
referenced, but did not appear at the bottom of the page.

On 9/10/14, 6:16 AM, "Flavio Percoco" wrote:
>I've collected the information from both performance tests and put it in
>the project's wiki[0]
>
>Please, double check :D
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On 09/09/2014 09:19 PM, Kurt Griffiths wrote:
> Hi folks,
>
> In this second round of performance testing, I benchmarked the new Redis
> driver. I used the same setup and tests as in Round 1 to make it easier
> to compare the two drivers. I did not test Redis in master-slave mode,
> but that likely would not make a significant difference in the results,
> since Redis replication is asynchronous[1].
>
> As always, the usual benchmarking disclaimers apply (i.e., take these
> numbers with a grain of salt; they are only intended to provide a
> ballpark reference; you should perform your own tests, simulating your
> specific scenarios and using your own hardware; etc.).
>
> ## Setup ##
>
> Rather than VMs, I provisioned some Rackspace OnMetal[3] servers to
> mitigate noisy-neighbor effects when running the performance tests:
>
> * 1x Load Generator
>     * Hardware
>         * 1x Intel Xeon E5-2680 v2 2.8 GHz
>         * 32 GB RAM
>         * 10 Gbps NIC
>         * 32 GB SATADOM
>     * Software
>         * Debian Wheezy
>         * Python 2.7.3
>         * zaqar-bench
> * 1x Web Head
>     * Hardware
>         * 1x Intel Xeon E5-2680 v2 2.8 GHz
>         * 32 GB RAM
>         * 10 Gbps NIC
>         * 32 GB SATADOM
>     * Software
>         * Debian Wheezy
>         * Python 2.7.3
>         * zaqar server
>             * storage=mongodb
>             * partitions=4
>             * MongoDB URI configured with w=majority
>         * uWSGI + gevent
>             * config: http://paste.openstack.org/show/100592/
>             * app.py: http://paste.openstack.org/show/100593/
> * 3x MongoDB Nodes
>     * Hardware
>         * 2x Intel Xeon E5-2680 v2 2.8 GHz
>         * 128 GB RAM
>         * 10 Gbps NIC
>         * 2x LSI Nytro WarpDrive BLP4-1600[2]
>     * Software
>         * Debian Wheezy
>         * mongod 2.6.4
>             * Default config, except setting replSet and enabling
>               periodic logging of CPU and I/O
>             * Journaling enabled
>             * Profiling on message DBs enabled for requests over 10 ms
> * 1x Redis Node
>     * Hardware
>         * 2x Intel Xeon E5-2680 v2 2.8 GHz
>         * 128 GB RAM
>         * 10 Gbps NIC
>         * 2x LSI Nytro WarpDrive BLP4-1600[2]
>     * Software
>         * Debian Wheezy
>         * Redis 2.4.14
>         * Default config (snapshotting and AOF enabled)
>         * One process
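A single Redis process can only drive one core, so the way to put the rest of a box like the one above to work is to run several Redis processes and map each queue deterministically onto one of them, which is the idea behind Zaqar's backend pooling. Here is a rough, hypothetical sketch of such a mapping; the URIs and the helper function are illustrative only, not Zaqar's actual implementation:

```python
import hashlib

# Hypothetical shard list: several Redis processes on the same box,
# each bound to its own port. These URIs are illustrative only.
REDIS_POOLS = [
    "redis://127.0.0.1:6379",
    "redis://127.0.0.1:6380",
    "redis://127.0.0.1:6381",
    "redis://127.0.0.1:6382",
]

def pool_for_queue(project, queue):
    """Map a (project, queue) pair onto one pool via a stable hash.

    Hashing the fully qualified queue name keeps the mapping
    deterministic, so every web head routes a given queue to the
    same Redis process.
    """
    key = ("%s/%s" % (project, queue)).encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()
    return REDIS_POOLS[int(digest, 16) % len(REDIS_POOLS)]

print(pool_for_queue("project-1", "events"))
```

A real deployment would want something closer to consistent hashing (or Zaqar's own pool catalog) so that adding a shard doesn't remap every existing queue; the fixed modulo here just keeps the sketch short.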
> As in Round 1, Keystone auth is disabled and requests go over HTTP, not
> HTTPS. The latency introduced by enabling these is outside the control
> of Zaqar, but should be quite minimal (speaking anecdotally, I would
> expect an additional 1-3 ms for cached tokens, assuming an optimized
> TLS termination setup).
>
> For generating the load, I again used the zaqar-bench tool. I would
> like to see the team complete a large-scale Tsung test as well
> (including a full HA deployment with Keystone and HTTPS enabled), but
> decided not to wait for that before publishing the results for the
> Redis driver using zaqar-bench.
>
> CPU usage on the Redis node peaked at around 75% for the one process.
> To better utilize the hardware, a production deployment would need to
> run multiple Redis processes and use Zaqar's backend pooling feature to
> distribute queues across the various instances.
>
> Several different messaging patterns were tested, taking inspiration
> from: https://wiki.openstack.org/wiki/Use_Cases_(Zaqar)
>
> Each test was executed three times and the best result recorded.
>
> A ~1K sample message (1398 bytes) was used for all tests.
>
> ## Results ##
>
> ### Event Broadcasting (Read-Heavy) ###
>
> OK, so let's say you have a somewhat low-volume source, but tons of
> event observers. In this case, the observers easily outpace the
> producer, making this a read-heavy workload.
> > Options > * 1 producer process with 5 gevent workers > * 1 message posted per request > * 2 observer processes with 25 gevent workers each > * 5 messages listed per request by the observers > * Load distributed across 4[6] queues > * 10-second duration > > Results > * Redis > * Producer: 1.7 ms/req, 585 req/sec > * Observer: 1.5 ms/req, 1254 req/sec > * Mongo > * Producer: 2.2 ms/req, 454 req/sec > * Observer: 1.5 ms/req, 1224 req/sec > > ### Event Broadcasting (Balanced) ### > > This test uses the same number of producers and consumers, but note that > the observers are still listing (up to) 5 messages at a time[4], so they > still outpace the producers, but not as quickly as before. > > Options > * 2 producer processes with 25 gevent workers each > * 1 message posted per request > * 2 observer processes with 25 gevent workers each > * 5 messages listed per request by the observers > * Load distributed across 4 queues > * 10-second duration > > Results > * Redis > * Producer: 1.4 ms/req, 1374 req/sec > * Observer: 1.6 ms/req, 1178 req/sec >