Re: COMMERCIAL:Re: [Question] Distributed Load Testing with Mesos and Gatling

2015-07-02 Thread CCAAT

On 07/02/2015 12:10 PM, Carlos Torres wrote:


From: CCAAT cc...@tampabay.rr.com
Sent: Thursday, July 2, 2015 12:00 PM
To: user@mesos.apache.org
Cc: cc...@tampabay.rr.com
Subject: COMMERCIAL:Re: [Question] Distributed Load Testing with Mesos and Gatling

On 07/01/2015 01:17 PM, Carlos Torres wrote:

Hi all,

In the past weeks, I've been thinking about leveraging Mesos to schedule
distributed load tests.


An excellent idea.


One problem, at least for me, with this approach is that the load testing
tool needs to coordinate the distributed scenario and combine the data. If
it doesn't, the load clients will trigger at different times, and
aggregating the data is left to the user, some external batch job, or a
script. This is not a problem for load generators like Tsung or Locust,
since they already provide a distributed model and coordinate the
distributed tasks, but it is more complicated for Gatling, which does not.
To me, the approach the Kubernetes team suggests is really a hack that uses
the 'Replication Controller' to spawn multiple replicas, which could just as
easily be achieved with Marathon (or Kubernetes on Mesos).



I was thinking of building a Mesos framework that would take the input, or
load simulation file, and schedule jobs across the cluster using Gatling
(perhaps with dedicated resources to minimize variance). A Mesos framework
would be able to provide a UI/API to take the input jobs and report the
status of multiple jobs. It could also provide a way to sync/orchestrate the
simulation, and finally a way to aggregate the simulation data in one place
and serve the generated HTML report.



Boiled down to its primitive parts, it would spin up multiple Gatling (java)
processes across the cluster, use something like a barrier (not sure what to
use here) to wait for all processes to be ready to execute, and finally copy
and rename the generated simulation logs from each Gatling process to one
node/place, where a single Gatling process aggregates them and compiles the
HTML report.
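
The copy-and-rename step described above could look roughly like the sketch
below. It is only an illustration under stated assumptions: each Gatling
process is assumed to leave a file named simulation.log in its own results
directory, those directories are assumed to be reachable from one machine
already (how they get there is not covered), and the function name
collect_simulation_logs and the simulation-<node>.log naming are made up for
this example.

```python
import shutil
import tempfile
from pathlib import Path


def collect_simulation_logs(node_dirs, dest):
    """Copy each node's simulation.log into dest under a unique name.

    node_dirs maps a (hypothetical) node name to that node's results
    directory. Returns the list of files written, in node-name order.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    collected = []
    for node_name, results_dir in sorted(node_dirs.items()):
        src = Path(results_dir) / "simulation.log"
        if not src.is_file():
            # A node that produced no log is skipped instead of failing the run.
            continue
        # Rename while copying so logs from different nodes do not clash.
        target = dest / f"simulation-{node_name}.log"
        shutil.copyfile(src, target)
        collected.append(target)
    return collected


if __name__ == "__main__":
    # Fake two nodes' results directories to show the renaming.
    with tempfile.TemporaryDirectory() as tmp:
        nodes = {}
        for name in ("node1", "node2"):
            node_dir = Path(tmp) / name
            node_dir.mkdir()
            (node_dir / "simulation.log").write_text(f"RUN {name}\n")
            nodes[name] = node_dir
        merged = collect_simulation_logs(nodes, Path(tmp) / "merged")
        print([p.name for p in merged])
```

A single Gatling process would then be pointed at the merged directory to
build the report; that part is left out here.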



First of all, is there anything in the Mesos community that does this
already? If not, do you think this is feasible to accomplish with a Mesos
framework, and would you recommend going with this approach? Does Mesos
offer barrier-like features to coordinate jobs, and can I somehow move files
to a single node to be processed?


This all sounds workable, but I do not have all the experience
necessary to qualify your ideas. What I would suggest is a solution that
lends itself to testing similarly configured cloud/cluster offerings, so
that we, the cloud/cluster community, have a way to test and evaluate new
releases, substitute component codes, forks and even competitive
offerings. A ubiquitous and robust testing semantic based on your ideas
does seem to be an overwhelmingly positive idea, imho. As such, some
organizational structure that allows results to be maintained and quickly
compared to other 'test-runs' would greatly encourage usage.
Hopefully 'Gatling' and such have many, if not most, of the features
needed to automate the evaluation of results.



Finally, I've never written a non-trivial Mesos framework. How should I go
about getting started, and where can I find more documentation? I'm looking
for best practices, pitfalls, etc.


Thank you for your time,
Carlos


hth,
James


Thanks for your feedback.

I like your idea about having the ability to swap out the different components
(e.g. load generators) and perhaps even providing an abstraction over the
charting and data reporting mechanism.

I'll probably start with the simplest way possible, though: having the
framework deploy Gatling across the cluster, in a scale-out fashion, and
retrieve each instance's results. Once I get that working, I'll start
experimenting with abstracting out certain functionality.

I know Twitter has a distributed load generator, called Iago, that apparently
works on Mesos. It'd be awesome if any of its contributors chimed in and shared
what worked great, what worked well enough, and what did not.


The few things I'm concerned about in terms of implementing such a framework
in Mesos are:

* Noisy neighbors, or resource isolation.
 - Rationale: It can introduce noise into the results if a load generator
competes for shared resources (e.g. network) with other tasks.

* Coordination of execution
 - Rationale: Need the ability to control the execution of groups of related
tasks. User A submits a simulation that might create 5 load clients (tasks?);
right after that, User B submits a different simulation that creates 10 load
clients. Ideally, all of User A's load clients should be on independent nodes
and should not share slaves with User B's load clients. If not enough slaves
are available on the cluster, then User B's simulation queues until slaves
are available. There might be enough resources
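
To make the User A / User B scenario concrete, here is a small sketch of the
queueing policy only. It is not Mesos API code: a real framework would make
these decisions in response to resource offers, and the class name
ExclusiveScheduler and its methods are invented for this illustration. It
assumes one load client per slave and that no slave is shared between
simulations.

```python
from collections import deque


class ExclusiveScheduler:
    """Give each simulation slaves that no other simulation is using."""

    def __init__(self, slaves):
        self.free = set(slaves)
        self.running = {}       # simulation id -> set of slaves it holds
        self.pending = deque()  # (simulation id, clients needed), FIFO order

    def submit(self, sim_id, clients):
        """Queue a simulation and start it right away if it fits."""
        self.pending.append((sim_id, clients))
        self._drain()

    def finish(self, sim_id):
        """Release a finished simulation's slaves and start queued work."""
        self.free |= self.running.pop(sim_id)
        self._drain()

    def _drain(self):
        # Start queued simulations in order and stop at the first one that
        # does not fit, so a later small job cannot starve an earlier big one.
        while self.pending and self.pending[0][1] <= len(self.free):
            sim_id, clients = self.pending.popleft()
            chosen = set(sorted(self.free)[:clients])
            self.free -= chosen
            self.running[sim_id] = chosen


if __name__ == "__main__":
    scheduler = ExclusiveScheduler([f"slave{i}" for i in range(12)])
    scheduler.submit("user-a", 5)    # starts immediately on 5 slaves
    scheduler.submit("user-b", 10)   # only 7 slaves free, so it queues
    print(sorted(scheduler.running), list(scheduler.pending))
    scheduler.finish("user-a")       # frees 5 slaves, user-b can now start
    print(sorted(scheduler.running), list(scheduler.pending))
```

Strict FIFO is just one possible choice; it keeps the behaviour predictable
at the cost of leaving slaves idle while a large simulation waits.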

Re: COMMERCIAL:Re: [Question] Distributed Load Testing with Mesos and Gatling

2015-07-02 Thread Carlos Torres
Yes, I agree. Starting out with the scale-out approach, while naive, will be
a good starting point.


I actually have this automated with Jenkins and a bunch of dedicated slaves,
using the Workflow plugin. It works only kind of OK, since I can't really
control their execution.


If you are interested, here's my workflow script for Jenkins: 
https://github.com/meteorfox/gatling-workflow/blob/master/gatling_flow.groovy


-- Carlos


From: Joao Ribeiro jonnyb...@gmail.com
Sent: Thursday, July 2, 2015 11:33 AM
To: user@mesos.apache.org
Subject: COMMERCIAL:Re: [Question] Distributed Load Testing with Mesos and Gatling

This sounds like a really cool project.

I am still a very green user of Mesos and have never used Gatling at all, but
a quick search took me to http://gatling.io/docs/2.1.6/cookbook/scaling_out.html

With this it shouldn't be too difficult to create a master/slave or
scheduler/executors approach where the master launches several slaves to do
the work, waits for them to finish, collects the logs and generates the
report.
For better synchronisation you could make the slaves register with ZooKeeper
while the master waits for all slaves to be up and then triggers a "start
test" command on all slaves simultaneously.
You could then easily time out if it takes too long to get all slaves up, or
use other, more fault-tolerant strategies, e.g. run with the slaves that you
got, or bump each slave that is up with more load to try to make up for the
missing slaves.
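
The register, wait, and time-out idea above can be sketched in a few lines.
To keep the example self-contained, an in-memory registry stands in for
ZooKeeper, so this shows only the control flow and not any ZooKeeper client
calls. The names StartBarrier and split_load are invented for this sketch,
and the even split of users is just one way to "bump" the slaves that did
come up.

```python
import threading


class StartBarrier:
    """In-memory stand-in for slaves registering with a coordinator."""

    def __init__(self, expected):
        self.expected = expected
        self.registered = set()
        self.cond = threading.Condition()

    def register(self, worker_id):
        """Called by each slave once it is ready to start the test."""
        with self.cond:
            self.registered.add(worker_id)
            self.cond.notify_all()

    def wait_for_workers(self, timeout):
        """Block until all expected slaves registered or the timeout expires.

        Returns the set of slaves present at that point, which may be
        smaller than expected.
        """
        with self.cond:
            self.cond.wait_for(
                lambda: len(self.registered) >= self.expected,
                timeout=timeout,
            )
            return set(self.registered)


def split_load(total_users, workers):
    """Spread total_users over the slaves that showed up."""
    workers = sorted(workers)
    if not workers:
        raise RuntimeError("no slaves registered before the timeout")
    base, extra = divmod(total_users, len(workers))
    # The first `extra` slaves take one additional user each.
    return {w: base + (1 if i < extra else 0) for i, w in enumerate(workers)}


if __name__ == "__main__":
    barrier = StartBarrier(expected=3)
    for worker in ("w1", "w2"):  # the third slave never comes up
        threading.Thread(target=barrier.register, args=(worker,)).start()
    present = barrier.wait_for_workers(timeout=0.5)
    # The master would now send "start test" with these per-slave loads.
    print(split_load(1000, present))
```

Whether to run with fewer slaves or abort is a policy decision; here the
master simply proceeds with whoever registered in time.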

It might be a naive approach but would be a starting point in my opinion.
