From: CCAAT <[email protected]>
Sent: Thursday, July 2, 2015 12:00 PM
To: [email protected]
Cc: [email protected]
Subject: COMMERCIAL:Re: [Question] Distributed Load Testing with Mesos and Gatling

On 07/01/2015 01:17 PM, Carlos Torres wrote:
> Hi all,
>
> In the past weeks, I've been thinking about leveraging Mesos to schedule
> distributed load tests.

An excellent idea.
>
> One problem, at least for me, with this approach is that the load testing
> tool needs to coordinate the distributed scenario and combine the data. If
> it doesn't, the load clients will trigger at different times, and the
> aggregation of the data has to be handled later by the user, some external
> batch job, or a script. This is not a problem for load generators like
> Tsung or Locust, since they already provide a distributed model and
> coordinate the distributed tasks, but it is a little more complicated for
> Gatling, which does not. To me, the approach the Kubernetes team suggests
> is really a hack that uses the 'Replication Controller' to spawn multiple
> replicas, which could just as easily be achieved with Marathon (or
> Kubernetes on Mesos).

> I was thinking of building a Mesos framework that would take the input, or
> load simulation file, and schedule jobs across the cluster (perhaps with
> dedicated resources to minimize variance) using Gatling. A Mesos framework
> would be able to provide a UI/API to accept the input jobs and report the
> status of multiple jobs. It could also provide a way to sync/orchestrate
> the simulation, and finally a way to aggregate the simulation data in one
> place and serve the generated HTML report.

> Boiled down to its primitive parts, it would spin up multiple Gatling (Java)
> processes across the cluster, use something like a barrier (not sure what
> to use here) to wait for all processes to be ready to execute, and finally
> copy and rename the generated simulation logs from each Gatling process to
> one node/place, where they are aggregated and compiled into an HTML report
> by a single Gatling process.

> First of all, is there anything in the Mesos community that does this
> already? If not, do you think this is feasible to accomplish with a Mesos
> framework, and would you recommend going with this approach? Does Mesos
> offer barrier-like features to coordinate jobs, and can I somehow move
> files to a single node to be processed?
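
To make the barrier idea concrete, here is a minimal single-machine sketch of
the semantics being asked for: no client starts injecting load until every
client has reported ready. It uses Python's standard threading.Barrier purely
as an illustration. The function name run_clients and the startup delays are
made up for the example, and a real cluster would presumably need the
framework's scheduler or an external coordination service (ZooKeeper, for
instance) to play the role of the barrier across machines.

```python
import threading
import time


def run_clients(n_clients, startup_delays):
    """Run n_clients workers that all wait at a barrier before 'firing'.

    Returns the ordered list of ("ready" | "start", client_index) events.
    """
    barrier = threading.Barrier(n_clients)
    events = []
    lock = threading.Lock()

    def client(idx):
        time.sleep(startup_delays[idx])   # simulate uneven startup time
        with lock:
            events.append(("ready", idx))
        barrier.wait()                    # block until every client is ready
        with lock:
            events.append(("start", idx))  # load injection would begin here

    threads = [threading.Thread(target=client, args=(i,))
               for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for kind, idx in run_clients(3, [0.03, 0.0, 0.01]):
        print(kind, idx)
```

Every "ready" event is recorded before any "start" event, regardless of how
long each client takes to come up, which is the property a distributed
barrier would have to preserve.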

This all sounds workable, but I do not have all the experience
necessary to qualify your ideas. What I would suggest is a solution that
lends itself to testing similarly configured cloud/cluster offerings, so
that we, the cloud/cluster community, have a way to test and evaluate new
releases, substituted component code, forks, and even competing
offerings. A ubiquitous and robust testing approach based on your ideas
does seem to be an overwhelmingly positive idea, imho. As such, some
organizational structure that allows results to be maintained and quickly
compared to other 'test-runs' would greatly encourage usage.
Hopefully 'Gatling' and similar tools have many, if not most, of the
features needed to automate the evaluation of results.


> Finally, I've never written a non-trivial Mesos framework. How should I go
> about it, and where can I find more documentation to get started? I'm
> looking for best practices, pitfalls, etc.
>
>
> Thank you for your time,
> Carlos

hth,
James


Thanks for your feedback.

I like your idea about having the ability to swap out the different components
(e.g. load generators) and perhaps even providing an abstraction over the
charting and data reporting mechanisms.

I'll probably start with the simplest way possible, though: having the
framework deploy Gatling across the cluster in a scale-out fashion and
retrieve each instance's results. Once I get that working, I'll start
experimenting with abstracting out certain functionality.
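
A rough sketch of the "retrieve each instance's results" step, assuming each
Gatling instance's output directory has already been fetched to the local
machine by some other means (how that fetch happens is left open here). It
copies every instance's simulation.log into one folder under a unique name so
the logs do not overwrite each other. The function name collect_logs and the
simulation-NNN.log naming scheme are illustrative choices, not anything
Gatling prescribes.

```python
import shutil
from pathlib import Path


def collect_logs(instance_dirs, dest_dir):
    """Copy each instance's simulation.log into dest_dir under a unique name.

    instance_dirs: directories, one per Gatling instance, already fetched
                   locally (an assumption of this sketch).
    Returns the list of files written, in instance order.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for idx, inst in enumerate(sorted(Path(p) for p in instance_dirs)):
        src = inst / "simulation.log"
        if not src.is_file():
            continue  # instance produced no log (e.g. a failed task); skip it
        target = dest / f"simulation-{idx:03d}.log"
        shutil.copyfile(src, target)
        copied.append(target)
    return copied
```

The merged folder could then be handed to a single Gatling process running in
its reports-only mode (the -ro option, as far as I understand it) to build the
combined HTML report.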

I know Twitter has a distributed load generator called Iago that apparently
works on Mesos. It'd be awesome if any of its contributors could chime in and
share what worked well and what did not.


The few things I'm concerned about in terms of implementing such a framework
on Mesos are:

* Noisy neighbors, or resource isolation.
    - Rationale: It can introduce noise into the results if a load generator
      competes for shared resources (e.g. network) with other tasks.

* Coordination of execution
    - Rationale: I need the ability to control the execution of groups of
      related tasks. Say User A submits a simulation that creates 5 load
      clients (tasks?), and right after that User B submits a different
      simulation that creates 10 load clients. Ideally, all of User A's load
      clients should be on independent nodes and should not share slaves
      with User B's load clients. If not enough slaves are available on the
      cluster, User B's simulation queues until they are. There might be
      enough "resources" to create and configure some of User B's load
      clients, but the simulation would still block and wait until all of
      its load clients are up and ready.

* Storage 
    - Rationale: Some load generators might need to place files belonging
      to a particular simulation in a shared storage location that is
      independent of other simulations. These files could be common
      configuration and/or the simulation logs that need post-processing.
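
The "coordination of execution" concern above amounts to all-or-nothing
admission with exclusive nodes. A minimal sketch of that policy follows,
assuming a strict first-come-first-served queue and one load client per node.
The GangQueue class and the slave names are hypothetical, and nothing here
talks to Mesos: in a real framework this logic would presumably sit in the
scheduler and be driven by resource offers.

```python
from collections import deque


class GangQueue:
    """FIFO queue that admits a simulation only when it can get all its nodes."""

    def __init__(self, nodes):
        self.free = set(nodes)   # nodes with no load client on them
        self.pending = deque()   # (sim_id, n_clients) waiting for nodes
        self.running = {}        # sim_id -> nodes it holds exclusively

    def submit(self, sim_id, n_clients):
        self.pending.append((sim_id, n_clients))
        self._admit()

    def finish(self, sim_id):
        # Release the simulation's nodes, then see who can run next.
        self.free |= self.running.pop(sim_id)
        self._admit()

    def _admit(self):
        # Strict FIFO: stop at the first simulation that does not fit, so a
        # large simulation is never starved by smaller ones behind it.
        while self.pending and self.pending[0][1] <= len(self.free):
            sim_id, n = self.pending.popleft()
            nodes = set(sorted(self.free)[:n])
            self.free -= nodes
            self.running[sim_id] = nodes


if __name__ == "__main__":
    q = GangQueue(f"slave{i}" for i in range(12))
    q.submit("A", 5)    # admitted: 5 of 12 nodes
    q.submit("B", 10)   # queued: only 7 nodes free
    q.finish("A")       # B is admitted once A releases its nodes
    print(sorted(q.running))
```

With 12 slaves, User A's 5 clients start immediately, User B's 10 clients wait
even though 7 slaves are idle, and B starts only after A finishes. Idle slaves
are the cost of the policy; letting smaller simulations jump ahead would be a
different trade-off.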

Thanks
Carlos Torres
