Hi all,

Over the past few weeks, I've been thinking about leveraging Mesos to schedule 
distributed load tests. The Kubernetes community recently shared one way to 
accomplish this (here: 
https://cloud.google.com/solutions/distributed-load-testing-using-kubernetes). 

One problem with this approach, at least for me, is that the load testing tool 
itself has to coordinate the distributed scenario and combine the data. If it 
doesn't, the load clients will start at different times, and aggregating the 
data afterwards is left to the user, an external batch job, or a script. This 
is not a problem for load generators like Tsung or Locust, since they already 
provide a distributed model and coordinate the distributed tasks, but it is 
more complicated for Gatling, which does not. To me, the approach the 
Kubernetes team suggests is really a hack that uses the 'ReplicationController' 
to spawn multiple replicas, which could just as easily be achieved with 
Marathon (or Kubernetes on Mesos).

I was thinking of building a Mesos framework that would take the input (the 
load simulation file) and schedule jobs across the cluster using Gatling, 
perhaps with dedicated resources to minimize variance. A Mesos framework would 
be able to provide a UI/API to accept input jobs and report the status of 
multiple jobs. It could also provide a way to sync/orchestrate the simulation, 
and finally a way to aggregate the simulation data in one place and serve the 
generated HTML report.

Boiled down to its primitive parts, it would spin up multiple Gatling (Java) 
processes across the cluster, use something like a barrier (not sure what to 
use here) to wait until all processes are ready to execute, and finally copy 
and rename the generated simulation logs from each Gatling process to one 
node/place, where a single Gatling process aggregates them and compiles the 
HTML report.
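
To make the idea concrete, here is a minimal single-machine sketch of those 
steps in Python. It is only an illustration under loud assumptions: threads 
stand in for the Gatling processes, threading.Barrier stands in for whatever 
distributed barrier would be used across nodes, and the names run_worker, 
collect_logs, run_demo and the directory layout are all hypothetical.

```python
import shutil
import tempfile
import threading
import time
from pathlib import Path


def run_worker(worker_id, barrier, workdir, start_times):
    """Stand-in for one Gatling process: get ready, wait, then 'run'."""
    out_dir = workdir / f"worker-{worker_id}"
    out_dir.mkdir()
    # Block until every worker has reached this point, so all start together.
    barrier.wait()
    start_times[worker_id] = time.monotonic()
    # A real worker would launch Gatling here; this just writes a fake log.
    (out_dir / "simulation.log").write_text(f"fake log from worker {worker_id}\n")


def collect_logs(workdir, target):
    """Copy each worker's simulation.log into one directory under a unique name."""
    target.mkdir(parents=True, exist_ok=True)
    collected = []
    for log in sorted(workdir.glob("worker-*/simulation.log")):
        dest = target / f"simulation-{log.parent.name}.log"
        shutil.copy(log, dest)
        collected.append(dest)
    return collected


def run_demo(num_workers=3):
    """Run all workers behind one barrier, then gather their logs."""
    root = Path(tempfile.mkdtemp(prefix="loadtest-"))
    workdir = root / "work"
    workdir.mkdir()
    barrier = threading.Barrier(num_workers)
    start_times = {}
    threads = [
        threading.Thread(target=run_worker, args=(i, barrier, workdir, start_times))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return collect_logs(workdir, root / "aggregate"), start_times


if __name__ == "__main__":
    logs, _ = run_demo()
    for path in logs:
        print(path)
```

In a real cluster the barrier could not be an in-process object, and the copy 
step would have to move files between nodes rather than between local 
directories; the sketch only shows the ordering: ready, wait, run, collect.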

First of all, is there anything in the Mesos community that does this already? 
If not, do you think this is feasible to accomplish with a Mesos framework, and 
would you recommend this approach? Does Mesos offer a barrier-like feature to 
coordinate jobs, and can I somehow move files to a single node to be 
processed?

Finally, I've never written a non-trivial Mesos framework. How should I go 
about getting started, and where can I find more documentation? I'm looking 
for best practices, pitfalls, etc.


Thank you for your time,
Carlos

