I have a use case where my Scheduler gets an externally-generated request to produce an image. This is a CPU-intensive task that I can divide into, say, 20 largely independent jobs, and I have an application that takes the input filename plus which of the 20 slots it is responsible for and produces 1/20th of the output image. Each job runs on its own machine, using all of the CPUs and memory on that machine. The final output image isn't finished until all 20 jobs are complete, so I don't want to send an external 'job complete' message until all 20 have finished.
I can do this in Mesos by accepting 20 resource offers and launching tasks on them, where each task declares that it needs all of the resources on its machine, then doing bookkeeping in the Scheduler as tasks complete so I know when all 20 have finished, at which point I can send my external 'job complete' message. This is all doable, but there are some obvious complications (for example, if any one of the 20 jobs fails, I want to fail all 20, and I have to track that myself). AWS Batch has Array Jobs, which would give me the kind of functionality I want (http://docs.aws.amazon.com/batch/latest/userguide/array_jobs.html). I'm wondering if there's any way to do this - specifically, running a single logical task across multiple machines - using either Mesos itself or an additional framework that lives on top of Mesos. Thanks. -Adam
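
P.S. To make the bookkeeping I'm describing concrete, here's a minimal plain-Python sketch. The ArrayJobTracker name and the callbacks are made up for illustration and aren't part of any Mesos API; the idea is that my scheduler's status-update handler would feed per-task results into something like this.

class ArrayJobTracker:
    """Tracks one logical image job that was split into N slots."""

    def __init__(self, num_slots, on_all_complete, on_any_failed):
        self.pending = set(range(num_slots))    # slots still running
        self.failed = False
        self.on_all_complete = on_all_complete  # e.g. send the external 'job complete' message
        self.on_any_failed = on_any_failed      # e.g. kill the remaining tasks and report failure

    def task_finished(self, slot):
        # Called when one of the N tasks reports success.
        if self.failed:
            return
        self.pending.discard(slot)
        if not self.pending:
            self.on_all_complete()

    def task_failed(self, slot):
        # Any single failure fails the whole logical job.
        if self.failed:
            return
        self.failed = True
        self.pending.discard(slot)
        self.on_any_failed(sorted(self.pending))


# Hypothetical usage: one tracker per external image request, driven from
# the scheduler's status-update callback.
tracker = ArrayJobTracker(
    num_slots=20,
    on_all_complete=lambda: print("send external 'job complete' message"),
    on_any_failed=lambda remaining: print("kill remaining slots:", remaining),
)
tracker.task_finished(0)
tracker.task_failed(5)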

