MapReduce or Spark can work.

On Sun, Dec 3, 2017 at 9:13 AM, Adam Sylvester <[email protected]> wrote:
> I have a use case where my Scheduler gets an externally-generated request to produce an image. This is a CPU-intensive task that I can divide up into, say, 20 largely independent jobs, and I have an application which can take in the input filename and which slot out of the 20 it is and produce 1/20th of the output image. Each job runs on its own machine, using all CPUs and memory on the machine. The final output image isn't finished until all 20 jobs are complete, so I don't want to send an external 'job complete' message until these 20 jobs all finish.
>
> I can do this in Mesos by accepting 20 resource offers and launching tasks on them, where each task says it needs all resources on the machine, then doing bookkeeping on the Scheduler as tasks complete to keep track of when all 20 finish, at which point I can send my external job complete message.
>
> This is all doable, but there are some obvious complications here (for example, if any of the 20 jobs fail, I want to fail all 20 jobs, but I have to keep track of that myself).
>
> AWS Batch has Array Jobs which would give me the kind of functionality I want (http://docs.aws.amazon.com/batch/latest/userguide/array_jobs.html). I'm wondering if there's any way to do this - specifically running a single logical task across multiple machines - using either Mesos or an additional framework that lives on top of Mesos.
>
> Thanks,
> -Adam
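For what it's worth, the bookkeeping described in the quoted message is small enough to keep in the Scheduler itself. Below is a minimal Python sketch, independent of the Mesos API; the class and method names (`ArrayJob`, `on_task_finished`, etc.) are hypothetical and would be wired into the framework's status-update handler:

```python
class ArrayJob:
    """Tracks one logical job split into N slot tasks.

    The logical job succeeds only when every slot finishes, and
    fails as soon as any single slot fails (all-or-nothing).
    """

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.completed = set()   # slot indices that finished
        self.failed = False

    def on_task_finished(self, slot):
        # Call from the scheduler's status-update handler on TASK_FINISHED.
        self.completed.add(slot)

    def on_task_failed(self, slot):
        # One failure fails the whole logical job; this is where the
        # scheduler would also kill the remaining slot tasks.
        self.failed = True

    @property
    def done(self):
        # Terminal either way: failed, or all slots finished.
        return self.failed or len(self.completed) == self.num_slots

    @property
    def succeeded(self):
        return not self.failed and len(self.completed) == self.num_slots
```

Once `done` is true, the Scheduler can emit the single external 'job complete' (or 'job failed') message exactly once, which is essentially what AWS Batch array jobs do on your behalf.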

