MapReduce or Spark can work.

On Sun, Dec 3, 2017 at 9:13 AM, Adam Sylvester <[email protected]> wrote:
> I have a use case where my Scheduler gets an externally-generated request to produce an image. This is a CPU-intensive task that I can divide up into, say, 20 largely independent jobs, and I have an application which can take in the input filename and which slot out of the 20 it is and produce 1/20th of the output image. Each job runs on its own machine, using all CPUs and memory on the machine. The final output image isn't finished until all 20 jobs are complete, so I don't want to send an external 'job complete' message until these 20 jobs all finish.
>
> I can do this in Mesos by accepting 20 resource offers and launching tasks on them, where each task says it needs all resources on the machine, then doing bookkeeping on the Scheduler as tasks complete to keep track of when all 20 finish, at which point I can send my external job complete message.
>
> This is all doable, but there are some obvious complications here (for example, if any of the 20 jobs fail, I want to fail all 20 jobs, but I have to keep track of that myself).
>
> AWS Batch has Array Jobs which would give me the kind of functionality I want (http://docs.aws.amazon.com/batch/latest/userguide/array_jobs.html). I'm wondering if there's any way to do this - specifically running a single logical task across multiple machines - using either Mesos or an additional framework that lives on top of Mesos.
>
> Thanks,
> -Adam
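For what it's worth, the bookkeeping described in the quoted message is small enough to keep in the Scheduler itself. Below is a minimal Python sketch, independent of the Mesos API; the class and method names (`ArrayJob`, `on_task_finished`, etc.) are hypothetical and would be wired into the framework's status-update handler:

```python
class ArrayJob:
    """Tracks one logical job split into N slot tasks.

    The logical job succeeds only when every slot finishes, and
    fails as soon as any single slot fails (all-or-nothing).
    """

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.completed = set()   # slot indices that finished
        self.failed = False

    def on_task_finished(self, slot):
        # Call from the scheduler's status-update handler on TASK_FINISHED.
        self.completed.add(slot)

    def on_task_failed(self, slot):
        # One failure fails the whole logical job; this is where the
        # scheduler would also kill the remaining slot tasks.
        self.failed = True

    @property
    def done(self):
        # Terminal either way: failed, or all slots finished.
        return self.failed or len(self.completed) == self.num_slots

    @property
    def succeeded(self):
        return not self.failed and len(self.completed) == self.num_slots
```

Once `done` is true, the Scheduler can emit the single external 'job complete' (or 'job failed') message exactly once, which is essentially what AWS Batch array jobs do on your behalf.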

