Hi Stephan,
Thank you very much for those insights. So, if I understand correctly,
the idea here is that the MPI job would be distributed across multiple
Aurora job instances instead of multiple machines. Also, all the MPI jobs
should be scheduled together as one entity (gang scheduling).
One
Hi Santhosh,
Thanks for your response and suggestion. From what I heard from the Mesos
developers, Mesos-hydra is no longer used or supported by the community.
But it could certainly serve as a reference to build upon.
My most preferred option would be to use any existing schedulers
Thanks for your response, Zameer. I will check out Apache Aurora and report
back on whether it serves the purpose.
On Fri, Oct 14, 2016 at 2:01 PM, Zameer Manji wrote:
> Hey,
>
> I am not an expert on MPI jobs, but it seems possible to run them on
> Aurora. Aurora is a pretty flexible
Hey,
I am not an expert on MPI jobs, but it seems possible to run them on
Aurora. Aurora is a pretty flexible scheduler that lets you run arbitrary
binaries or container images. Aurora is designed for long-running services,
and assuming that you want to launch workers that are long-running, it