Are there any open-source job queue/batch systems which run under Mesos?
I am thinking of things like HTCondor, Torque, etc.
The requirements are to be able to:
- define an overall job as a set of sub-tasks (could be many thousands)
- put sub-tasks into a queue; execute tasks from the queue
- dependencies: don't add a sub-task into the queue until its precursors
have completed successfully
- restart: after an error, be able to restart the job while skipping the
sub-tasks that already completed successfully
- preferably handle short-lived tasks efficiently (on the order of 10
seconds each)
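
To make the requirements concrete, here is a minimal single-process
Python sketch of the scheduling behaviour I'm after. It has nothing to do
with Mesos and is not taken from any existing system; the names run_job,
tasks, deps and done are made up purely for illustration, and a real
system would persist the "done" set and run sub-tasks in parallel.

```python
from collections import deque


def run_job(tasks, deps, done=None):
    """Run sub-tasks in dependency order (hypothetical helper).

    tasks: dict mapping sub-task name -> zero-argument callable
    deps:  dict mapping sub-task name -> set of precursor names
    done:  names that completed in an earlier run; these are skipped
    Returns (done, failed) as two sets of names.
    """
    done = set(done or ())
    # Unmet precursors for every sub-task that still has to run.
    waiting = {name: set(deps.get(name, ())) - done
               for name in tasks if name not in done}
    # Only sub-tasks with no unmet precursors enter the queue.
    queue = deque(sorted(n for n, pre in waiting.items() if not pre))
    for name in queue:
        del waiting[name]

    failed = set()
    while queue:
        name = queue.popleft()
        try:
            tasks[name]()
        except Exception:
            # Dependents stay in `waiting` and are never queued.
            failed.add(name)
            continue
        done.add(name)
        # Queue any sub-task whose last precursor just completed.
        for other in sorted(waiting):
            waiting[other].discard(name)
            if not waiting[other]:
                queue.append(other)
                del waiting[other]
    return done, failed


# Usage: "b" fails on the first run; the restart skips "a".
log = []
state = {"b_fails": True}


def make(name):
    def run():
        if name == "b" and state["b_fails"]:
            raise RuntimeError("simulated failure")
        log.append(name)
    return run


tasks = {n: make(n) for n in "abc"}
deps = {"b": {"a"}, "c": {"b"}}

first_done, first_failed = run_job(tasks, deps)
state["b_fails"] = False
done, failed = run_job(tasks, deps, done=first_done)
```

In this sketch the first run completes only "a" and records "b" as
failed, so "c" is never queued. The second run is handed the set of
completed sub-tasks and executes just "b" and "c".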
Clearly it's possible to write a framework to do this, but I don't want
to re-invent the wheel if it has been done already.
Thanks,
Brian.
P.S. I found Chronos, but it doesn't seem to be a good match. As far as
I can see, it's intended for applications where you pre-define a bunch
of tasks (via GUI? via REST?) and then trigger them periodically.