What you are looking for is probably a workflow manager. Such a tool is
largely independent of the cluster management system, such as Mesos.

Here is a suggestion for a tool shopping list:

https://github.com/spotify/luigi
https://azkaban.github.io/
https://github.com/airbnb/airflow
https://github.com/pinterest/pinball
https://github.com/sailthru/stolos

Luigi is probably the lowest-risk choice: easy to get started with and battle-tested.
I am biased, though.
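
To illustrate the core idea these workflow managers share (tasks declare
their dependencies, and a task counts as done once its output exists, so a
restart skips whatever already succeeded), here is a minimal sketch in plain
Python. It is not Luigi's API or that of any other tool listed above; the
Task class, run_all, demo, and the ".done" marker-file convention are all
made up for illustration.

```python
import os
import tempfile


class Task:
    """Hypothetical task: considered complete when its marker file exists."""

    def __init__(self, name, workdir, requires=(), action=None):
        self.name = name
        self.requires = list(requires)
        self.action = action or (lambda: None)
        self.marker = os.path.join(workdir, name + ".done")

    def complete(self):
        return os.path.exists(self.marker)

    def run(self):
        self.action()
        # Write the marker only after the action has succeeded.
        with open(self.marker, "w") as f:
            f.write("ok\n")


def run_all(task, executed=None):
    """Run dependencies depth-first, skipping tasks that are already complete."""
    executed = [] if executed is None else executed
    if task.complete():
        return executed
    for dep in task.requires:
        run_all(dep, executed)
    task.run()
    executed.append(task.name)
    return executed


def demo():
    """Fail once in the middle of a three-step chain, then restart."""
    workdir = tempfile.mkdtemp()
    state = {"fail": True}

    def flaky():
        if state["fail"]:
            raise RuntimeError("simulated failure")

    extract = Task("extract", workdir)
    transform = Task("transform", workdir, requires=[extract], action=flaky)
    load = Task("load", workdir, requires=[transform])

    first = []
    try:
        run_all(load, first)
    except RuntimeError:
        pass  # the first attempt stops at the failing task

    state["fail"] = False
    second = run_all(load)  # restart: completed tasks are skipped
    return first, second
```

Calling demo() runs "extract" in the first attempt, fails in "transform",
and on the restart runs only "transform" and "load". Real workflow managers
add scheduling, parallel workers, and monitoring on top of this, which is
the part you probably do not want to write yourself.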

In batch processing environments, the workflow managers typically run
on a small cluster of "edge nodes", which in turn schedule jobs on
Hadoop or Spark. One could conceive of scheduling jobs from edge nodes
onto both Hadoop/Spark and Mesos - the latter would be appropriate for
jobs that fit on a single machine. Hadoop and Spark are often used
for simpler jobs too, at a high cost in hardware and complexity. I have
not heard of any such hybrid integrations, however.

If you go down that path, you may want to look at Aurora for Mesos
scheduling and resource allocation. Unlike Marathon and Kubernetes, it
supports batch jobs. You can build a batch worker farm on Mesos with
e.g. Marathon + RabbitMQ, but you would likely reinvent what Aurora
does.

I answered a related question on the Spark mailing list, which may
provide some useful additional information:
https://www.mail-archive.com/[email protected]/msg34417.html

Regards,

Lars Albertsson




On Wed, Oct 7, 2015 at 9:56 AM, Brian Candler <[email protected]> wrote:
> Are there any open-source job queue/batch systems which run under Mesos? I
> am thinking of things like HTCondor, Torque etc.
>
> The requirement is to be able to:
> - define an overall job as a set of sub-tasks (could be many thousands)
> - put sub-tasks into a queue; execute tasks from the queue
> - dependencies: don't add a sub-task into the queue until its precursors
> have completed successfully
> - restart: after an error, be able to restart the job but skipping those
> sub-tasks which completed successfully
> - preferably handle short-lived tasks efficiently (of order of 10 seconds
> duration)
>
> Clearly it's possible to write a framework to do this, but I don't want to
> re-invent the wheel if it has been done already.
>
> Thanks,
>
> Brian.
>
> P.S. I found Chronos, but it doesn't seem a good match. As far as I can see,
> it's intended for applications where you pre-define a bunch of tasks (via
> GUI? via REST?) and then trigger them periodically.
