On 07/10/2015 09:44, Nikolaos Ballas neXus wrote:
> Maybe you need to read a bit :)
I have read plenty, including the projects you list, and I didn't find
anything that met my requirements. Again, I apologise if I was not clear
in my question.
Spark has a very specific data model (RDDs) and requires applications
written to its API. I want to run arbitrary compute jobs - think "shell
scripts" or "docker containers" running pre-existing applications which
I can't change - and I want to fill a queue or pipeline with those jobs.
Hadoop, likewise, targets specific workloads: jobs written to run under
Hadoop, preferably using HDFS.
The nearest Hadoop gets to general-purpose computing, as far as I can
see, is its YARN scheduler, and YARN can in turn run under Mesos. So a
job queue which can run on YARN might be acceptable, although I'd rather
not have an additional layer in the stack. (There was an old project for
running Torque under YARN, but it has been abandoned.)
Regards,
Brian.