Re: Batch/queue frameworks?

2015-10-08 Thread Lars Albertsson
What you are looking for is probably a workflow manager. It is more or
less independent of a cluster management system such as Mesos.

Here is a suggestion for a tool shopping list:

https://github.com/spotify/luigi
https://azkaban.github.io/
https://github.com/airbnb/airflow
https://github.com/pinterest/pinball
https://github.com/sailthru/stolos

Luigi is probably the lowest-risk choice - easy to get started with and
battle-tested. I am biased, though.
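
To give a feel for how Luigi covers the dependency and restart
requirements, here is a minimal sketch; the task names, paths, and
chunk count are made up for illustration. Each task declares its
precursors in requires(), and Luigi skips any task whose output()
target already exists, which gives you restart-with-skip for free:

import luigi

class Preprocess(luigi.Task):
    chunk = luigi.IntParameter()

    def output(self):
        # Marker file; if it exists, Luigi considers this task done.
        return luigi.LocalTarget('out/preprocessed_%d.txt' % self.chunk)

    def run(self):
        with self.output().open('w') as f:
            f.write('chunk %d done\n' % self.chunk)

class Aggregate(luigi.Task):
    def requires(self):
        # Thousands of sub-tasks are fine; Luigi builds the DAG from this.
        return [Preprocess(chunk=i) for i in range(1000)]

    def output(self):
        return luigi.LocalTarget('out/aggregate.txt')

    def run(self):
        with self.output().open('w') as f:
            f.write('all chunks done\n')

if __name__ == '__main__':
    luigi.run()

Running "python flow.py Aggregate --local-scheduler" a second time
after a failure will skip the chunks that already completed.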

In batch processing environments, the workflow managers typically run
on a small cluster of "edge nodes", which in turn schedule jobs on
Hadoop or Spark. One could conceivably schedule jobs from the edge
nodes onto both Hadoop/Spark and Mesos - the latter would be
appropriate for jobs that fit on a single machine. Hadoop and Spark
are also often used for simpler jobs, at a high cost in hardware and
complexity. I have not heard of any such hybrid integrations, however.

If you go down that path, you may want to look at Aurora for Mesos
scheduling and resource allocation. Unlike Marathon and Kubernetes, it
supports batch jobs. You can build a batch worker farm on Mesos with
e.g. Marathon + RabbitMQ, but you would likely reinvent what Aurora
does.
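
To make that concrete: the Marathon + RabbitMQ route amounts to keeping
N instances of a consumer like the sketch below alive, with RabbitMQ
holding the job queue (the host and queue names are placeholders). The
retry policy, fair sharing, and so on that you would then have to build
yourself are roughly what Aurora already gives you:

import subprocess
import pika

def handle(ch, method, properties, body):
    # Assume each message body is a shell command to execute.
    result = subprocess.run(body.decode(), shell=True)
    if result.returncode == 0:
        ch.basic_ack(delivery_tag=method.delivery_tag)    # done
    else:
        ch.basic_nack(delivery_tag=method.delivery_tag,
                      requeue=True)                       # retry later

conn = pika.BlockingConnection(pika.ConnectionParameters('rabbitmq.example'))
channel = conn.channel()
channel.queue_declare(queue='jobs', durable=True)
channel.basic_qos(prefetch_count=1)   # one job per worker at a time
channel.basic_consume(queue='jobs', on_message_callback=handle)
channel.start_consuming()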

I answered a related question on the Spark mailing list, which may
provide some useful additional information:
https://www.mail-archive.com/user@spark.apache.org/msg34417.html

Regards,

Lars Albertsson




On Wed, Oct 7, 2015 at 9:56 AM, Brian Candler  wrote:
> Are there any open-source job queue/batch systems which run under Mesos? I
> am thinking of things like HTCondor, Torque etc.
>
> The requirement is to be able to:
> - define an overall job as a set of sub-tasks (could be many thousands)
> - put sub-tasks into a queue; execute tasks from the queue
> - dependencies: don't add a sub-task into the queue until its precursors
> have completed successfully
> - restart: after an error, be able to restart the job but skipping those
> sub-tasks which completed successfully
> - preferably handle short-lived tasks efficiently (on the order of 10 seconds
> duration)
>
> Clearly it's possible to write a framework to do this, but I don't want to
> re-invent the wheel if it has been done already.
>
> Thanks,
>
> Brian.
>
> P.S. I found Chronos, but it doesn't seem a good match. As far as I can see,
> it's intended for applications where you pre-define a bunch of tasks (via
> GUI? via REST?) and then trigger them periodically.


Re: Batch/queue frameworks?

2015-10-07 Thread James DeFelice
The OP might also be interested in Stolos:
https://github.com/sailthru/stolos

combined with Relay: https://github.com/sailthru/relay


On Wed, Oct 7, 2015 at 8:15 AM, Clarke, Trevor  wrote:

> I'm currently working on this sort of framework. Unfortunately, source is
> not currently available, but there is a plan to open-source it in the next
> couple of months. I'm not sure if your need is immediate or if it can wait
> for a bit. The framework handles jobs in Docker containers with pre and
> post steps (copying data onto the node, products out, etc.). Individual jobs
> can be strung together in a DAG for complex processing. Directories can be
> watched for new data and jobs can be started in response to this data.
>
> 
> From: Brian Candler [b.cand...@pobox.com]
> Sent: Wednesday, October 07, 2015 3:56 AM
> To: user@mesos.apache.org
> Subject: Batch/queue frameworks?
>
> Are there any open-source job queue/batch systems which run under Mesos?
> I am thinking of things like HTCondor, Torque etc.
>
> The requirement is to be able to:
> - define an overall job as a set of sub-tasks (could be many thousands)
> - put sub-tasks into a queue; execute tasks from the queue
> - dependencies: don't add a sub-task into the queue until its precursors
> have completed successfully
> - restart: after an error, be able to restart the job but skipping those
> sub-tasks which completed successfully
> - preferably handle short-lived tasks efficiently (on the order of 10
> seconds duration)
>
> Clearly it's possible to write a framework to do this, but I don't want
> to re-invent the wheel if it has been done already.
>
> Thanks,
>
> Brian.
>
> P.S. I found Chronos, but it doesn't seem a good match. As far as I can
> see, it's intended for applications where you pre-define a bunch of
> tasks (via GUI? via REST?) and then trigger them periodically.
>



-- 
James DeFelice
585.241.9488 (voice)
650.649.6071 (fax)


RE: Batch/queue frameworks?

2015-10-07 Thread Clarke, Trevor
I'm currently working on this sort of framework. Unfortunately, source is not 
currently available, but there is a plan to open-source it in the next couple of 
months. I'm not sure if your need is immediate or if it can wait for a bit. The 
framework handles jobs in Docker containers with pre and post steps (copying data 
onto the node, products out, etc.). Individual jobs can be strung together in a 
DAG for complex processing. Directories can be watched for new data and jobs 
can be started in response to this data.


From: Brian Candler [b.cand...@pobox.com]
Sent: Wednesday, October 07, 2015 3:56 AM
To: user@mesos.apache.org
Subject: Batch/queue frameworks?

Are there any open-source job queue/batch systems which run under Mesos?
I am thinking of things like HTCondor, Torque etc.

The requirement is to be able to:
- define an overall job as a set of sub-tasks (could be many thousands)
- put sub-tasks into a queue; execute tasks from the queue
- dependencies: don't add a sub-task into the queue until its precursors
have completed successfully
- restart: after an error, be able to restart the job but skipping those
sub-tasks which completed successfully
- preferably handle short-lived tasks efficiently (on the order of 10
seconds duration)

Clearly it's possible to write a framework to do this, but I don't want
to re-invent the wheel if it has been done already.

Thanks,

Brian.

P.S. I found Chronos, but it doesn't seem a good match. As far as I can
see, it's intended for applications where you pre-define a bunch of
tasks (via GUI? via REST?) and then trigger them periodically.





Re: Batch/queue frameworks?

2015-10-07 Thread Pablo Cingolani
My answers are inline below.


On Wed, Oct 7, 2015 at 8:17 AM, Brian Candler  wrote:

> On 07/10/2015 11:08, Pablo Cingolani wrote:
>
> It looks like you are looking for something like BDS
>
>   http://pcingola.github.io/BigDataScript/
>
> It has the additional advantage that you can port your scripts seamlessly
> between Mesos and other cluster systems (SGE, PBS, Torque, etc.).
>
> Yes, that looks very interesting, thank you!  It seems to perform the same
> role as HTCondor DAGMan, but with pluggable backends and a much more
> expressive language.
>
> At http://pcingola.github.io/BigDataScript/bigDataScript_manual.html
> under "Resource consumption and task options", I don't see any option for
> declaring the memory used by a task. Is that a wishlist feature?
>

You can use "mem=NNN" to specify memory requirements.


> In fact, Mesos allows arbitrary resources, so it would be good to be able
> to specify requirements for any particular resource.
>

Arbitrary resources are not supported yet.



>
> I note that BDS allows a task to specify it runs on one particular cluster
> node. In my application it would also be helpful to be able to specify a
> particular class of node. (When submitting a job to HTCondor this could be
> expanded to a requirements expression)
>

Typically this is done using the "queue" option in other clusters.
At the moment (for Mesos systems) this parameter is mostly
ignored, but I can add support for it if you need it.
Yours

Pablo



>
> Regards,
>
> Brian.
>
>


Re: Batch/queue frameworks?

2015-10-07 Thread Brian Candler

On 07/10/2015 11:08, Pablo Cingolani wrote:

It looks like you are looking for something like BDS

http://pcingola.github.io/BigDataScript/

It has the additional advantage that you can port your scripts seamlessly
between Mesos and other cluster systems (SGE, PBS, Torque, etc.).

Yes, that looks very interesting, thank you!  It seems to perform the 
same role as HTCondor DAGMan, but with pluggable backends and a much 
more expressive language.


At http://pcingola.github.io/BigDataScript/bigDataScript_manual.html
under "Resource consumption and task options", I don't see any option 
for declaring the memory used by a task. Is that a wishlist feature? In 
fact, Mesos allows arbitrary resources, so it would be good to be able 
to specify requirements for any particular resource.

I note that BDS allows a task to specify it runs on one particular 
cluster node. In my application it would also be helpful to be able to 
specify a particular class of node. (When submitting a job to HTCondor 
this could be expanded to a requirements expression)


Regards,

Brian.



Re: Batch/queue frameworks?

2015-10-07 Thread Nikolaos Ballas neXus
I think any pub/sub system (a typical JMS broker, RabbitMQ, Kafka, etc.) would do 
what you describe. All of them can be run as containers inside an Apache Mesos 
cluster. Kafka has really good integration with Mesos and YARN, and is also more 
lightweight than a typical JMS implementation.

regards
\n\m
On 07 Oct 2015, at 12:05, F21 <f21.gro...@gmail.com> wrote:

I am also interested in something like this, although my requirements are much 
simpler.

I am interested in a work queue like beanstalkd that will allow me to push to a 
queue from a web app and have workers to do things like send emails, generate 
pdfs and resize images.

I have thought about running a beanstalkd in a container, but it has some 
limitations. For example, if it crashes, it needs to be relaunched manually to 
recover the binlog (which is a no go).

Another option I can think of is to use kafka (which has a mesos framework) and 
have the web app and other parts push jobs into the kafka broker. Workers 
listening on the broker would pop each job off and execute whatever needs to be 
done.

However, there seems to be a lot of wheel-reinventing with that solution. For 
example, what if a job depends on another job? There's also a lot of work that 
needs to be done at a lower level when all I am interested in is to write 
domain specific code to generate the pdf, resize the image etc.

If there's a work queue solution for mesos, I would love to know too.



On 7/10/2015 8:08 PM, Brian Candler wrote:
On 07/10/2015 09:44, Nikolaos Ballas neXus wrote:
Maybe you need to read a bit  :)
I have read plenty, including those you list, and I didn't find anything which 
met my requirements. Again I apologise if I was not clear in my question.

Spark has a very specific data model (RDDs) and applications which write to its 
API. I want to run arbitrary compute jobs - think "shell scripts" or "docker 
containers" which run pre-existing applications which I can't change.  And I 
want to fill a queue or pipeline with those jobs.

Hadoop also is for specific workloads, written to run under Hadoop and 
preferably using HDFS.

The nearest Hadoop gets to general-purpose computing, as far as I can see, is 
its YARN scheduler. YARN can in turn run under Mesos. Therefore a job queue 
which can run on YARN might be acceptable, although I'd rather not have an 
additional layer in the stack. (There was an old project for running Torque 
under YARN, but this has been abandoned)

Regards,

Brian.



Nikolaos Ballas  |  Software Development Manager

Technology Nexus S.a.r.l.
2-4 Rue Eugene Rupert
2453 Luxembourg
Delivery address: 2-3 Rue Eugene Rupert,Vertigo Polaris Building
Tel: + 3522619113580
cont...@nexusgroup.com | 
nexusgroup.com



Re: Batch/queue frameworks?

2015-10-07 Thread David Greenberg
Another great option is Cook: https://github.com/twosigma/Cook

Cook combines a simple REST API for batch jobs with sophisticated
fair-sharing and preemption features on Mesos. Tomorrow, at MesosCon
Europe, I'll be speaking about it in more detail. When we want to use
dependencies with Cook, we use a workflow tool that creates the dependent
jobs on-the-fly.
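
For a flavour of the API: submitting work is a single HTTP POST. Below
is a rough Python sketch; the endpoint and field names are written from
memory, so treat them as assumptions and check the repo for the current
schema:

import json
import uuid
import requests

# Hypothetical Cook job submission; verify the endpoint and fields
# against the documentation in the repo before relying on this.
job = {
    'uuid': str(uuid.uuid4()),
    'command': './process-chunk.sh input-0042',  # any shell command
    'cpus': 1.0,
    'mem': 512.0,        # MB
    'max_retries': 3,
}
resp = requests.post('http://cook.example:12321/rawscheduler',
                     data=json.dumps({'jobs': [job]}),
                     headers={'Content-Type': 'application/json'})
resp.raise_for_status()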

On Wed, Oct 7, 2015 at 11:08 AM Pablo Cingolani 
wrote:

>
> It looks like you are looking for something like BDS
>
>   http://pcingola.github.io/BigDataScript/
>
> It has the additional advantage that you can port your scripts seamlessly
> between Mesos and other cluster systems (SGE, PBS, Torque, etc.).
>
>
>
>
>
> On Wed, Oct 7, 2015 at 7:05 AM, F21  wrote:
>
>> I am also interested in something like this, although my requirements are
> much simpler.
>>
>> I am interested in a work queue like beanstalkd that will allow me to
>> push to a queue from a web app and have workers to do things like send
>> emails, generate pdfs and resize images.
>>
>> I have thought about running a beanstalkd in a container, but it has some
>> limitations. For example, if it crashes, it needs to be relaunched manually
>> to recover the binlog (which is a no go).
>>
>> Another option I can think of is to use kafka (which has a mesos
>> framework) and have the web app and other parts push jobs into the kafka
>> broker. Workers listening on the broker would pop each job off and execute
>> whatever needs to be done.
>>
> However, there seems to be a lot of wheel-reinventing with that solution.
>> For example, what if a job depends on another job? There's also a lot of
>> work that needs to be done at a lower level when all I am interested in is
>> to write domain specific code to generate the pdf, resize the image etc.
>>
>> If there's a work queue solution for mesos, I would love to know too.
>>
>>
>>
>>
>> On 7/10/2015 8:08 PM, Brian Candler wrote:
>>
>> On 07/10/2015 09:44, Nikolaos Ballas neXus wrote:
>>
>> Maybe you need to read a bit  :)
>>
>> I have read plenty, including those you list, and I didn't find anything
>> which met my requirements. Again I apologise if I was not clear in my
>> question.
>>
>> Spark has a very specific data model (RDDs) and applications which write
>> to its API. I want to run arbitrary compute jobs - think "shell scripts" or
>> "docker containers" which run pre-existing applications which I can't
>> change.  And I want to fill a queue or pipeline with those jobs.
>>
>> Hadoop also is for specific workloads, written to run under Hadoop and
>> preferably using HDFS.
>>
>> The nearest Hadoop gets to general-purpose computing, as far as I can
>> see, is its YARN scheduler. YARN can in turn run under Mesos. Therefore a
>> job queue which can run on YARN might be acceptable, although I'd rather
>> not have an additional layer in the stack. (There was an old project for
>> running Torque under YARN, but this has been abandoned)
>>
>> Regards,
>>
>> Brian.
>>
>>
>>
>


Re: Batch/queue frameworks?

2015-10-07 Thread Pablo Cingolani
It looks like you are looking for something like BDS

  http://pcingola.github.io/BigDataScript/

It has the additional advantage that you can port your scripts seamlessly
between Mesos and other cluster systems (SGE, PBS, Torque, etc.).





On Wed, Oct 7, 2015 at 7:05 AM, F21  wrote:

> I am also interested in something like this, although my requirements are
> much simpler.
>
> I am interested in a work queue like beanstalkd that will allow me to push
> to a queue from a web app and have workers to do things like send emails,
> generate pdfs and resize images.
>
> I have thought about running a beanstalkd in a container, but it has some
> limitations. For example, if it crashes, it needs to be relaunched manually
> to recover the binlog (which is a no go).
>
> Another option I can think of is to use kafka (which has a mesos
> framework) and have the web app and other parts push jobs into the kafka
> broker. Workers listening on the broker would pop each job off and execute
> whatever needs to be done.
>
> However, there seems to be a lot of wheel-reinventing with that solution.
> For example, what if a job depends on another job? There's also a lot of
> work that needs to be done at a lower level when all I am interested in is
> to write domain specific code to generate the pdf, resize the image etc.
>
> If there's a work queue solution for mesos, I would love to know too.
>
>
>
>
> On 7/10/2015 8:08 PM, Brian Candler wrote:
>
> On 07/10/2015 09:44, Nikolaos Ballas neXus wrote:
>
> Maybe you need to read a bit  :)
>
> I have read plenty, including those you list, and I didn't find anything
> which met my requirements. Again I apologise if I was not clear in my
> question.
>
> Spark has a very specific data model (RDDs) and applications which write
> to its API. I want to run arbitrary compute jobs - think "shell scripts" or
> "docker containers" which run pre-existing applications which I can't
> change.  And I want to fill a queue or pipeline with those jobs.
>
> Hadoop also is for specific workloads, written to run under Hadoop and
> preferably using HDFS.
>
> The nearest Hadoop gets to general-purpose computing, as far as I can see,
> is its YARN scheduler. YARN can in turn run under Mesos. Therefore a job
> queue which can run on YARN might be acceptable, although I'd rather not
> have an additional layer in the stack. (There was an old project for
> running Torque under YARN, but this has been abandoned)
>
> Regards,
>
> Brian.
>
>
>


Re: Batch/queue frameworks?

2015-10-07 Thread F21
I am also interested in something like this, although my requirements 
are much simpler.


I am interested in a work queue like beanstalkd that will allow me to 
push to a queue from a web app and have workers to do things like send 
emails, generate pdfs and resize images.


I have thought about running a beanstalkd in a container, but it has 
some limitations. For example, if it crashes, it needs to be relaunched 
manually to recover the binlog (which is a no go).


Another option I can think of is to use kafka (which has a mesos 
framework) and have the web app and other parts push jobs into the kafka 
broker. Workers listening on the broker would pop each job off and 
execute whatever needs to be done.
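
The worker side of that would look something like the sketch below
(using the kafka-python client; the topic, broker address and job
format are placeholders I made up):

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer('jobs',
                         bootstrap_servers=['kafka.example:9092'],
                         group_id='workers')  # the group shares the work
for msg in consumer:
    job = json.loads(msg.value.decode())
    # Dispatch to the domain-specific code: resize the image,
    # generate the pdf, send the email, etc.
    print('got job', job)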


However, there seems to be a lot of wheel-reinventing with that 
solution. For example, what if a job depends on another job? There's 
also a lot of work that needs to be done at a lower level when all I am 
interested in is to write domain specific code to generate the pdf, 
resize the image etc.


If there's a work queue solution for mesos, I would love to know too.



On 7/10/2015 8:08 PM, Brian Candler wrote:

On 07/10/2015 09:44, Nikolaos Ballas neXus wrote:

Maybe you need to read a bit  :)
I have read plenty, including those you list, and I didn't find 
anything which met my requirements. Again I apologise if I was not 
clear in my question.


Spark has a very specific data model (RDDs) and applications which 
write to its API. I want to run arbitrary compute jobs - think "shell 
scripts" or "docker containers" which run pre-existing applications 
which I can't change.  And I want to fill a queue or pipeline with 
those jobs.


Hadoop also is for specific workloads, written to run under Hadoop and 
preferably using HDFS.


The nearest Hadoop gets to general-purpose computing, as far as I can 
see, is its YARN scheduler. YARN can in turn run under Mesos. 
Therefore a job queue which can run on YARN might be acceptable, 
although I'd rather not have an additional layer in the stack. (There 
was an old project for running Torque under YARN, but this has been 
abandoned)


Regards,

Brian.





Re: Batch/queue frameworks?

2015-10-07 Thread Brian Candler

On 07/10/2015 09:44, Nikolaos Ballas neXus wrote:

Maybe you need to read a bit  :)
I have read plenty, including those you list, and I didn't find anything 
which met my requirements. Again I apologise if I was not clear in my 
question.


Spark has a very specific data model (RDDs) and applications which write 
to its API. I want to run arbitrary compute jobs - think "shell scripts" 
or "docker containers" which run pre-existing applications which I can't 
change.  And I want to fill a queue or pipeline with those jobs.


Hadoop also is for specific workloads, written to run under Hadoop and 
preferably using HDFS.


The nearest Hadoop gets to general-purpose computing, as far as I can 
see, is its YARN scheduler. YARN can in turn run under Mesos. Therefore 
a job queue which can run on YARN might be acceptable, although I'd 
rather not have an additional layer in the stack. (There was an old 
project for running Torque under YARN, but this has been abandoned)


Regards,

Brian.



Re: Batch/queue frameworks?

2015-10-07 Thread Nikolaos Ballas neXus
Maybe you need to read a bit :) Hadoop/Spark are batch processing frameworks, 
and both can run on top of Mesos. If you want to do online processing, there is 
Apache Storm. On the other hand, supercomputer != distributed computing. You 
referred to cron jobs, so I thought you were asking for a scheduler. You may 
need to read a bit to understand the technology stack, because the answers to 
your question are rather obvious for someone with a DS background, even a basic 
one, following the market. Jobs can either be executed with Hadoop executors or 
delegated to processes configured in Docker containers that Mesos can bootstrap.

kind regards
\n\m

On 07 Oct 2015, at 10:37, Brian Candler <b.cand...@pobox.com> wrote:

On 07/10/2015 09:01, Nikolaos Ballas neXus wrote:
Check for Marathon

I don't see how Marathon does what I want. Maybe I wasn't clear enough in 
explaining my requirements.

What I need is basically a supercomputer cluster where I can take a large 
computation job, break it into lots of sub-tasks, and run as many of those 
sub-tasks in parallel as possible given the CPU resources available, until all 
the sub-tasks are done.

The core of any sort of system like that is a "job queue" where all the 
sub-tasks are entered. The executor picks out another task whenever there is 
some free resource available, and when it finishes, it is removed from the 
queue.

I don't see how Marathon has such a job queue. As far as I can tell, Marathon 
is for starting long-lived applications; you define what things you want 
running, it starts them, and restarts them if they die for any reason.

Or have I misunderstood what Marathon is capable of? If so, can you point me at 
the relevant documentation?

The advantage of running such a supercomputer cluster under Mesos would be that 
I could run *other* applications (including those started by Marathon or 
Chronos) on the same hardware.

Thanks,

Brian.


Nikolaos Ballas  |  Software Development Manager

Technology Nexus S.a.r.l.
2-4 Rue Eugene Rupert
2453 Luxembourg
Delivery address: 2-3 Rue Eugene Rupert,Vertigo Polaris Building
Tel: + 3522619113580
cont...@nexusgroup.com | 
nexusgroup.com



Re: Batch/queue frameworks?

2015-10-07 Thread Brian Candler

On 07/10/2015 09:01, Nikolaos Ballas neXus wrote:

Check for Marathon


I don't see how Marathon does what I want. Maybe I wasn't clear enough 
in explaining my requirements.


What I need is basically a supercomputer cluster where I can take a 
large computation job, break it into lots of sub-tasks, and run as many 
of those sub-tasks in parallel as possible given the CPU resources 
available, until all the sub-tasks are done.


The core of any sort of system like that is a "job queue" where all the 
sub-tasks are entered. The executor picks out another task whenever 
there is some free resource available, and when it finishes, it is 
removed from the queue.
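
In toy form (single machine, plain Python, made-up script names), the 
semantics are just this - what I need is the same thing spread across 
a cluster:

import queue
import subprocess
import threading

tasks = queue.Queue()
for i in range(10000):
    tasks.put('./subtask.sh %d' % i)   # many short-lived sub-tasks

def worker():
    while True:
        cmd = tasks.get()
        subprocess.run(cmd, shell=True)   # run one sub-task
        tasks.task_done()

for _ in range(8):                        # N = free CPU slots
    threading.Thread(target=worker, daemon=True).start()
tasks.join()                              # block until the queue is drained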


I don't see how Marathon has such a job queue. As far as I can tell, 
Marathon is for starting long-lived applications; you define what things 
you want running, it starts them, and restarts them if they die for any 
reason.


Or have I misunderstood what Marathon is capable of? If so, can you 
point me at the relevant documentation?


The advantage of running such a supercomputer cluster under Mesos would 
be that I could run *other* applications (including those started by 
Marathon or Chronos) on the same hardware.


Thanks,

Brian.



Re: Batch/queue frameworks?

2015-10-07 Thread Nikolaos Ballas neXus
Check for Marathon
On 07 Oct 2015, at 09:56, Brian Candler <b.cand...@pobox.com> wrote:

Are there any open-source job queue/batch systems which run under Mesos? I am 
thinking of things like HTCondor, Torque etc.

The requirement is to be able to:
- define an overall job as a set of sub-tasks (could be many thousands)
- put sub-tasks into a queue; execute tasks from the queue
- dependencies: don't add a sub-task into the queue until its precursors have 
completed successfully
- restart: after an error, be able to restart the job but skipping those 
sub-tasks which completed successfully
- preferably handle short-lived tasks efficiently (on the order of 10 seconds 
duration)

Clearly it's possible to write a framework to do this, but I don't want to 
re-invent the wheel if it has been done already.

Thanks,

Brian.

P.S. I found Chronos, but it doesn't seem a good match. As far as I can see, 
it's intended for applications where you pre-define a bunch of tasks (via GUI? 
via REST?) and then trigger them periodically.

Nikolaos Ballas  |  Software Development Manager

Technology Nexus S.a.r.l.
2-4 Rue Eugene Rupert
2453 Luxembourg
Delivery address: 2-3 Rue Eugene Rupert,Vertigo Polaris Building
Tel: + 3522619113580
cont...@nexusgroup.com | 
nexusgroup.com