Re: Would you recommend Aurora?

Erb, Stephan Sun, 12 Jun 2016 10:51:06 -0700

Could you clarify your cron use case? Millions of cron jobs that run as often as 
every minute sound more like a case for a couple of long-running processes that 
do the actual work with a short sleep in between, rather than having Mesos & 
Aurora spawn and distribute a task for each run.
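
As a rough illustration of the long-running-process approach described above, 
here is a minimal Python sketch. The names run_worker and due_jobs, the 
60-second interval, and the injectable sleep function are all hypothetical and 
are not part of Aurora or Mesos. The idea is only that one process loops over 
whatever work is due and then sleeps, instead of a scheduler launching a new 
task for every run.

```python
import time
from typing import Callable, Iterable, Optional


def run_worker(
    due_jobs: Callable[[], Iterable[Callable[[], None]]],
    interval_s: float = 60.0,           # assumed period; pick what fits your jobs
    max_cycles: Optional[int] = None,   # None means loop forever
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run all due jobs in-process, sleep, repeat. Returns the number of jobs run."""
    jobs_run = 0
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        started = time.monotonic()
        # due_jobs is a hypothetical callback returning the callables to run now.
        for job in due_jobs():
            job()
            jobs_run += 1
        cycle += 1
        # Sleep only for the remainder of the interval so cycles do not drift
        # by the time spent doing the work.
        elapsed = time.monotonic() - started
        sleep(max(0.0, interval_s - elapsed))
    return jobs_run
```

A handful of such workers could each be run as an ordinary long-running 
service, so the scheduler only has to place a few tasks once.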



Regarding Aurora's scale: Twitter has recently disclosed that they have 250,000 
containers/tasks running, with the largest cluster being in the range of 30,000 
nodes [1]. By default, Aurora does not try to schedule more than 40 tasks per 
second [2]. You can probably adjust that value, but doing so could have other 
downsides.
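
To put the default rate in perspective, here is a back-of-the-envelope 
calculation in Python. The 40 tasks per second is the figure quoted above; the 
workload of 1,000,000 jobs that each fire once per minute is only an 
illustrative assumption standing in for "millions of cron jobs". Under that 
assumption the scheduler would need roughly 16,667 task launches per second, 
and launching 1,000,000 tasks once at 40 per second would take 25,000 seconds 
(about 6.9 hours).

```python
DEFAULT_TASKS_PER_SECOND = 40  # figure quoted in this thread, not re-verified here


def required_rate(jobs: int, period_s: float) -> float:
    """Task launches per second needed if every job fires once per period."""
    return jobs / period_s


def seconds_to_schedule(tasks: int, rate: float = DEFAULT_TASKS_PER_SECOND) -> float:
    """Time to launch `tasks` once at a fixed scheduling rate."""
    return tasks / rate


# Illustrative workload (an assumption, not a number from the thread).
jobs = 1_000_000
needed = required_rate(jobs, period_s=60)
print(f"needed: {needed:.0f} tasks/s, "
      f"{needed / DEFAULT_TASKS_PER_SECOND:.0f}x the default rate")
print(f"one pass at the default rate: {seconds_to_schedule(jobs) / 3600:.1f} hours")
```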



[1] https://youtu.be/FU7wrqsRj3o?t=21m11s

[2] https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java#L39-L41

________________________________
From: Ziliang Chen <[email protected]>
Sent: Saturday, June 11, 2016 17:15
To: [email protected]
Subject: Re: Would you recommend Aurora?

Hi,

Great discussion here.
May I extend the question a little bit? I am wondering how Aurora scales: can 
Aurora schedule millions of cron jobs (running periodically, say every 1, 2 or 
5 minutes) or service jobs? Is there any documentation or performance benchmark 
for Aurora I can refer to? I heard that Aurora can schedule several thousand 
jobs per second. I have never tested that, so it would be good to confirm.

Thanks a lot!

On Thu, May 26, 2016 at 1:01 AM, Jillian Cocklin 
<[email protected]> wrote:
Thanks Brian & Maxim, those are great leads.  Awesome that Heron has gone open 
source!  Definitely glad to have learned more about Aurora – for the right 
situation it seems like a really great solution.

Thanks,
J.

From: Brian Hatfield 
[mailto:[email protected]]
Sent: Wednesday, May 25, 2016 9:57 AM
To: [email protected]
Subject: Re: Would you recommend Aurora?

I mentioned Heron yesterday in this thread - you might like to know that as of 
this morning, it's now open source: 
https://blog.twitter.com/2016/open-sourcing-twitter-heron

On Wed, May 25, 2016 at 12:22 PM, Maxim Khutornenko 
<[email protected]> wrote:
Hi Jillian,

You may still consider Aurora if you want a more complex (à la Heron) 
orchestration around your batch processing workloads.

That said, there are plenty of alternatives for batch processing if you feel 
that'll be too much to load: 
http://mesos.apache.org/documentation/latest/frameworks/

There is also a young but promising framework specifically targeting large 
batch job counts that you may want to explore: https://github.com/twosigma/Cook.

On Wed, May 25, 2016 at 8:12 AM, Jillian Cocklin 
<[email protected]> wrote:
Thanks Brian and Rick - that's what I was starting to think too.  I appreciate 
your input and the quick responses.

Best,
J.

_____________________________
From: [email protected]
Sent: Wednesday, May 25, 2016 4:47 AM
Subject: Re: Would you recommend Aurora?
To: <[email protected]>


Sounds to me like you want something like Spark or a traditional MapReduce 
framework.

On May 24, 2016, at 9:36 PM, Brian Hatfield 
<[email protected]> wrote:
It seems like Aurora would not be the solution to your problem entirely.

It sounds like you either want a stream processor with a way to stream in the 
chunked batch (see also: Storm or Heron (which runs on 
Aurora)<https://blog.twitter.com/2015/flying-faster-with-twitter-heron>), or a 
way to process batch jobs (see also: Hadoop, which can run on 
Mesos<https://github.com/mesos/hadoop> and possibly Aurora).

I'm not sure which fits your use case better based upon your description, but I 
hope that this is at least a seed of information in the right direction.

Brian

On Tue, May 24, 2016 at 9:14 PM, Jillian Cocklin 
<[email protected]> wrote:
I’m analyzing Aurora as a potential candidate for a new project.  While the 
high-level architecture seems to be a good fit, I’m not seeing a lot of 
documentation that matches our use case.

On an ongoing basis, we’ll receive batch files of records (~5 million records 
per batch), and based on record types we need to “process” them against our 
services.  We’d break up the records into small chunks, instantiate a job for 
each chunk, and have each job be automatically queued up to run on available 
resources (which can be auto scaled up/down as needed).
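
The chunking step described above can be sketched in a few lines of Python. 
This is only an illustration under stated assumptions: the helper name chunked 
and the chunk size of 10,000 are hypothetical, and the sketch says nothing 
about how a chunk would be turned into a job on Aurora or any other framework.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` records from any iterable."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Assumed chunk size; with ~5 million records this gives 500 chunks,
# i.e. 500 units of work to hand to whatever runs the jobs.
CHUNK_SIZE = 10_000
num_chunks = sum(1 for _ in chunked(range(5_000_000), CHUNK_SIZE))
print(num_chunks)
```

Because the helper streams from an iterator, the full batch file never has to 
be held in memory at once.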

At first glance it looked like Aurora could create jobs, but I can’t tell 
whether those can be made as templates so that they can be dynamically 
instantiated, passed data, and run simultaneously.  Are there any best 
practices or code examples for this?  Most of what I’ve found fits better with 
the use case of having different static jobs (like cron jobs or IT services) 
that each need to be run on a periodic basis or continue running indefinitely.

Can anyone let me know whether this is worth pursuing with Aurora?

Thanks!
J.


--
Regards, Zi-Liang

Mail: [email protected]
