I mentioned Heron yesterday in this thread - you might like to know that as of this morning, it's now open source: https://blog.twitter.com/2016/open-sourcing-twitter-heron

On Wed, May 25, 2016 at 12:22 PM, Maxim Khutornenko <[email protected]> wrote:

> Hi Jillian,
>
> You may still consider Aurora if you want a more complex (ala Heron-style)
> orchestration around your batch processing workloads.
>
> That said, there are plenty of alternatives for batch processing if you
> feel that'll be too much to load:
> http://mesos.apache.org/documentation/latest/frameworks/
>
> There is also a young but promising framework specifically targeting large
> batch job counts that you may want to explore:
> https://github.com/twosigma/Cook.
>
> On Wed, May 25, 2016 at 8:12 AM, Jillian Cocklin <[email protected]> wrote:
>
>> Thanks Brian and Rick - that's what I was starting to think too. I
>> appreciate your input and the quick responses.
>>
>> Best,
>> J.
>>
>> _____________________________
>> From: [email protected]
>> Sent: Wednesday, May 25, 2016 4:47 AM
>> Subject: Re: Would you recommend Aurora?
>> To: <[email protected]>
>>
>> Sounds to me like you want something like Spark or a traditional
>> MapReduce framework.
>>
>> On May 24, 2016, at 9:36 PM, Brian Hatfield <[email protected]> wrote:
>>
>> It seems like Aurora would not be the solution to your problem entirely.
>>
>> It sounds like you either want a stream processor with a way to stream in
>> the chunked batch (see also: Storm or Heron (which runs on Aurora)
>> <https://blog.twitter.com/2015/flying-faster-with-twitter-heron>), or a
>> way to process batch jobs (see also: Hadoop, which can run on Mesos
>> <https://github.com/mesos/hadoop> and possibly Aurora).
>>
>> I'm not sure which fits your use case better based upon your description,
>> but I hope that this is at least a seed of information in the right
>> direction.
>>
>> Brian
>>
>> On Tue, May 24, 2016 at 9:14 PM, Jillian Cocklin <[email protected]> wrote:
>>
>>> I’m analyzing Aurora as a potential candidate for a new project. While
>>> the high-level architecture seems to be a good fit, I’m not seeing a lot of
>>> documentation that matches our use case.
>>>
>>> On an ongoing basis, we’ll receive batch files of records (~5 million
>>> records per batch), and based on record types we need to “process” them
>>> against our services. We’d break up the records into small chunks,
>>> instantiate a job for each chunk, and have each job be automatically queued
>>> up to run on available resources (which can be auto scaled up/down as
>>> needed).
>>>
>>> At first glance it looked like Aurora could create jobs - but I can’t
>>> tell whether those can be made as templates so that they can be dynamically
>>> instantiated, passed data, and run simultaneously. Are there any best
>>> practices or code examples for this? Most of what I’ve found fits better
>>> with the use case of having different static jobs (like cron jobs or IT
>>> services) that each need to be run on a periodic basis or continue running
>>> indefinitely.
>>>
>>> Can anyone let me know whether this is worth pursuing with Aurora?
>>>
>>> Thanks!
>>>
>>> J.
