Hi Jillian,

You may still consider Aurora if you want a more complex (à la Heron-style) orchestration around your batch-processing workloads.
That said, there are plenty of alternatives for batch processing if you feel that would be too much to take on: http://mesos.apache.org/documentation/latest/frameworks/

There is also a young but promising framework specifically targeting large batch job counts that you may want to explore: https://github.com/twosigma/Cook.

On Wed, May 25, 2016 at 8:12 AM, Jillian Cocklin <[email protected]> wrote:

> Thanks Brian and Rick - that's what I was starting to think too. I
> appreciate your input and the quick responses.
>
> Best,
> J.
>
> _____________________________
> From: [email protected]
> Sent: Wednesday, May 25, 2016 4:47 AM
> Subject: Re: Would you recommend Aurora?
> To: <[email protected]>
>
> Sounds to me like you want something like Spark or a traditional
> MapReduce framework.
>
> On May 24, 2016, at 9:36 PM, Brian Hatfield <[email protected]> wrote:
>
> It seems like Aurora would not entirely solve your problem.
>
> It sounds like you want either a stream processor with a way to stream in
> the chunked batch (see also: Storm, or Heron, which runs on Aurora
> <https://blog.twitter.com/2015/flying-faster-with-twitter-heron>), or a
> way to process batch jobs (see also: Hadoop, which can run on Mesos
> <https://github.com/mesos/hadoop> and possibly on Aurora).
>
> I'm not sure which fits your use case better based on your description,
> but I hope this is at least a seed of information in the right
> direction.
>
> Brian
>
> On Tue, May 24, 2016 at 9:14 PM, Jillian Cocklin <[email protected]> wrote:
>
>> I'm analyzing Aurora as a potential candidate for a new project. While
>> the high-level architecture seems to be a good fit, I'm not seeing much
>> documentation that matches our use case.
>>
>> On an ongoing basis, we'll receive batch files of records (~5 million
>> records per batch), and based on record types we need to "process" them
>> against our services. We'd break the records up into small chunks,
>> instantiate a job for each chunk, and have each job automatically queued
>> up to run on available resources (which can be auto-scaled up/down as
>> needed).
>>
>> At first glance it looked like Aurora could create jobs - but I can't
>> tell whether those can be made as templates so that they can be
>> dynamically instantiated, passed data, and run simultaneously. Are there
>> any best practices or code examples for this? Most of what I've found
>> fits better with the use case of having different static jobs (like cron
>> jobs or IT services) that each need to be run on a periodic basis or
>> continue running indefinitely.
>>
>> Can anyone let me know whether this is worth pursuing with Aurora?
>>
>> Thanks!
>>
>> J.
