Hi Jillian,

You may still consider Aurora if you want more complex, Heron-style
orchestration around your batch-processing workloads.

That said, there are plenty of alternative batch-processing frameworks if
that feels like too heavy a lift:
http://mesos.apache.org/documentation/latest/frameworks/

There is also a young but promising framework specifically targeting large
batch job counts that you may want to explore:
https://github.com/twosigma/Cook.
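On the template question in the thread below: Aurora job configs are written in a Python-based DSL, so one rough approach is to parameterize a single config with an unbound template variable and create one short-lived job per chunk. A minimal sketch, where the script name, resource sizes, and cluster/role names are placeholders of my own, not anything from this thread:

```python
# chunk_job.aurora -- Aurora config (Python-based DSL).
# {{chunk_id}} is left unbound here and is filled in at job-creation
# time via `--bind chunk_id=...`.

process_chunk = Process(
  name = 'process_chunk',
  # Placeholder script: replace with whatever runs one chunk of records.
  cmdline = './process_chunk.sh --chunk={{chunk_id}}'
)

chunk_task = Task(
  processes = [process_chunk],
  resources = Resources(cpu = 1.0, ram = 1*GB, disk = 1*GB)
)

jobs = [Job(
  cluster = 'devcluster',    # placeholder cluster name
  role = 'batch',            # placeholder role
  environment = 'prod',
  name = 'chunk-{{chunk_id}}',
  task = chunk_task,
  instances = 1,
  service = False            # run to completion; don't restart
)]
```

A driver could then stamp out one job per chunk with something like
`aurora job create devcluster/batch/prod/chunk-42 chunk_job.aurora --bind chunk_id=42`,
though at ~5M records per batch you'd want to sanity-check scheduler load
before going that route, which is part of why Cook above may be a better fit.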

On Wed, May 25, 2016 at 8:12 AM, Jillian Cocklin <
[email protected]> wrote:

> Thanks Brian and Rick - that's what I was starting to think too.  I
> appreciate your input and the quick responses.
>
> Best,
> J.
>
> _____________________________
> From: [email protected]
> Sent: Wednesday, May 25, 2016 4:47 AM
> Subject: Re: Would you recommend Aurora?
> To: <[email protected]>
>
>
>
> Sounds to me like you want something like Spark or a traditional
> MapReduce framework.
>
> On May 24, 2016, at 9:36 PM, Brian Hatfield <[email protected]> wrote:
>
> It seems like Aurora alone would not entirely solve your problem.
>
> It sounds like you either want a stream processor with a way to stream in
> the chunked batch (see also: Storm or Heron (which runs on Aurora)
> <https://blog.twitter.com/2015/flying-faster-with-twitter-heron>), or a
> way to process batch jobs (see also: Hadoop, which can run on Mesos
> <https://github.com/mesos/hadoop> and possibly Aurora).
>
> I'm not sure which fits your use case better based upon your description,
> but I hope that this is at least a seed of information in the right
> direction.
>
> Brian
>
> On Tue, May 24, 2016 at 9:14 PM, Jillian Cocklin <
> [email protected]> wrote:
>
>> I’m analyzing Aurora as a potential candidate for a new project.  While
>> the high-level architecture seems to be a good fit, I’m not seeing a lot of
>> documentation that matches our use case.
>>
>>  On an ongoing basis, we’ll receive batch files of records (~5 million
>> records per batch), and based on record types we need to “process” them
>> against our services.  We’d break up the records into small chunks,
>> instantiate a job for each chunk, and have each job be automatically queued
>> up to run on available resources (which can be auto scaled up/down as
>> needed).
>>
>>
>>
>> At first glance it looked like Aurora could create jobs, but I can’t
>> tell whether those can be defined as templates so that they can be dynamically
>> instantiated, passed data, and run simultaneously.  Are there any best
>> practices or code examples for this?  Most of what I’ve found fits better
>> with the use case of having different static jobs (like cron jobs or IT
>> services) that each need to run on a periodic basis or continue running
>> indefinitely.
>>
>>
>>
>> Can anyone let me know whether this is worth pursuing with Aurora?
>>
>>
>>
>> Thanks!
>>
>> J.
>>
