I evaluated Aurora for DAG workflow orchestration last year (http://goo.gl/ED6LzL) and concluded that, beyond trivial scenarios, it is quite far from having sufficient workflow management features.
A workflow manager and a resource scheduler are more or less orthogonal components, so I think you would be better off combining Aurora with a separate tool for the orchestration. Although Airflow has nice visualisation, Luigi (https://github.com/spotify/luigi) is easier to get started with and more mature. (Disclaimer: I am a Luigi contributor.)

There is a lot of functionality in the existing workflow tools, both for working with the DAG and for integrating with other components. I believe you would get more value for your effort, and make a good contribution to the OSS community, if you instead created an integration for Airflow or Luigi that lets them easily spawn Aurora batch jobs, similar to the existing Spark and Hadoop integrations. I have appended some rough sketches of what this could look like below the quoted thread. Let me know if you want assistance.

Regards,

Lars Albertsson
Data engineering consultant
www.mapflat.com
+46 70 7687109

On Wed, Feb 3, 2016 at 12:11 PM, Krisztian Szucs <[email protected]> wrote:
> I’m really not a fan of Airflow. I’d prefer Aurora to handle DAG scheduling,
> with a thin Python wrapper around Aurora’s DSL - which I really like.
> I think support for batch workflows is a missing feature in Aurora. That’s
> the only reason we’re hesitating to replace Chronos.
>
> What would be the basic workflow if we plan to implement this feature in
> Aurora?
>
> On 03 Feb 2016, at 00:06, Erb, Stephan <[email protected]> wrote:
>
> FWIW, the guys from Oscar Health have built this one:
> http://dna.hioscar.com/2015/12/09/running-job-pipelines-in-aurora/
> Unfortunately, it does not seem to be open source. At least, I cannot find
> it on their github page https://github.com/oscarhealth.
>
> In addition, have you thought about keeping the dependency management
> outside of Aurora in a different tool and using Aurora just for the
> execution? For example, you could use Airflow
> (https://github.com/airbnb/airflow) to do the entire dependency management,
> time tracking etc. But when it comes to doing some actual work, you use an
> AuroraOperator (tbd :-) in Airflow that schedules your job on Aurora.
> Writing a custom operator is not that hard
> (https://pythonhosted.org/airflow/code.html?highlight=operator#basesensoroperator).
>
> I guess this would give you the best of both worlds. If you are fancy, you
> could also use Aurora to spawn Airflow itself.
>
> Regards,
> Stephan
>
>
> ________________________________
> From: Krisztian Szucs <[email protected]>
> Sent: Tuesday, February 2, 2016 10:22 PM
> To: [email protected]
> Subject: Re: Explicit job execution order
>
>
> On 02 Feb 2016, at 22:01, Bill Farner <[email protected]> wrote:
>
> My mistake, I skimmed past Chronos and was thinking services rather than
> batch. I think this is a legitimate use case, but nobody has yet had both
> the requirement and the commitment to add the feature. I will happily
> guide anyone willing to put forth effort!
>
>
> We have both, especially if you provide a quick way to define primitive
> job dependencies so that we can start migrating workflows from Chronos.
> During the migration we’ll dig into the details.
>
>
> On Tue, Feb 2, 2016 at 12:58 PM, Krisztian Szucs <[email protected]>
> wrote:
>>
>> We need to implement hybrid workflows, including batch processing (Spark).
>> Many of the jobs run unique Docker images with very different dependencies
>> and resources, so we can’t use Process-level ordering as a substitute for
>> Job-level ordering.
>>
>> I’ve seen that the resolution of
>> https://issues.apache.org/jira/browse/AURORA-735 is "Later" :)
>>
>> On 02 Feb 2016, at 21:44, Bill Farner <[email protected]> wrote:
>>
>> In general, I've assumed that job dependencies create more problems than
>> they solve (e.g. scheduling behavior when a parent job is removed,
>> parent/child relationships that span auth groups, etc.). Dependencies seem
>> handy for setting up and tearing down groups of jobs for things like
>> development environments, but that should be easily replaceable by a small
>> script. Is this contrary to your experience?
>>
>>
>> Through API calls?
>>
>>
>> On Tue, Feb 2, 2016 at 12:34 PM, Krisztian Szucs
>> <[email protected]> wrote:
>>>
>>> Hi everyone!
>>>
>>> We’d like to migrate our jobs from Chronos to Aurora.
>>> AFAIK Aurora doesn’t support dependent jobs.
>>> Could you recommend any tools or a workaround to specify e.g. parent
>>> jobs?
>>>
>>> - Krisztian
>>>
>>
>>
>
>
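To make the integration suggestion concrete, here is a minimal sketch of a Luigi task wrapping an Aurora batch job. It is not a finished integration: the job keys, config paths, and the marker-file completion check are made up for illustration, and it assumes the `aurora` client is on the PATH and authenticated.

    import subprocess

    import luigi


    class AuroraJobTask(luigi.Task):
        """Submits a single Aurora batch job via the Aurora client."""

        job_key = luigi.Parameter()      # e.g. "devcluster/www-data/prod/transform"
        config_path = luigi.Parameter()  # path to the job's .aurora DSL file

        def output(self):
            # Marker file recording that the job was submitted. A real
            # integration would instead poll the scheduler until the job
            # reaches a terminal state.
            return luigi.LocalTarget(
                "/tmp/%s.submitted" % self.job_key.replace("/", "_"))

        def run(self):
            subprocess.check_call(
                ["aurora", "job", "create", self.job_key, self.config_path])
            with self.output().open("w") as f:
                f.write("submitted\n")


    class TransformJob(AuroraJobTask):
        def requires(self):
            # Luigi resolves the DAG: this job is only submitted after its
            # parent task has completed successfully.
            return AuroraJobTask(job_key="devcluster/www-data/prod/extract",
                                 config_path="extract.aurora")

Chaining tasks with requires() gives you exactly the parent/child job ordering asked about below, with Luigi handling completion tracking and retries.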
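For the Airflow route, Stephan's hypothetical AuroraOperator could look roughly like the following. The names are invented, and treating "the client exited zero" as success is an assumption; a production operator would watch the job until it reaches a terminal state.

    import subprocess

    from airflow.models import BaseOperator


    class AuroraOperator(BaseOperator):
        """Submits an Aurora job; fails the Airflow task on a non-zero exit."""

        def __init__(self, job_key, config_path, *args, **kwargs):
            super(AuroraOperator, self).__init__(*args, **kwargs)
            self.job_key = job_key
            self.config_path = config_path

        def execute(self, context):
            # check_call raises CalledProcessError on failure, which Airflow
            # records as a failed task instance and retries according to the
            # DAG's retry settings.
            subprocess.check_call(
                ["aurora", "job", "create", self.job_key, self.config_path])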
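And regarding Bill's remark further down that simple dependency setups should be replaceable by a small script: it can indeed be small if you drive the Aurora client directly. A sketch, with made-up job keys:

    import subprocess

    # Parent job first, then its child.
    PIPELINE = [
        ("devcluster/www-data/prod/extract", "extract.aurora"),
        ("devcluster/www-data/prod/transform", "transform.aurora"),
    ]

    for job_key, config_path in PIPELINE:
        # check_call aborts the pipeline on the first failed submission.
        # Caveat: this only serialises submission; a robust script would
        # also poll `aurora job status` until the parent finishes.
        subprocess.check_call(["aurora", "job", "create", job_key, config_path])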
