Re: Graduation resolution passed - Airflow is a TLP

2018-12-20 Thread Gerard Toonstra
fantastic! congrats to all. great xmas news, enjoy the holidays. Sent from my iPhone > On 20 Dec 2018, at 18:43, Kaxil Naik wrote: > > Congratulations all. > >> On Thu, Dec 20, 2018, 21:41 Driesprong, Fokko > >> Awesome! Congrats! >> >> Cheers, Fokko >> >> Op do 20 dec. 2018 om 22:40

Re: Mocking airflow (similar to moto for AWS)

2018-10-18 Thread Gerard Toonstra
There was a discussion about a unit testing approach last year (2017, I believe). If you dig through the mail archives, you can find it. My take is: - You should test "hooks" against some real system, which can be a Docker container. Make sure the behavior is predictable when talking to that system.

Re: Modeling rate limited api calls in airflow

2018-08-12 Thread Gerard Toonstra
( > task_id=shift_callable.__name__, > python_callable=adwords_callable, > ) >> TriggerDagRunOperator( > task_id='retry_dag_on_failure', > trigger_dag_id=dag_id, > trigger_rule=TriggerRule.ONE_FAILED, >

Re: Modeling rate limited api calls in airflow

2018-08-09 Thread Gerard Toonstra
Have you looked into pools? Pools let you specify how many tasks may use a common resource at any given time. That way you could limit this to 1, 2, or 3, for example. Pools are not dynamic, however, so they only let you put an upper limit on how many clients are going to hit the API at
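
To illustrate the upper-limit idea, here is a minimal standard-library sketch that does not use Airflow itself. A semaphore plays the role of the pool, and `run_with_pool`, `api_call` and the 10 ms sleep are hypothetical stand-ins chosen for the example. In Airflow the slot count is configured on the pool and a task opts in through its `pool` argument.

```python
import threading
import time


def run_with_pool(num_tasks, pool_slots):
    """Run num_tasks fake API calls, never more than pool_slots at once.

    Returns the highest number of calls that were in flight together.
    """
    pool = threading.BoundedSemaphore(pool_slots)  # the "pool" with its slots
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def api_call():
        with pool:  # blocks until a slot is free, like a task waiting on a pool
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(0.01)  # stand-in for the rate-limited API request
            with lock:
                state["running"] -= 1

    threads = [threading.Thread(target=api_call) for _ in range(num_tasks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak"]
```

The slot count is fixed for the whole run, which mirrors the "not dynamic" limitation mentioned above: it caps concurrency but does not adapt to how the API is responding.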

Re: Single Airflow Instance Vs Multiple Airflow Instance

2018-06-06 Thread Gerard Toonstra
We are using two cluster instances. One is for the engineering teams in the "tech" wing, which rigorously follow tech principles; the other is for business analysts and more ad-hoc, experimental work by people who do not necessarily follow those principles. We have a nomad

Processing of files: best practices for airflow

2018-05-11 Thread Gerard Toonstra
Hi all, I have a question regarding the processing of individual files: We collect some flat files from different sources in csv, raw and unstructured formats. These files are stored in a "{process}//MM/DD/" hierarchy and we've built a GCSToGCSTransform operator, which runs a

Re: Managed Apache Airflow Service on Google Cloud Platform

2018-05-01 Thread Gerard Toonstra
Really good stuff. About 1.5 years ago I talked to some googlers on how awesome it would be to integrate the principles of airflow onto GCP and maybe even make it available as some sort of "launcher". Looks like you went beyond that and made it a core product instead! Going to look into this over

Re: 1.10.0beta1 now available for download

2018-04-23 Thread Gerard Toonstra
fantastic effort, much appreciated! go go go On Mon, Apr 23, 2018 at 11:08 AM, Ace Haidrey wrote: > Great work Fokko and Bolke! > > Sent from my iPhone > > > On Apr 23, 2018, at 11:07 AM, Sid Anand wrote: > > > > Awesome! > > -s > > > >> On Mon, Apr 23,

Airflow ETL freelancer wanted in Stockholm (6 months)

2018-04-12 Thread Gerard Toonstra
Someone contacted me, looking specifically for someone with airflow experience. If anyone is interested, you can contact him through the email below: For one of our clients in Stockholm we're currently searching for a contractor to assist with Airflow ETL orchestration. It's an

Data Vault on Hive + AIrflow example

2018-02-28 Thread Gerard Toonstra
Yesterday I finished the draft of a new example on the "ETL with airflow" site. This example explores the concept of a "Data vault" methodology on top of Hive, 100% orchestrated by airflow: https://gtoonstra.github.io/etl-with-airflow/datavault2.html The theory of the data vault is that you can

Re: max_active_runs

2018-02-14 Thread Gerard Toonstra
y, but could it be the location of where max_active_runs > is specified? In our DAGs we pass it directly as an argument to the DAG() > call, not via default_arguments and it behaves itself for us. I think > I should check that! > > -ash > > > > On 14 Feb 2018, at 13:43, Gera

max_active_runs

2018-02-14 Thread Gerard Toonstra
A user on airflow 1.9.0 reports that 'max_active_runs' isn't respected. I remembered having fixed something related to this ages ago and this is here: https://issues.apache.org/jira/browse/AIRFLOW-137 That however was related to backfills and clearing the dagruns. I watched him in the scenario

Re: best way to handle version upgrades of libraries used by tasks

2018-01-30 Thread Gerard Toonstra
As long as the differences are in API methods and not a rearrangement of the package structure the latter option would work. This is because the operators would be imported by the scheduler, just not executed (and therefore perhaps not call the specific operator methods). If you serialize the

Re: Airflow thshirts

2018-01-29 Thread Gerard Toonstra
What we need is an airflow flamethrower On Sat, Jan 27, 2018 at 2:21 AM, Hbw wrote: > Me too! > > Had an Ant hat for years... > > Sent from a device with less than stellar autocorrect > > > On Jan 26, 2018, at 2:43 PM, Trent Robbins

Re: Efficient Way to Deploy Dags On Airflow

2018-01-22 Thread Gerard Toonstra
This was a question I put in a survey I once conducted. The survey is available here (including the individual results at the bottom): https://cwiki.apache.org/confluence/display/AIRFLOW/Apache+Airflow+survey+2017-06-24 1. I recommend not to use different versions as in separate directories, but

Data Vault example with airflow...

2018-01-21 Thread Gerard Toonstra
I've added a new example using a "Data Vault" methodology available here: https://gtoonstra.github.io/etl-with-airflow/datavault.html What I find compelling about DataVault is how it enables you to store data in a flexible way and regenerate some downstream star schema on the fly from scratch

Re: Hiring Airflow devs

2018-01-02 Thread Gerard Toonstra
Hi Sid, Is this open to Europeans as well, those who don't necessarily have a US visa? Rgds, Gerard Sent from my iPhone > On 2 Jan 2018, at 20:13, Sid Anand wrote: > > Hi Folks! > I'm looking for a few folks who want to work on Apache Airflow (@ PayPal > scale), which is

SAML2.0 authentication for airflow

2017-12-12 Thread Gerard Toonstra
Hello, Has anyone looked at, implemented, or ruled out SAML 2.0 authentication for airflow? I did a search on Google, but it didn't return anything specific. Rgds, Gerard

Re: Data lineage and data portal

2017-12-06 Thread Gerard Toonstra
accept everyone and then we can start writing your ideas, references, things you looked at, tried, etc. See you there! Gerard On Sun, Dec 3, 2017 at 11:40 AM, Sam Elamin <hussam.ela...@gmail.com> wrote: > I'm def in. > > Thanks for organising Gerard! > > On Sun, 3 Dec 2017 at 07:

Re: Data lineage and data portal

2017-12-02 Thread Gerard Toonstra
Good morning, The meeting has been scheduled for wednesday 6th december: London 17:00:00 UTC, Amsterdam 18:00:00 UTC+1, San Francisco 09:00:00 PST, New York 12:00:00 EST Gerard Toonstra is inviting you to a scheduled Zoom meeting. Join from PC, Mac, Linux, iOS or Android: https

Re: Data lineage and data portal

2017-11-30 Thread Gerard Toonstra
working on sql scanners, extractors and other tools that > >> allow me > >>> to > >>>>> populate the database > >>>>> ‘’’ > >>>>> > >>>>> Very cool. Cloudera Navigator ( not an open source product) does

Data lineage and data portal

2017-11-27 Thread Gerard Toonstra
Hi all, So something that really drew my attention recently is a "data portal" as described by a team from airbnb somewhere in May. The idea is basically a "facebook of data": https://medium.com/airbnb-engineering/democratizing-data-at-airbnb-852d76c51770 Unfortunately it looks like it's not

Re: Airflow configuration in environment variable not working

2017-11-09 Thread Gerard Toonstra
; AIRFLOW__GOOGLE__OAUTH_CALLBACK_ROUTE=/ > AIRFLOW__GOOGLE__DOMAIN=XXXXX > > -----Original Message- > From: Gerard Toonstra [mailto:gtoons...@gmail.com] > Sent: Thursday, November 9, 2017 1:59 PM > To: dev@airflow.incubator.apache.org > Subject: Re: Airflow configuration in envi

Re: Airflow configuration in environment variable not working

2017-11-09 Thread Gerard Toonstra
What's the variable key you are using? Does it follow this convention? https://airflow.apache.org/configuration.html That's AIRFLOW (two underscores) configuration section (two underscores) key. G> On Thu, Nov 9, 2017 at 8:30 AM, Somasundaram Sekar < somasundar.se...@tigeranalytics.com>
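
The naming convention described above can be sketched as a small helper. The function name `airflow_env_var` is hypothetical and only for illustration; the pattern itself (prefix, section, key, joined by double underscores and upper-cased) is the one from the linked configuration page.

```python
def airflow_env_var(section, key):
    """Build the environment variable name that overrides [section] key."""
    # AIRFLOW + two underscores + SECTION + two underscores + KEY
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())
```

For example, `airflow_env_var("google", "domain")` gives `AIRFLOW__GOOGLE__DOMAIN`, which matches the form of the variables quoted in the follow-up message. A single underscore between the parts is a common reason for a variable being silently ignored.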

Re: Airflow talk @ Coolblue

2017-11-04 Thread Gerard Toonstra
> wrote: > Any chance your talk was recorded? > > Thanks, > Mike > > > > On Oct 29, 2017, at 6:29 AM, Gerard Toonstra <gtoons...@gmail.com> > wrote: > > > > Hi all, > > > > Thursday the 26/10 my employer Coolblue organized a "Behind

Re: Airflow on ECS

2017-11-02 Thread Gerard Toonstra
Hey, As Bolke said, with LE and tasks consuming variable amounts of memory, you can run into memory issues on a container. I'd reconsider running on a containerized environment at all, because with the LE and the scheduler, you need to set up a huge one for that to work. You're probably better

Re: Airflow talk @ Coolblue

2017-10-29 Thread Gerard Toonstra
ntered when migrating from Azkaban to Airflow? > > Kind regards, > Fokko Driesprong > > 2017-10-29 11:29 GMT+01:00 Gerard Toonstra <gtoons...@gmail.com>: > > > Hi all, > > > > Thursday the 26/10 my employer Coolblue organized a "Behind the Scene

Airflow talk @ Coolblue

2017-10-29 Thread Gerard Toonstra
Hi all, Thursday the 26/10 my employer Coolblue organized a "Behind the Scenes" event. It is an opportunity for engineers to talk about stuff they work on and usually they provide two presentations. This event was about BigData and Processing. As (now) team lead of Data Platform, I decided to

Airflow talk at BigDataWeek London 13th October 2017

2017-09-26 Thread Gerard Toonstra
I'll be doing a little talk about Apache Airflow at the London BigDataWeek conference on 13th of October, focusing on how airflow is designed around following some important ETL principles and showcasing an example deployment solution architecture on AWS:

Re: Qs on airflow

2017-09-25 Thread Gerard Toonstra
Hi Larry, The important thing to question is what kind of interface that other system has. It is a little bit unusual in the sense that this DAG processes across multiple days. The potential issue I foresee here is that you don't mention a consistent start date for the DAG and you expect this to

Re: Stuck Tasks that don't report status

2017-08-07 Thread Gerard Toonstra
Hi David, When tasks are put on the MQ, they are out of the control of the scheduler. The scheduler puts the state of that task instance in "queued". What happens next: 1. A worker picks up the task to run and tries to run it. 2. It first executes a couple of checks against the DB prior to

Re: Airflow + Kubernetes discussion

2017-07-20 Thread Gerard Toonstra
send me an invite too! On Thu, Jul 20, 2017 at 8:17 PM, Jeremiah Lowin wrote: > I'm interested as well. > > On Thu, Jul 20, 2017 at 1:51 PM Marc Bollinger wrote: > > > +1 We're in the middle of moving some services to k8s, and have had our > > eye on

Re: Airflow kubernetes executor

2017-07-13 Thread Gerard Toonstra
It would be really good if you'd share experiences on how to run this on kubernetes and ECS. I'm not aware of a good guide on how to run this on either for example, but it's a very useful and quick setup to start with, especially combining that with deployment manager and cloudformation

Re: Airflow kubernetes executor

2017-07-06 Thread Gerard Toonstra
>> >>> > @chris: Thank you! My wiki name is dimberman. >>> > @gerard: I've started writing out my reply but there's a fair amount to >>> > respond to so I'll need a few minutes :). >>> > >>> > On Wed, Jul 5, 2017 at 1:17 PM Chris Riccom

Re: BigQuery Sensor Operator

2017-07-05 Thread Gerard Toonstra
There is an API where you can get table details in Python. There are multiple APIs, all using the underlying REST one. The one I'm talking about is where you can call exists and get rownum and create and modified details. Saves some money and time perhaps. Sent from my iPhone > On 5 Jul 2017, at

Re: Split DAG code into several files.

2017-06-27 Thread Gerard Toonstra
_port='hostnme.jp.local:9553' > cdna_daily_common.env='dev' > cdna_daily_common.alert_email='dev-dsd-...@mail.com' > cdna_daily_common.spdb_sync_prefix='echo SPDBSync' > cdna_daily_common.post_validate_prefix='echo PostVal' > cdna_daily_common.schedule_interval='0 2 * * *' > cdna_daily_common.d

Re: Split DAG code into several files.

2017-06-27 Thread Gerard Toonstra
For airflow to find dags, a .py file is read. The file should contain either "DAG" or "airflow" somewhere to be considered a potential dag file. Then there are some additional rules that determine whether this actually gets scheduled or not. The log files for the dag file processors are not the same as the main
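
The pre-filter described above can be sketched as a plain substring check on the file contents. The function name `might_contain_dag` and the `require_both` flag are hypothetical. The message says "either"; as far as I recall, the safe-mode check in the 1.x code base looks for both markers, so the sketch defaults to that and leaves the other reading available. Treat the default as an assumption to verify against your Airflow version.

```python
def might_contain_dag(source, require_both=True):
    """Cheap heuristic: is this file worth importing to look for DAGs?

    source is the raw file content as bytes.
    """
    markers = (b"DAG", b"airflow")
    hits = [marker in source for marker in markers]
    # require_both=True: both markers must appear; False: one is enough.
    return all(hits) if require_both else any(hits)
```

Either way, a helper module that mentions neither string is never imported by the DAG file processor, which explains why a DAG built purely through imported helpers can fail to show up.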

Apache Airflow survey 2017-06-24

2017-06-24 Thread Gerard Toonstra
Hi all, The Apache Airflow survey period expired. The results are collected and, as promised, I'm sharing the full results of the survey. It's on the wiki with the raw survey results available at the bottom of the page:

airflow testing: hovercraft

2017-06-20 Thread Gerard Toonstra
Hi all, I was talking about a dev project I was working on and there's some progress: https://github.com/gtoonstra/airflow-hovercraft There are two types of tests: 1. behavior tests: These test the behavior of operators against a stubbed out "hook", which is driven through python "behave"

Re: Apache airflow usage survey

2017-06-10 Thread Gerard Toonstra
f we do not use it in > production yet but going to, are we eligible to take it? :) > Boris > > On Sat, Jun 10, 2017 at 6:40 AM, Gerard Toonstra <gtoons...@gmail.com> > wrote: > >> Hi all, >> >> I'm curious how others are using and deploying airfl

Apache airflow usage survey

2017-06-10 Thread Gerard Toonstra
Hi all, I'm curious how others are using and deploying airflow. Rather than inviting people to reply to this email on a dev list, I created a survey with 10 questions (all visible on first page): https://goo.gl/forms/FeSBMfI7O8oe8wZu2 I'm going to close and share the results of that survey in 2

Re: Airflow Testing Library

2017-06-01 Thread Gerard Toonstra
and I can add you. Rgds, Gerard On Thu, May 18, 2017 at 2:00 PM, Gerard Toonstra <gtoons...@gmail.com> wrote: > >> On Tue, May 9, 2017 at 9:46 PM, Arthur Wiedmer <arthur.wied...@gmail.com> >> wrote: >> >>> Hi, >>> >>> I would love to

Strata conference London?

2017-05-23 Thread Gerard Toonstra
Is anyone at or going to visit the Strata conference London? I'll be there tomorrow and Thursday, would be good to connect with others using airflow, share experiences, have a coffee, etc. Rgds, Gerard

Re: Airflow Testing Library

2017-05-18 Thread Gerard Toonstra
ce and if/how this contributes to the bottom line of higher quality? Rgds, Gerard On Wed, May 10, 2017 at 8:27 AM, Gerard Toonstra <gtoons...@gmail.com> wrote: > Hi Laura, > > Yes, testing hooks and operators is about the basic behavior of those, so > you look for infrastruct

Re: Airflow Testing Library

2017-05-10 Thread Gerard Toonstra
scheduler go at it, and > > then > > > check the metadata database for what workflow happened (and, if we had > > test > > > integration services, maybe also check the output against the known > > output > > > for the seeded input). I can defini

Re: Airflow Testing Library

2017-05-09 Thread Gerard Toonstra
Very interesting video. I was unable to take part. I watched only part of it for now. Let us know where the discussion is being moved to. The confluence does indeed seem to be the place to put final conclusions and thoughts. For airflow, I like to make a distinction between "platform" and

Re: Accessing configuration parameters passed to Airflow through CLI

2017-04-27 Thread Gerard Toonstra
There was a discussion on google groups about that: https://groups.google.com/forum/#!topic/airbnb_airflow/GRdoW30PNUI On Thu, Apr 27, 2017 at 9:42 AM, Devjyoti Patra wrote: > Hi, > > I am trying to pass the following configuration parameters to Airflow CLI > while

Re: dag file processing times

2017-04-25 Thread Gerard Toonstra
a dag... > >> > >> > >> Sent from my iPhone > >> > >> > On 24 Apr 2017, at 22:46, Dan Davydov <dan.davy...@airbnb.com. > INVALID> > >> wrote: > >> > > >> > One idea to solve this is to use a daemon that uses

dag file processing times

2017-04-24 Thread Gerard Toonstra
Hey, I've seen some people complain about DAG file processing times. An issue was raised about this today: https://issues.apache.org/jira/browse/AIRFLOW-1139 I attempted to provide a good explanation of what's going on. Feel free to validate and comment. I'm noticing that the file processor is a

Re: Article - Get started developing workflows with Apache Airflow

2017-04-05 Thread Gerard Toonstra
Very nice. I noticed a bug in one line: pip install pip install airflow==1.8.0 I see you've used the plugin class to add operators, so they appear in the airflow.operators namespace. I'm wondering about what other people are doing there and what the best way is to add custom operators to

Re: Scheduler silently dies

2017-03-27 Thread Gerard Toonstra
ags_are_paused_at_creation = True > non_pooled_task_slot_count = 128 > max_active_runs_per_dag = 16 > ... > > Pretty much the defaults; I've never tweaked these values. > > > > -N > nik.hodgkin...@collectivehealth.com > > On Mon, Mar 27, 2017 at 12:12 PM, Gerard Toonstr

Re: Scheduler silently dies

2017-03-27 Thread Gerard Toonstra
; done. > > > > Attaching to program: /usr/bin/python, process 2391 > > > > Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols > > from > > > > /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.19.so...done. > > > > done. > > > > Loaded s

Re: Scheduler silently dies

2017-03-26 Thread Gerard Toonstra
> > > By the way, I remember that the scheduler would only spawn one or three > processes, but I may be wrong. > Right now when I start, it spawns 7 separate processes for the scheduler > (8 total) with some additional > ones spawned when the dag file processor starts. > > These other processes

Re: Scheduler silently dies

2017-03-26 Thread Gerard Toonstra
What may be helpful for diving into this a bit more is "pyrasite". You need gdb installed on the machine, but afterwards you can attach to a running process and then use Python "payloads" to investigate what's going on, for example to dump the stack trace per thread:
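
A payload of that kind could look roughly like the following sketch. It is not taken from the original message or from pyrasite; it only uses the standard library, and the function name `dump_thread_stacks` is hypothetical.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return {thread_id: (thread_name, formatted_stack)} for all threads."""
    names = {t.ident: t.name for t in threading.enumerate()}
    report = {}
    # sys._current_frames() maps each thread id to its current top frame.
    for thread_id, frame in sys._current_frames().items():
        stack = "".join(traceback.format_stack(frame))
        report[thread_id] = (names.get(thread_id, "unknown"), stack)
    return report
```

Run inside the stuck scheduler process, printing this report for each thread shows which call each thread is blocked in, which is usually enough to tell a database wait from a deadlock.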

Re: Make Scheduler More Centralized

2017-03-15 Thread Gerard Toonstra
Hi Rui, I worked a bit on the scheduler and added some of my comments below. On Tue, Mar 14, 2017 at 11:08 PM, Rui Wang wrote: > Hi, > The design doc below I created is trying to make airflow scheduler more > centralized. Briefly speaking, I propose moving state

Re: How to query a database and then send an email with the query results?

2017-03-10 Thread Gerard Toonstra
You can also generalize the tasks to make them more reusable: 1. an operator that runs a query and stores the result in a file in a generally available location (for all workers). 2. an operator extending the email operator that pulls in the file from the general location to the local worker and
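
The two reusable steps above can be sketched with the standard library alone. This is not Airflow operator code: `query_to_csv` and `build_report_email` are hypothetical names, sqlite3 stands in for whatever database is queried, and actually sending the message (for example with `smtplib`) is left out.

```python
import csv
import os
import sqlite3
from email.message import EmailMessage


def query_to_csv(conn, sql, path):
    """Step 1: run a query and store the result, with a header row, at path."""
    cursor = conn.execute(sql)
    rows = cursor.fetchall()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(rows)
    return len(rows)


def build_report_email(path, sender, recipient, subject):
    """Step 2: pick up the stored file and attach it to an email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("Query results attached.")
    with open(path, newline="") as f:
        msg.add_attachment(f.read(), subtype="csv",
                           filename=os.path.basename(path))
    return msg
```

Keeping the file location as the only contract between the two steps is what makes them reusable: any query can feed any notification, as long as `path` points somewhere every worker can reach.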

Re: HttpOperator handling large GET response

2017-03-08 Thread Gerard Toonstra
Hi Ali, Sounds like a better thing to do. If the response gets too big, you'll probably want to store the results as a file or otherwise immediately process them in a bit of python code. Whether you want to write a custom operator for this depends on how many times you wish to reuse this

Re: Simple question about schedule_interval establishing clear interval boundaries.

2017-02-21 Thread Gerard Toonstra
35c99e51ea936c756c00332c4a4a/airflow/models.py#L1489> > > Bolke > > > On 21 Feb 2017, at 22:41, Gerard Toonstra <gtoons...@gmail.com> wrote: > > > > Hey all, > > > > I'm writing up a bit more about best practices for airflow and realize > that

Simple question about schedule_interval establishing clear interval boundaries.

2017-02-21 Thread Gerard Toonstra
Hey all, I'm writing up a bit more about best practices for airflow and realize that there may be one important macro that's missing, but which sounds really useful. This is a list of the default macros: https://airflow.incubator.apache.org/code.html#macros The "execution_date" or "ds" is some
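
To make the interval boundaries from the subject line concrete, here is a minimal sketch. It assumes a fixed `timedelta` schedule rather than a cron expression, and the helper name `interval_bounds` is hypothetical. It relies on the Airflow convention that a run's execution_date marks the start of the period it covers.

```python
from datetime import datetime, timedelta


def interval_bounds(execution_date, schedule_interval):
    """Return (start, end) of the half-open period [start, end) for a run."""
    start = execution_date
    end = execution_date + schedule_interval
    return start, end


# The "ds" style string is just the start date formatted as YYYY-MM-DD.
start, end = interval_bounds(datetime(2017, 2, 21), timedelta(days=1))
ds_start = start.strftime("%Y-%m-%d")
ds_end = end.strftime("%Y-%m-%d")
```

Filtering source data with `start <= timestamp < end` keeps consecutive runs from overlapping or leaving gaps.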

Re: Article: The Rise of the Data Engineer

2017-01-25 Thread Gerard Toonstra
> > > > Per this email thread, it almost sounds like a slack team/discourse for > data engineering might be useful. > > I certainly would not mind getting more knowledge on this topic and I'd like to be invited to such a slack group (or google group).

Re: Article: The Rise of the Data Engineer

2017-01-25 Thread Gerard Toonstra
You mentioned Vertica and Parquet. Is it recommended to use these newer tools even when the DWH is not BigData size (about 150G in size) ? So there are a couple of good benefits, but are there any downsides and disadvantages you have to take into account comparing Vertica vs. SQL Server for

Re: Airflow 2.0

2016-11-21 Thread Gerard Toonstra
More ideas: - An "airflow" plugin at the moment is more of an extension; operators, hooks, macros. Consider an additional plugin API + default implementation for code inside airflow that has a cross-cutting concern, like: * We start to use datadog for heavier monitoring of what's going

Re: Airflow 2.0

2016-11-21 Thread Gerard Toonstra
+1 on driving everything through a REST API including the UI. This unifies the access to the scheduler and increases stability. Consider running a very small webserver (node.js + socket.io), which enables airflow to communicate scheduler events as they happen to anything that connects to it

Re: Skip task

2016-11-08 Thread Gerard Toonstra
Also in 1.7.1.3, there's the ShortCircuitOperator, which can give you an example. https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/python_operator.py You'd have to modify this to your needs, but the way it works is that if the condition evaluates to True, none of the
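
The short-circuit behavior can be sketched without Airflow as follows. This is a simplified model, not the operator's code: `short_circuit` is a hypothetical function, and "runnable" is just a label used here for tasks that are left alone, while "skipped" corresponds to the state downstream tasks get when the condition is false.

```python
def short_circuit(condition, downstream_task_ids):
    """Model of a short-circuit: skip everything downstream on a false condition."""
    if condition():
        state = "runnable"  # condition holds: downstream tasks proceed as usual
    else:
        state = "skipped"   # condition fails: downstream tasks are marked skipped
    return {task_id: state for task_id in downstream_task_ids}
```

For instance, `short_circuit(lambda: False, ["load", "report"])` marks both tasks as skipped, which is the piece you would adapt if only some of the downstream tasks should be skipped.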

Re: Astronomer.io Airflow blog post

2016-11-02 Thread Gerard Toonstra
.@apache.org> > wrote: > > > Gerard, > > Please sign up for a CWiki <https://github.com/apache/incubator-airflow> > > account and reply to this email with your user name. I just searched > > for "Gerard > > Toonstra" and didn't find a user on

Re: Questions on Airflow with CeleryExecutor

2016-11-02 Thread Gerard Toonstra
Hey Jason, Let me try to answer them for you. I hope I get everything 100% right, because I'm also pretty new to airflow. Hopefully someone on the list corrects me if it's horribly wrong. On Wed, Nov 2, 2016 at 9:24 PM, Jason Chen wrote: > Hi Airflow team, > > We are

Re: Astronomer.io Airflow blog post

2016-11-02 Thread Gerard Toonstra
They are both on the project page of the airflow documentation in resources & links and on the wiki, the wiki is a bit richer in that regard. Maybe link to the wiki from the doc pages instead, so it's all in one place? https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links

Re: Possible airflow-pool bug

2016-11-01 Thread Gerard Toonstra
David, When you say "massive" oversubscribing, are you running a lot of dags in parallel that use your configured pools? Access to pools is not atomic at the moment. Can you also quantify "massive" ? Not that it matters, but to get a better idea. Rgds, Gerard On Tue, Nov 1, 2016 at 4:09 PM,

Airflow state change diagram

2016-10-30 Thread Gerard Toonstra
I was looking at trying to fix AIRFLOW-137 (max_active_runs not respected), but quickly noticed that the code that does all the scheduling is rather complex with state updates going on across multiple source files in multiple threads, etc. It's then best to find a suitable way to visualize all

Re: Changing the crontab for a DAG

2016-10-25 Thread Gerard Toonstra
cron expressions: https://airflow.incubator.apache.org/scheduler.html#dag-runs On Tue, Oct 25, 2016 at 12:23 PM, Manning, Kieran (Consultant) < kieran.mann...@consultant.renre.com> wrote: > Hi all, > > Sorry to piggyback on this question thread but I have something similar. > We need to be able

Re: Changing the crontab for a DAG

2016-10-25 Thread Gerard Toonstra
What some people do is give the new dag a new version in the name, like _v1 or _v2 at the end. Then it's treated like another dag and you can disable the old one. If you make changes to dags, it's possible that old operators/tasks are no longer visible in the UI and you no longer have access to
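
The version-suffix convention can be sketched with two small helpers. Both names are hypothetical and nothing here is Airflow API; the resulting string is simply what you would pass as the `dag_id`.

```python
import re


def versioned_dag_id(base_name, version):
    """Build a dag_id such as 'daily_load_v2' from a base name and a version."""
    return "{}_v{}".format(base_name, version)


def split_versioned_dag_id(dag_id):
    """Return (base_name, version), or (dag_id, None) if there is no suffix."""
    match = re.match(r"^(?P<base>.+)_v(?P<version>\d+)$", dag_id)
    if match is None:
        return dag_id, None
    return match.group("base"), int(match.group("version"))
```

Because the scheduler sees `daily_load_v1` and `daily_load_v2` as unrelated DAGs, the history of the old version stays intact while the new schedule starts fresh.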

Re: ETL best practices for airflow

2016-10-22 Thread Gerard Toonstra
Even when the pool is 10 and the number of instances 7, it takes longer for the instances to actually run. Looking forward to your comments on how some approaches could be improved. Rgds, Gerard On Wed, Oct 19, 2016 at 8:17 AM, Gerard Toonstra <gtoons...@gmail.com> wrote: > > T

Re: ETL best practices for airflow

2016-10-18 Thread Gerard Toonstra
m hoping to see some best practices for the design of incremental > > loads > > > and using timestamps from source database systems (not being on UTC so > > > still confused about it in Airflow). Also practical use of subdags and > > > dynamic generation of t

Re: ETL best practices for airflow

2016-10-17 Thread Gerard Toonstra
ices for the design of incremental loads > and using timestamps from source database systems (not being on UTC so > still confused about it in Airflow). Also practical use of subdags and > dynamic generation of tasks using some external metadata (maybe describe in > details something simi

ETL best practices for airflow

2016-10-16 Thread Gerard Toonstra
Hi all, About a year ago, I contributed the HTTPOperator/Sensor and I've been tracking airflow since. Right now it looks like we're going to adopt airflow at the company I'm currently working at. In preparation for that, I've done a bit of research into how airflow pipelines should fit together,

Re: Airflow for business process workflows

2016-09-08 Thread Gerard Toonstra
Dinesh, Interesting use case. I'm not sure how this will work out for you eventually compared to a specialized workflow tool, but here are some considerations that you should make to evaluate your chances of success: A complex business workflow will at some point require some more complex input

Re: Scheduler getting stuck - request for details

2016-09-07 Thread Gerard Toonstra
The scheduler is probably single threaded, but it's a good idea to make sure and investigate postgres (or mysql) locks: https://wiki.postgresql.org/wiki/Lock_Monitoring On Wed, Sep 7, 2016 at 8:30 AM, Bolke de Bruin wrote: > Thanks! > > Apache scrubs attachments. Can you

Airflow on Google Cloud

2016-06-10 Thread Gerard Toonstra
Hi all, I did a demo of airflow for an organisation where they currently use azkaban and they liked the project and demonstrated interest in using it. The installation however was considered a bit more work than they wanted: mysql db, celery, rabbitMQ and scheduler that all had to be puppetized