Re: ETL best practices for airflow

2016-10-25 Thread Gerard Toonstra
Hi Boris, Thanks very much! These are all valid points and I'll work them out in the next couple of days. Indeed, the documentation of what the code actually does is rather limited; I should probably have described the overall strategy, so the code is easier to follow. Thanks for the ideas for

Re: ETL best practices for airflow

2016-10-25 Thread Boris Tyukin
Hi Gerard, I like your examples, but compared to the first sections of your document, that last section felt a bit rushed. By looking at the actual example and your comments above (the issues you discovered) I was able to comprehend it, but for people new to Airflow it might be a bit confusing. Why you did

Re: ETL best practices for airflow

2016-10-22 Thread Gerard Toonstra
Hi all, So I worked out a full pipeline for a toy data warehouse on postgres: https://gtoonstra.github.io/etl-with-airflow/fullexample.html https://github.com/gtoonstra/etl-with-airflow/tree/master/examples/full-example It demonstrates pretty much all listed principles for ETL work except for

Re: ETL best practices for airflow

2016-10-18 Thread Gerard Toonstra
Thanks Max, I think it always helps when new people start using software to see what their issues are. Some of it was also taken from the video on best practices in Nov. 2015 on this page: https://www.youtube.com/watch?v=dgaoqOZlvEA&feature=youtu.be I made some more progress yesterday, bu

Re: ETL best practices for airflow

2016-10-18 Thread Maxime Beauchemin
This is an amazing thread to follow! I'm really interested to watch best practices documentation emerge out of the community. Gerard, I enjoyed reading your docs and would love to see this grow. I've been meaning to write a series of blog posts on the subject for quite some time. It seems like you

Re: ETL best practices for airflow

2016-10-17 Thread Gerard Toonstra
Hi Laura, Looks very good. What I had to do first when I started was to figure out relevant concepts for ETL, since I don't have a BI background. When I follow the tutorial and look at the examples, it's clear what airflow can do conceptually, but as soon as I want to get started on something, there's n

Re: ETL best practices for airflow

2016-10-17 Thread Boris Tyukin
Thanks for sharing your slides, Laura! I think I've watched all the airflow related slides I could find and you did a very good job - adding your slides to my collection :) I especially liked how you were explaining the execution date concept but I wish you could elaborate on a backfill concept and runnin

Re: ETL best practices for airflow

2016-10-17 Thread Laura Lorenz
Same! I actually recently gave a talk about how my company uses airflow at PyData DC. The video isn't live yet, but the slides are here. In substance it's actually very similar to wh

Re: ETL best practices for airflow

2016-10-17 Thread Gerard Toonstra
Hi all, Today I was trying to work out a very basic example and very quickly ran into an hour of trying to solve a problem that ought to be really easy. I didn't expect that. I posted about this on gitter.im and someone helped me out there. All the simple database operators (mysql, postgres, mss

Re: ETL best practices for airflow

2016-10-16 Thread Boris Tyukin
I really look forward to it, Gerard! I've read what you wrote so far and I really liked it - please keep up the great job! I am hoping to see some best practices for the design of incremental loads and using timestamps from source database systems (not being on UTC so still confused about it i

ETL best practices for airflow

2016-10-16 Thread Gerard Toonstra
Hi all, About a year ago, I contributed the HTTPOperator/Sensor and I've been tracking airflow since. Right now it looks like we're going to adopt airflow at the company I'm currently working at. In preparation for that, I've done a bit of research into how airflow pipelines should fit together,