Re: Deployment / Execution Model

2018-10-31 Thread Gabriel Silk
I can see how my first email was confusing, where I said: "Our first attempt at productionizing Airflow used the vanilla DAGs folder, including all the deps of all the DAGs with the airflow binary itself". What I meant is that we have a separate DAGs deployment, but we are being forced to package th

Re: Deployment / Execution Model

2018-10-31 Thread Gabriel Silk
Our DAG deployment is already a separate deployment from Airflow itself. The issue is that the Airflow binary (whether acting as webserver, scheduler, worker), is the one that *reads* the DAG files. So if you have, for example, a DAG that has this import statement in it: import mylib.foobar Then
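The preview above is cut off, but the point made so far can be sketched in a few lines: whichever process reads a DAG file also executes its top-level imports, so those imports must resolve in that process's environment. This is a minimal illustration using only the standard library, not Airflow's actual loading code. The module name `mylib_not_installed_here` and the helpers `try_load_dag_file` and `demo` are hypothetical stand-ins.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical DAG file whose only content is an import of a library
# that is NOT installed alongside the process reading the file.
DAG_SOURCE = "import mylib_not_installed_here\n"


def try_load_dag_file(path: str) -> str:
    """Execute a Python file the way a generic loader would, and report the result."""
    spec = importlib.util.spec_from_file_location("example_dag", path)
    module = importlib.util.module_from_spec(spec)
    try:
        # Running the file runs its top-level import statements.
        spec.loader.exec_module(module)
    except ImportError as exc:
        return f"failed to read DAG file: {exc}"
    return "loaded"


def demo() -> str:
    """Write the hypothetical DAG file to a temp dir and try to load it."""
    with tempfile.TemporaryDirectory() as tmp:
        dag_path = Path(tmp) / "example_dag.py"
        dag_path.write_text(DAG_SOURCE)
        return try_load_dag_file(str(dag_path))
```

Calling `demo()` in an environment without that library returns a "failed to read DAG file" message, which mirrors why the reading process ends up needing the DAG's dependencies.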

what is error[111] and how to deal with it on sending the email notification?

2018-10-31 Thread rajasimmangandhi
# from airflow import DAG from airflow.operators.bash_operator import BashOperator from airflow.operators.python_operator import PythonOperator from airflow.operators.email_operator import EmailOperator from airflow.utils.email import send_email_smtp import datetime as dt default_

Re: Deployment / Execution Model

2018-10-31 Thread Maxime Beauchemin
Deploying the DAGs should be decoupled from deploying Airflow itself. You can just use a resource that syncs the DAGs repo to the boxes on the Airflow cluster periodically (say every minute). Resource orchestrators like Chef, Ansible, Puppet, should have some easy way to do that. Either that or som

Deployment / Execution Model

2018-10-31 Thread Gabriel Silk
Hello Airflow community, I'm currently putting Airflow into production at my company of 2000+ people. The most significant sticking point so far is the deployment / execution model. I wanted to write up my experience so far in this matter and see how other people are dealing with this issue. Fir

Re: A Naive Multi-Scheduler Architecture Experiment of Airflow

2018-10-31 Thread Maxime Beauchemin
A few related thoughts: * there may be hiccups around concurrency (pools, queues), though the worker should double-check that the constraints are still met when firing the task, so in theory this should be ok * there may be more "misfires" meaning the task gets sent to the worker, but by the time

Re: A Naive Multi-Scheduler Architecture Experiment of Airflow

2018-10-31 Thread Kevin Yang
Finally we start to talk about this seriously? Yeah! :D For your approach, a few thoughts: 1. Sharding by number of files may not yield equal load, and the load may even differ widely, since we may have a framework DAG file that produces 500 DAGs and takes forever to parse. 2. I think Alex Guziel
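Point 1 above can be illustrated with a small sketch: splitting DAG files evenly by count says nothing about how long each shard takes to parse. The file names, the parse times, and the helpers `shard_by_count` and `shard_loads` are all made up for illustration; this is not how any Airflow scheduler assigns files.

```python
from typing import Dict, List


def shard_by_count(parse_seconds: Dict[str, float], n_shards: int) -> List[List[str]]:
    """Assign files to shards round-robin, so each shard gets a similar file count."""
    shards: List[List[str]] = [[] for _ in range(n_shards)]
    for i, name in enumerate(sorted(parse_seconds)):
        shards[i % n_shards].append(name)
    return shards


def shard_loads(shards: List[List[str]], parse_seconds: Dict[str, float]) -> List[float]:
    """Total parse time per shard, given a per-file parse time."""
    return [sum(parse_seconds[name] for name in shard) for shard in shards]


# Hypothetical parse times: one "framework" file that generates many DAGs
# dominates the cost, while the other files are cheap.
example_times = {"a.py": 1.0, "b.py": 1.0, "c.py": 1.0, "framework.py": 60.0}
example_shards = shard_by_count(example_times, 2)
example_loads = shard_loads(example_shards, example_times)
```

With these made-up numbers, both shards hold two files, yet one carries 2 seconds of parsing and the other 61.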

REST API roadmap/plan?

2018-10-31 Thread matthew
I've been poking around Jira and Confluence but haven't seen any roadmap or plans for the REST API. Did I just miss it or has it stalled out? I'm interested in working on it if it needs some help. Thanks -Matthew

A Naive Multi-Scheduler Architecture Experiment of Airflow

2018-10-31 Thread Deng Xiaodong
Hi Folks, Previously I initiated a discussion about best practices for setting up Airflow, and a few folks agreed that the scheduler may become one of the bottleneck components (we can only run one scheduler instance, so it can only scale vertically rather than horizontally, etc.). Especially