Re: Disable Processing of DAG file

2018-05-29 Thread Maxime Beauchemin
The TLDR of how the processor works is:
while True:
* sets up a multiprocessing queue with N processes (say 32)
* main process looks for the list of all .py files in DAGS_FOLDER
* fills in the queue with all .py files
* each one of the 32 subprocesses opens a file and interprets it (it's insulated from the ma
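A rough sketch of one pass of the loop described above, using only the Python standard library. This is not Airflow's actual implementation: the function names (`list_py_files`, `parse_file`, `process_once`) are hypothetical, and the worker only compiles each file to surface syntax errors, whereas the real processor interprets the file to collect DAG objects.

```python
import multiprocessing
import os


def list_py_files(dags_folder):
    """Main-process step: collect every .py file under the folder."""
    found = []
    for root, _dirs, files in os.walk(dags_folder):
        for name in files:
            if name.endswith(".py"):
                found.append(os.path.join(root, name))
    return sorted(found)


def parse_file(path):
    """Worker step: open one file and check it, isolated from the main process.

    Assumption: compiling stands in for interpreting the file.
    Returns (path, None) on success or (path, error description) on failure.
    """
    try:
        with open(path) as handle:
            source = handle.read()
        compile(source, path, "exec")
        return path, None
    except Exception as exc:  # a broken file must not take down the loop
        return path, repr(exc)


def process_once(dags_folder, n_processes=4):
    """One iteration: list the files, then fan them out to N worker processes."""
    paths = list_py_files(dags_folder)
    with multiprocessing.Pool(processes=n_processes) as pool:
        return dict(pool.map(parse_file, paths))
```

Calling `process_once` repeatedly from a `while True:` loop in a script guarded by `if __name__ == "__main__":` would approximate the cycle in the message; a file that fails to parse shows up as an error entry instead of stopping the other workers.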

Re: What are the rules / policies for graduating classes out of airflow.contrib?

2018-05-29 Thread Maxime Beauchemin
* At least one active committer who runs that code in their environment, cares enough, and has enough context to review / fix things if need be
* Decent code quality
* Decent unit test coverage
* Decent underlying libraries (no dependencies on unmaintained/unpopular libs)

About the wiki I agree

Re: conn_id breaking change; once more with feeling

2018-05-29 Thread Maxime Beauchemin
The main reason for the conn_id prefix is to facilitate the use of `default_args`. Because of this you can set all your connections at the top of your script and from that point on you just instantiate tasks without re-stating connections. It's common for people to define multiple "operating contex
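To illustrate the point about `default_args`, here is a minimal pure-Python sketch. It does not use Airflow itself: `FakePostgresOperator`, `FakeMySqlOperator`, and `build_task` are hypothetical stand-ins that mimic the idea of filling constructor arguments from a shared dictionary. The connection names are made up.

```python
import inspect


class FakePostgresOperator:
    """Hypothetical stand-in for an operator with a prefixed connection argument."""

    def __init__(self, task_id, sql, postgres_conn_id="postgres_default"):
        self.task_id = task_id
        self.sql = sql
        self.postgres_conn_id = postgres_conn_id


class FakeMySqlOperator:
    """Hypothetical stand-in for a second operator with its own prefix."""

    def __init__(self, task_id, sql, mysql_conn_id="mysql_default"):
        self.task_id = task_id
        self.sql = sql
        self.mysql_conn_id = mysql_conn_id


def build_task(operator_cls, default_args, **explicit):
    """Pass along only the defaults the operator accepts; explicit arguments win."""
    accepted = inspect.signature(operator_cls.__init__).parameters
    kwargs = {key: value for key, value in default_args.items() if key in accepted}
    kwargs.update(explicit)
    return operator_cls(**kwargs)


# Connections are stated once, at the top of the script.
default_args = {"postgres_conn_id": "pg_prod", "mysql_conn_id": "mysql_prod"}

# Tasks are then created without restating their connections.
extract = build_task(FakeMySqlOperator, default_args, task_id="extract", sql="SELECT 1")
load = build_task(FakePostgresOperator, default_args, task_id="load", sql="SELECT 1")
```

Because each argument carries its own prefix, one dictionary can hold a default for every connection type. If both operators took a plain `conn_id`, the two defaults would collide in the same key.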

Re: Using Airflow with dataset dependant flows (not date)

2018-05-29 Thread Maxime Beauchemin
Hi, Assuming the shape of your DAG is the same across runs, the prescribed way is to use a DAG with schedule_interval=None and to create your DAG Runs on demand. You can do so programmatically (using the ORM: airflow.models.DagRun), from the CLI (airflow trigger_dag), or through REST. If your DAG s

Re: Convert Dag Run from Backfill to Scheduled?

2018-05-29 Thread Maxime Beauchemin
Yes, clearly the DAG runs can be in inconsistent states with related task instances and backfill processes. Here's a quick patch that helps a little: https://github.com/apache/incubator-airflow/pull/3433 After writing the quick patch above I'm thinking it requires a bit more thinking. The clear co

conn_id breaking change; once more with feeling

2018-05-29 Thread Daniel (Daniel Lamblin) [BDP - Seoul]
The short of this email is: can we please name the connection id parameters of all hooks and operators as similarly as possible, e.g. just `conn_id`? So, when we started using Airflow I _had_ thought that minor versions would be compatible for a user's DAG, assuming no use of anything m

Re: Using Airflow with dataset dependant flows (not date)

2018-05-29 Thread Daniel (Daniel Lamblin) [BDP - Seoul]
Hi Javier; I'm afraid I'm not familiar enough with the overall architecture of Airflow to propose the right set of changes, and to decompose the work into PRs that are independently staged. But as dataset based processing is one of the items keeping some teams in my company on an internal schedu

Re: Convert Dag Run from Backfill to Scheduled?

2018-05-29 Thread Ruiqin Yang
This line is where the scheduler skips the backfill DAG runs. Regardless of the state the DAG run is in, tasks in a DAG run whose run ID starts with 'backfill_' are not considered when scheduling. I agree with Dan Davydov's idea that

Re: Convert Dag Run from Backfill to Scheduled?

2018-05-29 Thread Scott Halgrim
Well I’ve gone ahead and run the UPDATE query now, so the scheduler is picking up tasks. When I cleared the tasks, every DAG run that had a cleared task in it was set to running. Because I’d backfilled them all they were all `backfill_` dag runs.  Inspection of various tasks via `task_failed_de

KubernetesPodOperator: Invalid arguments were passed to BaseOperator

2018-05-29 Thread Craig Rodrigues
I tested master branch by putting the following in my requirements.txt: git+https://github.com/rodrigc/incubator-airflow@master#egg=apache-airflow[celery,crypto,emr,hive,hdfs,ldap,mysql,postgres,redis,slack,s3] and did a pip install -r requirements.txt When I started the airflow webserver, I saw

Re: Convert Dag Run from Backfill to Scheduled?

2018-05-29 Thread Maxime Beauchemin
While this may work it's clearly not the prescribed way to do this. Clearing should just work. I'm trying to understand why the scheduler is not picking up the cleared task. Clearing should remove the task instance state and set the state of the related DAG Run to running so that the scheduler pic

What are the rules / policies for graduating classes out of airflow.contrib?

2018-05-29 Thread Tim Swast
I'm investigating what is required for graduating operators / sensors / ... out of airflow.contrib, but I couldn't find any official docs on either the wiki, the docs, or GitHub. What are the requirements for moving something out of contrib? P.S. It seems like the wiki is pretty locked down. I do

Re: Alert Emails Templatizing

2018-05-29 Thread vardanguptacse
Thanks Ananth for explaining this so clearly. That seems like quite a good idea, but I suspect it will require changes to existing code and could affect backward compatibility with the latest releases. Would it be a good idea to wait for this PR: https://github.com/apache/incubator-airflow/

Re: Templatizing Email Content

2018-05-29 Thread vardanguptacse
Thanks Alek, I'll keep an eye on this PR. On 2018/05/29 17:14:00, Alek Storm wrote: > I submitted a PR implementing this ( > https://github.com/apache/incubator-airflow/pull/2338) a while ago, but it > languished and I missed Fokko's request to revive it. I've rebased and it's > ready for anothe

Re: Templatizing Email Content

2018-05-29 Thread Alek Storm
I submitted a PR implementing this ( https://github.com/apache/incubator-airflow/pull/2338) a while ago, but it languished and I missed Fokko's request to revive it. I've rebased and it's ready for another review. Alek On Tue, May 29, 2018 at 10:08 AM vardangupta...@gmail.com < vardangupta...@gma

Templatizing Email Content

2018-05-29 Thread vardanguptacse
Currently, email delivery can be set up for DAG failures and up_for_retry by configuring an SMTP server, but the email body is hard-coded in models.py. Is there any plan to make it templatized? We want to change the email body dynamically depending on the use case. And would that be appreciable, if

Re: How to wait for external process

2018-05-29 Thread Victor Noagbodji
hi, here's another vote for persistence. we did a similar thing, where the processing state is stored in the database. no part of the DAG does a periodic check. the DAG retriggers itself, and its very first task is to figure out whether there is work to do or bail out. > On May 28, 2018, at 4

Re: Using Airflow with dataset dependant flows (not date)

2018-05-29 Thread Javier Domingo Cansino
Hello Daniel, Thanks for your answer, I have been able to try your suggested solution, and as expected it works fine. However I have found that because the parametrization always comes with an execution_date, it can be misleading to users to have all runs still depending on that parameter. I could