Re: Submitting 1000+ tasks to airflow programmatically

2018-03-22 Thread Maxime Beauchemin
On the open PR I described how DagFetcher might imply a new DAG manifest (replacing the current DAG_FOLDER auto-parsing & discovery) that describes a list of dag_ids and related DAG URIs. That DAG manifest could be a static list OR something dynamic if you pass it a callable. To enable the
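The manifest idea above can be sketched in plain Python. This is a hypothetical illustration of the proposal, not a real Airflow API: the entry shape, the `resolve_manifest` helper, and the URIs are all invented for the example.

```python
# Hypothetical sketch of the DAG manifest idea: a manifest maps dag_ids to
# DAG URIs, and may be either a static list or a callable that produces
# entries dynamically. All names here are illustrative.

def resolve_manifest(manifest):
    """Accept either a static list of entries or a zero-argument callable."""
    return manifest() if callable(manifest) else manifest

# Static form: an explicit list of dag_id -> URI mappings.
static_manifest = [
    {"dag_id": "daily_reports", "uri": "git://repo/dags/daily_reports.py"},
    {"dag_id": "hourly_sync", "uri": "s3://bucket/dags/hourly_sync.py"},
]

# Dynamic form: a callable, e.g. one backed by a database query.
def dynamic_manifest():
    clients = ["acme", "globex"]  # stand-in for a real lookup
    return [
        {"dag_id": f"sync_{c}", "uri": f"s3://bucket/dags/sync_{c}.py"}
        for c in clients
    ]

for entry in resolve_manifest(dynamic_manifest):
    print(entry["dag_id"])
```

Either form yields the same shape of data, so a fetcher could consume both without caring whether the list was static or computed.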

Re: Submitting 1000+ tasks to airflow programmatically

2018-03-22 Thread James Meickle
I'm very excited about the possibility of implementing a DAGFetcher (per prior thread about this) that is aware of dynamic data sources, and can handle abstracting/caching/deploying them itself, rather than having each Airflow process run the query for each DAG refresh. On Thu, Mar 22, 2018 at

Re: Submitting 1000+ tasks to airflow programmatically

2018-03-22 Thread Taylor Edmiston
I'm interested in hearing further discussion too, and if others have tried something similar to our approach. Several companies on this list have mentioned various approaches to dynamic DAGs, and I think everyone needs them eventually. Maybe it's an opportunity for additional docs regarding use

Re: Submitting 1000+ tasks to airflow programmatically

2018-03-22 Thread Kyle Hamlin
@Chris @Taylor Thank you guys very much for your explanations! Your strategy makes a lot of sense to me. Generating a DAG for each client means I'm going to have a ton of DAGs on the front page, but at least that's searchable haha. I'm going to give this implementation a shot and I'll try to report back

TriggerDagRunOperator

2018-03-22 Thread Andreas Költringer
Hi, I am currently working on Airflow tasks that would need to trigger other DagRuns for the past, i.e. for specific execution dates. However, the current implementation of TriggerDagRunOperator does not allow this, while manual triggering via the CLI does. There are also questions on SO [2]
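For reference, the CLI workaround mentioned above looks roughly like the following under the Airflow 1.x CLI; the dag_id and date are invented for illustration.

```shell
# Manually trigger a run of an (example) DAG for a specific past
# execution date via the -e/--exec_date flag, which TriggerDagRunOperator
# did not expose at the time this thread was written.
airflow trigger_dag -e 2018-01-15T00:00:00 my_backfill_dag
```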

Re: Submitting 1000+ tasks to airflow programmatically

2018-03-22 Thread Taylor Edmiston
We're not using SubDagOperator. Our approach uses one DAG file to generate a separate DAG class instance for each similar config, each of which gets hoisted into the global namespace. In simplified pseudo-Python, it looks like: # sources --> {'configs': [{...}, {...}], 'expire': ''} cache =
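The pattern described above can be sketched end to end. This is a minimal illustration of the one-file, many-DAGs approach, not Taylor's actual code: a stand-in `Dag` class replaces `airflow.DAG` so the sketch runs without Airflow installed, and the cache contents are invented.

```python
# Sketch of the pattern: one DAG file reads a cached list of configs and
# creates one DAG object per config, hoisting each into the module's
# global namespace so Airflow's parser discovers it.

class Dag:
    """Stand-in for airflow.DAG, used so this sketch is self-contained."""
    def __init__(self, dag_id):
        self.dag_id = dag_id

# Stand-in for the cached source lookup mentioned in the thread,
# e.g. {'configs': [...], 'expire': <timestamp>}.
cache = {
    "configs": [{"name": "client_a"}, {"name": "client_b"}],
    "expire": None,
}

def make_dag(config):
    # In a real DAG file this would build an airflow.DAG and add tasks.
    return Dag(dag_id=f"etl_{config['name']}")

for config in cache["configs"]:
    dag = make_dag(config)
    # Hoist into globals() so the scheduler sees each DAG instance;
    # Airflow discovers DAGs by inspecting module-level names.
    globals()[dag.dag_id] = dag
```

The key move is the `globals()[dag.dag_id] = dag` assignment: Airflow only picks up DAG objects bound to module-level names, so dynamically created instances must be hoisted explicitly.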

Re: Submitting 1000+ tasks to airflow programmatically

2018-03-22 Thread David Capwell
For us, we compile down to Python rather than doing the logic in Python; that way the DAG load doesn't do real work. We have our own DSL that is just a simplified compiler: parse, analyze, optimize, code gen. In code gen we just generate the Python code. Our build then packages it up and have

Re: Submitting 1000+ tasks to airflow programmatically

2018-03-22 Thread Andrew Maguire
I've had similar issues with large DAGs being slow to render in the UI and crashing Chrome. I got around it by changing the default tree view from 25 runs to just 5. It involves a couple of changes to source files though; it would be great if some of the UI defaults could go into airflow.cfg.