Re: Disable Processing of DAG file

2018-05-28 Thread Ananth Durai
It is an interesting question. On a slightly related note (correct me if I'm wrong), AFAIK we require restarting the airflow scheduler in order to pick up any new DAG file changes. In that case, should the scheduler do the DAG file processing every time before scheduling the tasks? Regards,

Disable Processing of DAG file

2018-05-28 Thread ramandumcs
Hi All, We have a use case where there would be 100(s) of DAG files with the schedule set to "@once". Currently it seems that the scheduler processes each and every file and creates a DAG object. Is there a way or config option to tell the scheduler to stop processing certain files? Thanks, Raman Gupta
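
One option that already existed around this time is an `.airflowignore` file placed in the DAGs folder: each line is a regex, and the scheduler skips any file whose path matches one of the patterns. The file names below are purely illustrative:

```
# .airflowignore (in the DAGs folder); one regex per line,
# matching files are not parsed by the scheduler
one_time_backfills/.*
legacy_.*\.py
```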

Re: HttpSensor raising exception with status=403

2018-05-28 Thread Pedro Machado
Thanks! I ended up creating a plugin and it's working OK. On Mon, May 28, 2018 at 9:22 AM Driesprong, Fokko wrote: > Hi Pedro, > > You could just create a CustomHttpHook and place it on your pythonpath, > then you should also create a CustomHttpSensor. Hope this helps. > > Cheers, Fokko > >

Re: How to wait for external process

2018-05-28 Thread Stefan Seelmann
Thanks Christopher for the idea. That would work; we already have such a "listener" that polls a queue (SQS) and creates the DAG runs. However, it would have been nice to have the full process in one DAG, to get a better overview of running jobs and leverage the Gantt chart, but I think this can

Re: Alert Emails Templatizing

2018-05-28 Thread Ananth Durai
It is a bit tricky; *Step 1:* you can write an SLA miss callback, send the email from the callback, and empty the `slas` object so that airflow won't send its own SLA miss email. https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L688 *Step 2:* You can reuse airflow's `send_email` method
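
A minimal sketch of Step 1, assuming the `sla_miss_callback(dag, task_list, blocking_task_list, slas, blocking_tis)` signature and that emptying `slas` in place suppresses the built-in email (as the post describes); `send_custom_email` is a hypothetical stand-in for `airflow.utils.email.send_email`:

```python
def send_custom_email(subject, body):
    # hypothetical stand-in; in a real deployment wire this to
    # airflow.utils.email.send_email with your own template
    print(f"would send: {subject} / {body}")

def sla_miss_callback(dag, task_list, blocking_task_list, slas, blocking_tis):
    if slas:
        send_custom_email(
            "SLA miss",
            f"{len(slas)} task instance(s) missed their SLA",
        )
        # empty the list *in place* (not `slas = []`) so the scheduler,
        # which holds the same list object, sees nothing left to email about
        del slas[:]
```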

Re: How to wait for external process

2018-05-28 Thread Ananth Durai
Since you are already on AWS, the simplest thing I can think of is to write a signal file once the job finishes and have the downstream job wait for that signal file. In other words, the same pattern as Hadoop jobs writing a `_SUCCESS` file and downstream jobs depending on that signal file.

Ability to discover custom plugins, operators, sensors, etc. from various locations

2018-05-28 Thread Ritesh Shrivastav
Hi, I was trying to find out if it's possible to create custom plugins without dropping them into `$AIRFLOW_HOME/plugins` (or the directory defined in `airflow.cfg`). We can define one location in `airflow.cfg`, but I have multiple projects which will have their own workflows, so ideally I would want

Re: How to wait for external process

2018-05-28 Thread Christopher Bockman
Haven't done this, but we'll have a similar need in the future, so have investigated a little. What about a design pattern something like this: 1) When jobs are done (ready for further processing) they publish those details to a queue (such as GC Pub/Sub or any other sort of queue) 2) A single
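
A stdlib-only sketch of steps 1–2 above, with `queue.Queue` standing in for Pub/Sub or SQS and a callback standing in for however you trigger the run (e.g. shelling out to the `airflow trigger_dag` CLI); all names here are illustrative:

```python
import queue

def drain_and_trigger(q, trigger_dag_run):
    """Pull every finished-job message off the queue and fire one
    DAG run per message; returns the number of runs triggered."""
    triggered = 0
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            return triggered
        trigger_dag_run(msg)  # e.g. invoke `airflow trigger_dag` with a conf payload
        triggered += 1
```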

Alert Emails Templatizing

2018-05-28 Thread vardanguptacse
Hi team, We have a use case where we want to serve a different email body for different use cases at the time of failure & up_for_retry. Currently the body seems to be hard-coded in models.py. Is there any plan to make it templatized in the future, or would it be a good idea if we come across

Re: HttpSensor raising exception with status=403

2018-05-28 Thread Driesprong, Fokko
Hi Pedro, You could just create a CustomHttpHook and place it on your pythonpath, then you should also create a CustomHttpSensor. Hope this helps. Cheers, Fokko 2018-05-26 2:48 GMT+02:00 Pedro Machado : > Hi, > > I am using HttpSensor to look for a file. The webserver is
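
The poke logic of such a CustomHttpSensor can be sketched independently of Airflow (the function name and the retryable-status set are assumptions): succeed on 2xx, keep waiting on statuses like 403 instead of raising, and fail hard on anything else. In a real sensor, `fetch_status` would wrap the HTTP call made via the hook:

```python
def http_poke(fetch_status, retry_statuses=frozenset({403, 404})):
    """`fetch_status` is any callable returning an HTTP status code."""
    status = fetch_status()
    if 200 <= status < 300:
        return True   # condition met: the sensor succeeds
    if status in retry_statuses:
        return False  # not ready yet: the sensor pokes again later
    raise RuntimeError(f"unexpected HTTP status: {status}")
```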

Re: How to wait for external process

2018-05-28 Thread Driesprong, Fokko
Hi Stefan, AFAIK there isn't a more efficient way of doing this. DAGs that rely on a lot of sensors experience the same issues. The only way I can think of right now is updating the state directly in the database, but then you need to know what you are doing. I can imagine that
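
For reference, "updating the state directly" would look roughly like the statement below. This is a sketch against the Airflow metadata DB of that era; the exact schema is an assumption worth verifying against your version, the identifiers are hypothetical, and you should back up the database first:

```sql
-- mark one stuck task instance as successful (verify the schema first!)
UPDATE task_instance
SET state = 'success'
WHERE dag_id = 'my_dag'
  AND task_id = 'wait_for_data'
  AND execution_date = '2018-05-28 00:00:00';
```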

Re: Using Airflow with dataset dependant flows (not date)

2018-05-28 Thread Daniel (Daniel Lamblin) [BDP - Seoul]
This seemed like a very clear explanation of the JIRA ticket and the idea of making DAG runs depend not on a schedule but on the arrival of a dataset. I think a lot would have to change if the execution date were changed to a parameterized value, and that's not the only thing that would have to