It is an interesting question. On a slightly related note, and correct me if
I'm wrong, AFAIK we need to restart the Airflow scheduler in order for it to
pick up any new DAG file changes. In that case, should the scheduler do the
DAG file processing every time before scheduling the tasks?
Regards,
Hi All,
We have a use case where there would be hundreds of DAG files with the schedule set
to "@once". Currently it seems that the scheduler processes each and every file and
creates a DAG object.
Is there a way or a config option to tell the scheduler to stop processing certain files?
Thanks,
Raman Gupta
Thanks! I ended up creating a plugin and it's working OK.
On Mon, May 28, 2018 at 9:22 AM Driesprong, Fokko wrote:
> Hi Pedro,
>
> You could just create a CustomHttpHook and place it on your pythonpath,
> then you should also create a CustomHttpSensor. Hope this helps.
>
> Cheers, Fokko
>
>
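For reference, a minimal sketch of the CustomHttpHook / CustomHttpSensor approach quoted above. It assumes an Airflow 1.9/1.10-era module layout and that HttpSensor stores `http_conn_id` and builds its hook in `__init__`; the class names and the header tweak are hypothetical. The module only needs to live somewhere on your PYTHONPATH so DAGs can import it directly, no plugin registration required:

    from airflow.hooks.http_hook import HttpHook
    from airflow.sensors.http_sensor import HttpSensor  # airflow.operators.sensors in older releases


    class CustomHttpHook(HttpHook):
        """HttpHook with project-specific behaviour, e.g. an extra default header."""

        def get_conn(self, headers=None):
            headers = dict(headers or {})
            headers.setdefault("Accept", "application/json")  # hypothetical tweak
            return super(CustomHttpHook, self).get_conn(headers)


    class CustomHttpSensor(HttpSensor):
        """HttpSensor that pokes through CustomHttpHook instead of the stock hook."""

        def __init__(self, *args, **kwargs):
            super(CustomHttpSensor, self).__init__(*args, **kwargs)
            # The parent builds a plain HttpHook in __init__; swap in the custom one.
            self.hook = CustomHttpHook(method="GET", http_conn_id=self.http_conn_id)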
Thanks Christopher for the idea. That would work; we already have such a
"listener" that polls a queue (SQS) and creates the DAG runs. However, it
would have been nice to have the full process in one DAG, to get a better
overview of the running jobs and to leverage the Gantt chart, but I
think this can
It is a bit tricky:
*Step 1:*
You can write an SLA miss callback, send the email from the callback, and empty
the `slas` object so that Airflow won't send its own SLA miss email.
https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L688
*Step 2:*
You can reuse Airflow's `send_email` method.
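A minimal sketch of those two steps, assuming the callback signature `(dag, task_list, blocking_task_list, slas, blocking_tis)` used by the scheduler in the linked jobs.py; the recipient address and subject are hypothetical:

    from airflow.utils.email import send_email


    def custom_sla_miss_callback(dag, task_list, blocking_task_list, slas, blocking_tis):
        # Step 2: reuse Airflow's own send_email helper for the custom notification.
        send_email(
            to="oncall@example.com",  # hypothetical recipient
            subject="[custom] SLA miss on DAG {}".format(dag.dag_id),
            html_content="Tasks that missed their SLA:<br>{}".format(task_list),
        )
        # Step 1: empty the `slas` list in place so the scheduler has nothing
        # left to report in its default SLA miss email.
        del slas[:]


    # Attach it when defining the DAG:
    # dag = DAG("my_dag", ..., sla_miss_callback=custom_sla_miss_callback)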
Since you are already on AWS, the simplest thing I can think of is to write a
signal file once the job finishes and have the downstream job wait for the
signal file. In other words, the same pattern as Hadoop jobs writing a
`_SUCCESS` file and the downstream jobs depending on that signal file.
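For example, a minimal sketch of that pattern with S3, assuming an existing `dag` object; the bucket, key, and connection id are hypothetical, and the import path follows the Airflow 1.10 layout:

    from airflow.sensors.s3_key_sensor import S3KeySensor  # airflow.operators.sensors in older releases

    wait_for_signal = S3KeySensor(
        task_id="wait_for_upstream_success",
        bucket_name="my-data-bucket",             # hypothetical bucket
        bucket_key="exports/{{ ds }}/_SUCCESS",   # signal file written by the upstream job
        aws_conn_id="aws_default",
        poke_interval=300,                        # check every 5 minutes
        timeout=6 * 60 * 60,                      # give up after 6 hours
        dag=dag,
    )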
Hi,
I was trying to find out if it's possible to create custom plugins without
dropping them into `$AIRFLOW_HOME/plugins` (or the directory defined in
`airflow.cfg`).
We can define one location in `airflow.cfg`, but I have multiple projects
which will have their own workflows, so ideally I would want
Haven't done this, but we'll have a similar need in the future, so I have
investigated a little.
What about a design pattern something like this:
1) When jobs are done (ready for further processing), they publish those
details to a queue (such as GC Pub/Sub or any other sort of queue; see the sketch below)
2) A single
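A sketch of what step 1 might look like; the project, topic, and payload names are hypothetical, and it assumes the google-cloud-pubsub client:

    import json

    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-gcp-project", "job-completions")  # hypothetical names

    # Publish the details of a finished job so a listener can trigger the follow-up DAG run.
    payload = {"job_id": "nightly-export", "status": "done"}
    future = publisher.publish(topic_path, data=json.dumps(payload).encode("utf-8"))
    future.result()  # block until the message has been accepted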
Hi team,
We had a use case where we wanted to serve a different email body to different
use cases at the time of failure & up_for_retry. Currently the body seems to be
hard-coded in models.py. Is there any plan to make it templatized in the near
future, or would it be a good idea if we come across
Hi Pedro,
You could just create a CustomHttpHook and place it on your pythonpath,
then you should also create a CustomHttpSensor. Hope this helps.
Cheers, Fokko
2018-05-26 2:48 GMT+02:00 Pedro Machado :
> Hi,
>
> I am using HttpSensor to look for a file. The webserver is
Hi Stefan,
Afaik there isn't a more efficient way of doing this. DAGs that rely on a lot
of sensors are experiencing the same issues. The only way right now that I can
think of is updating the state directly in the database.
But then you need to know what you are doing. I can imagine that
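As a concrete (use-with-care) illustration of that direct-database approach, here is a sketch using Airflow's own ORM session; the dag_id and task_id filters are hypothetical and the model follows the 1.x `task_instance` schema:

    from airflow import settings
    from airflow.models import TaskInstance
    from airflow.utils.state import State

    session = settings.Session()
    # Flip the stuck sensor instances straight to SUCCESS in the metadata database.
    (session.query(TaskInstance)
            .filter(TaskInstance.dag_id == "my_dag",          # hypothetical DAG
                    TaskInstance.task_id == "wait_for_file",  # hypothetical sensor task
                    TaskInstance.state == State.RUNNING)
            .update({TaskInstance.state: State.SUCCESS}, synchronize_session=False))
    session.commit()
    session.close()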
This seemed like a very clear explanation of the JIRA ticket and the idea of
making DAG runs depend not on a schedule but on the arrival of a dataset.
I think a lot would have to change if the execution date were changed to a
parameterized value, and that's not the only thing that would have to