Re: max_active_runs
I have made this mistake a few times. I think it would be great if Airflow
warned about DAG-level arguments being passed into tasks they don't apply to,
since that would indicate an easily fixable mistake.

On Wed, Feb 14, 2018 at 9:22 AM, Gerard Toonstra wrote:

> One of those days ...
>
> max_active_runs is a dag property. default_args only get passed as default
> args to task instances, but it never applies there.
>
> Thanks!
>
> G
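A warning of the sort proposed above could be sketched as a small lint helper.
This is a hypothetical illustration, not an existing Airflow API; the
check_default_args function and the set of DAG-only keys are assumptions (an
illustrative subset), chosen only to show the shape of such a check:

```python
import warnings

# DAG-level parameters that have no effect when placed in default_args
# (illustrative subset; the real list would come from DAG's signature).
DAG_ONLY_ARGS = {'max_active_runs', 'schedule_interval', 'catchup', 'concurrency'}

def check_default_args(default_args):
    """Warn about DAG-level settings mistakenly placed in default_args,
    where task instances silently ignore them. Returns the misplaced keys."""
    misplaced = sorted(DAG_ONLY_ARGS & set(default_args))
    for key in misplaced:
        warnings.warn(
            "'%s' is a DAG-level argument; pass it to DAG() directly, "
            "not via default_args" % key
        )
    return misplaced

print(check_default_args({'owner': 'airflow', 'max_active_runs': 1}))
# ['max_active_runs']
```

A real implementation would presumably run at DAG-parse time so the warning
surfaces in the scheduler logs rather than at task execution.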
Re: max_active_runs
One of those days ...

max_active_runs is a dag property. default_args only get passed as default
args to task instances, but it never applies there.

Thanks!

G

On Wed, Feb 14, 2018 at 2:47 PM, Ash Berlin-Taylor <
ash_airflowl...@firemirror.com> wrote:

> It seems unlikely, but could it be the location where max_active_runs is
> specified? In our DAGs we pass it directly as an argument to the DAG()
> call, not via default_args, and it behaves itself for us. I think
> I should check that!
>
> -ash
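The distinction above (default_args flows only to task instances, while
max_active_runs is consumed by the DAG object itself) can be sketched with
simplified stand-ins. FakeDAG and FakeTask below are hypothetical mock
classes, not Airflow's actual implementation; the default of 16 mirrors
Airflow 1.9's max_active_runs_per_dag config default:

```python
class FakeTask:
    # Only task-level keys are meaningful here; anything else in
    # default_args is never consulted by a task instance.
    KNOWN_ARGS = {'owner', 'email_on_failure', 'email_on_retry', 'retries'}

    def __init__(self, task_id, default_args):
        self.task_id = task_id
        self.params = {k: v for k, v in default_args.items()
                       if k in self.KNOWN_ARGS}

class FakeDAG:
    def __init__(self, dag_id, default_args=None, max_active_runs=16):
        self.dag_id = dag_id
        self.default_args = default_args or {}
        # Only this constructor kwarg sets the limit; the same key inside
        # default_args is silently ignored.
        self.max_active_runs = max_active_runs

default_args = {'owner': 'airflow', 'max_active_runs': 1}

wrong = FakeDAG('analytics6', default_args=default_args)
right = FakeDAG('analytics6', default_args=default_args, max_active_runs=1)

print(wrong.max_active_runs)  # 16 -- the value buried in default_args never applied
print(right.max_active_runs)  # 1
```

This is exactly the failure mode in the DAG quoted below: no error is raised,
the extra key is just never read.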
Re: max_active_runs
It seems unlikely, but could it be the location where max_active_runs is
specified? In our DAGs we pass it directly as an argument to the DAG() call,
not via default_args, and it behaves itself for us. I think I should check
that!

-ash

> On 14 Feb 2018, at 13:43, Gerard Toonstra wrote:
>
> A user on airflow 1.9.0 reports that 'max_active_runs' isn't respected. I
> remembered having fixed something related to this ages ago; that fix is
> here:
>
> https://issues.apache.org/jira/browse/AIRFLOW-137
>
> That however was related to backfills and clearing the dagruns.
>
> I watched him in the scenario and he literally creates a new simple dag
> with the following config:
>
>     from airflow import DAG
>     from datetime import datetime, timedelta
>
>     from airflow.contrib.operators.bigquery_operator import BigQueryOperator
>     from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator
>     from airflow.contrib.operators.gcs_download_operator import GoogleCloudStorageDownloadOperator
>     from airflow.contrib.operators.file_to_gcs import FileToGoogleCloudStorageOperator
>     from airflow.operators.python_operator import PythonOperator
>     from airflow.models import Variable
>     import time
>
>     default_args = {
>         'owner': 'airflow',
>         'start_date': datetime(2018, 2, 10),
>         'max_active_runs': 1,
>         'email_on_failure': False,
>         'email_on_retry': False,
>     }
>
>     dag = DAG('analytics6', default_args=default_args,
>               schedule_interval='15 12 * * *')
>
> When it gets activated, multiple dagruns are created when there are still
> tasks running on the first date.
>
> His version is 1.9.0 from pypi.
>
> Is max_active_runs broken or are there other explanations for this
> particular behavior?
>
> Rgds,
>
> Gerard