Re: max_active_runs

2018-02-15 Thread James Meickle
I have made this mistake a few times. I think it would be great if Airflow
warned about DAG-level arguments being passed into tasks they don't apply
to, since that would indicate an easily fixable mistake.
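
To make the idea concrete, here is a rough sketch of what such a check could
look like. This is not an existing Airflow feature: unknown_default_args is a
hypothetical helper, and it assumes BaseOperator's __init__ signature is
introspectable on the installed Airflow version.

-

import inspect

from airflow.models import BaseOperator

def unknown_default_args(default_args):
    # Collect the keyword arguments that BaseOperator.__init__ actually accepts.
    accepted = set(inspect.signature(BaseOperator.__init__).parameters)
    accepted -= {'self', 'args', 'kwargs'}
    # Anything left over is a key that task instances will silently ignore,
    # e.g. DAG-level settings such as max_active_runs.
    return sorted(set(default_args) - accepted)

default_args = {
    'owner': 'airflow',
    'max_active_runs': 1,  # DAG-level argument, ignored by tasks
    'email_on_failure': False,
}

print(unknown_default_args(default_args))  # expected: ['max_active_runs']

-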

On Wed, Feb 14, 2018 at 9:22 AM, Gerard Toonstra wrote:

> One of those days ...
>
> max_active_runs is a DAG property. default_args are only passed as default
> arguments to task instances, so max_active_runs never applies there.
>
> Thanks!
>
> G>
>
> On Wed, Feb 14, 2018 at 2:47 PM, Ash Berlin-Taylor <
> ash_airflowl...@firemirror.com> wrote:
>
> > It seems unlikely, but could it be the location where max_active_runs
> > is specified? In our DAGs we pass it directly as an argument to the DAG()
> > call, not via default_args, and it behaves itself for us. I think
> > I should check that!
> >
> > -ash
> >
> >
> > > On 14 Feb 2018, at 13:43, Gerard Toonstra  wrote:
> > >
> > > A user on airflow 1.9.0 reports that 'max_active_runs' isn't respected. I
> > > remembered having fixed something related to this ages ago and this is
> > > here:
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-137
> > >
> > > That however was related to backfills and clearing the dagruns.
> > >
> > > I watched him in the scenario and he literally creates a new simple dag
> > > with the following config:
> > >
> > > -
> > >
> > > from airflow import DAG
> > > from datetime import datetime, timedelta
> > >
> > > from airflow.contrib.operators.bigquery_operator import BigQueryOperator
> > > from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator
> > > from airflow.contrib.operators.gcs_download_operator import GoogleCloudStorageDownloadOperator
> > > from airflow.contrib.operators.file_to_gcs import FileToGoogleCloudStorageOperator
> > > from airflow.operators.python_operator import PythonOperator
> > > from airflow.models import Variable
> > > import time
> > >
> > > default_args = {
> > >'owner': 'airflow',
> > >'start_date': datetime(2018, 2, 10),
> > >'max_active_runs': 1,
> > >'email_on_failure': False,
> > >'email_on_retry': False,
> > > }
> > >
> > > dag = DAG('analytics6', default_args=default_args, schedule_interval='15 12 * * *')
> > >
> > > -
> > >
> > > When it gets activated, multiple dagruns are created when there are still
> > > tasks running on the first date.
> > >
> > > His version is 1.9.0 from pypi.
> > >
> > > Is max_active_runs broken or are there other explanations for this
> > > particular behavior?
> > >
> > > Rgds,
> > >
> > > Gerard
> >
> >
>


Re: max_active_runs

2018-02-14 Thread Gerard Toonstra
One of those days ...

max_active_runs is a DAG property. default_args are only passed as default
arguments to task instances, so max_active_runs never applies there.
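
A minimal corrected sketch of the quoted DAG, with max_active_runs moved out
of default_args and passed directly to DAG(), where it actually takes effect:

-

from datetime import datetime

from airflow import DAG

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 2, 10),
    'email_on_failure': False,
    'email_on_retry': False,
}

# max_active_runs is a DAG-level setting: pass it to DAG() itself,
# not via default_args (those only become task-instance defaults).
dag = DAG(
    'analytics6',
    default_args=default_args,
    schedule_interval='15 12 * * *',
    max_active_runs=1,
)

-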

Thanks!

G>

On Wed, Feb 14, 2018 at 2:47 PM, Ash Berlin-Taylor <
ash_airflowl...@firemirror.com> wrote:

> It seems unlikely, but could it be the location where max_active_runs
> is specified? In our DAGs we pass it directly as an argument to the DAG()
> call, not via default_args, and it behaves itself for us. I think
> I should check that!
>
> -ash
>
>
> > On 14 Feb 2018, at 13:43, Gerard Toonstra  wrote:
> >
> > A user on airflow 1.9.0 reports that 'max_active_runs' isn't respected. I
> > remembered having fixed something related to this ages ago and this is
> > here:
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-137
> >
> > That however was related to backfills and clearing the dagruns.
> >
> > I watched him in the scenario and he literally creates a new simple dag
> > with the following config:
> >
> > -
> >
> > from airflow import DAG
> > from datetime import datetime, timedelta
> >
> > from airflow.contrib.operators.bigquery_operator import BigQueryOperator
> > from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator
> > from airflow.contrib.operators.gcs_download_operator import GoogleCloudStorageDownloadOperator
> > from airflow.contrib.operators.file_to_gcs import FileToGoogleCloudStorageOperator
> > from airflow.operators.python_operator import PythonOperator
> > from airflow.models import Variable
> > import time
> >
> > default_args = {
> >'owner': 'airflow',
> >'start_date': datetime(2018, 2, 10),
> >'max_active_runs': 1,
> >'email_on_failure': False,
> >'email_on_retry': False,
> > }
> >
> > dag = DAG('analytics6', default_args=default_args, schedule_interval='15 12 * * *')
> >
> > -
> >
> > When it gets activated, multiple dagruns are created when there are still
> > tasks running on the first date.
> >
> > His version is 1.9.0 from pypi.
> >
> > Is max_active_runs broken or are there other explanations for this
> > particular behavior?
> >
> > Rgds,
> >
> > Gerard
>
>


Re: max_active_runs

2018-02-14 Thread Ash Berlin-Taylor
It seems unlikely, but could it be the location where max_active_runs is
specified? In our DAGs we pass it directly as an argument to the DAG() call,
not via default_args, and it behaves itself for us. I think I should
check that!

-ash


> On 14 Feb 2018, at 13:43, Gerard Toonstra  wrote:
> 
> A user on airflow 1.9.0 reports that 'max_active_runs' isn't respected. I
> remembered having fixed something related to this ages ago and this is
> here:
> 
> https://issues.apache.org/jira/browse/AIRFLOW-137
> 
> That however was related to backfills and clearing the dagruns.
> 
> I watched him in the scenario and he literally creates a new simple dag
> with the following config:
> 
> -
> 
> from airflow import DAG
> from datetime import datetime, timedelta
> 
> from airflow.contrib.operators.bigquery_operator import BigQueryOperator
> from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator
> from airflow.contrib.operators.gcs_download_operator import GoogleCloudStorageDownloadOperator
> from airflow.contrib.operators.file_to_gcs import FileToGoogleCloudStorageOperator
> from airflow.operators.python_operator import PythonOperator
> from airflow.models import Variable
> import time
> 
> default_args = {
>'owner': 'airflow',
>'start_date': datetime(2018, 2, 10),
>'max_active_runs': 1,
>'email_on_failure': False,
>'email_on_retry': False,
> }
> 
> dag = DAG('analytics6', default_args=default_args, schedule_interval='15 12 * * *')
> 
> -
> 
> When it gets activated, multiple dagruns are created when there are still
> tasks running on the first date.
> 
> His version is 1.9.0 from pypi.
> 
> Is max_active_runs broken or are there other explanations for this
> particular behavior?
> 
> Rgds,
> 
> Gerard