[ 
https://issues.apache.org/jira/browse/AIRFLOW-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-57.
----------------------------------
    Resolution: Auto Closed

> Rename concurrency configuration variables to be more clear
> -----------------------------------------------------------
>
>                 Key: AIRFLOW-57
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-57
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 1.7.0
>            Reporter: Bence Nagy
>            Priority: Minor
>              Labels: easyfix, newbie
>
> Currently the following config variables exist for controlling parallel 
> execution limits:
> {code}
> # The amount of parallelism as a setting to the executor. This defines
> # the max number of task instances that should run simultaneously
> # on this airflow installation
> parallelism = 32
> # The number of task instances allowed to run concurrently by the scheduler
> dag_concurrency = 16
> # When not using pools, tasks are run in the "default pool",
> # whose size is guided by this config element
> non_pooled_task_slot_count = 128
> # The maximum number of active DAG runs per DAG
> max_active_runs_per_dag = 16
> {code}
> Let's go through these one by one:
> {{parallelism}}: not a very descriptive name, considering that all the above 
> settings are for parallelism. The description says it sets the maximum number 
> of task instances for the airflow installation, which is a bit ambiguous: if 
> I have two hosts running airflow workers, I'd have airflow installed on two 
> hosts, so that should be two installations, but based on context 'per 
> installation' here means 'per Airflow state database'. I'd name this 
> {{max_active_tasks}}.
> {{dag_concurrency}}: Despite the name, the comment suggests this is actually 
> the task concurrency, and it's per worker. I'd name this 
> {{max_active_tasks_for_worker}} ({{per_worker}} would suggest that it's a 
> global setting for workers, but I think you can have workers with different 
> values set for this).
> {{non_pooled_task_slot_count}}: This is a weird one. I'm going to pass on 
> suggesting a name for it because I just can't think of any reason this config 
> variable should exist. We already have a global task instance limit, and we 
> have pools to limit access to certain resources — in what case would someone 
> want to limit access to everything other than certain resources? So, yeah, 
> skipping this one. In case this was needed only due to how pools are 
> implemented, I'd suggest setting the limit to {{sys.maxsize}} and just 
> removing the config variable.
> {{max_active_runs_per_dag}}: This one's kinda alright, but since it seems to 
> be just a default value for the matching {{DAG}} kwarg, it might be nice to 
> reflect that in the name, something like {{default_max_active_runs_for_dags}}.
> So let's move on to the {{DAG}} kwargs:
> {{concurrency}}: Again, having a general name like this, coupled with the 
> fact that concurrency is used for something different elsewhere, makes this 
> pretty confusing. I'd call this {{max_active_tasks}}.
> {{max_active_runs}}: This one sounds alright to me.
> So. If people agree that this is something that should be fixed, I think it'd 
> be nice to get this in the 1.7.1 release, especially considering that it 
> should be really easy to make the change backwards compatible.
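
One way the backwards-compatible rename mentioned above could look is sketched below. This is only an illustration, not Airflow's actual configuration code: `RENAMED_OPTIONS` and `get_option` are hypothetical names, settings are modelled as a plain dict, and the new option names are simply the ones proposed in the issue. `non_pooled_task_slot_count` is left out because the issue proposes no replacement name for it.

```python
import warnings

# Hypothetical mapping: name proposed in the issue -> existing config name.
RENAMED_OPTIONS = {
    "max_active_tasks": "parallelism",
    "max_active_tasks_for_worker": "dag_concurrency",
    "default_max_active_runs_for_dags": "max_active_runs_per_dag",
}


def get_option(settings, name):
    """Return settings[name], falling back to the old name with a warning."""
    if name in settings:
        return settings[name]
    old_name = RENAMED_OPTIONS.get(name)
    if old_name is not None and old_name in settings:
        # The old name still works, but the user is told to migrate.
        warnings.warn(
            f"'{old_name}' is deprecated; use '{name}' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return settings[old_name]
    raise KeyError(name)
```

With this approach an existing config that still sets `parallelism = 32` would keep working when the code asks for `max_active_tasks`, and a config that sets both would use the new name's value.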



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
