[ https://issues.apache.org/jira/browse/AIRFLOW-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Imberman closed AIRFLOW-57. ---------------------------------- Resolution: Auto Closed > Rename concurrency configuration variables to be more clear > ----------------------------------------------------------- > > Key: AIRFLOW-57 > URL: https://issues.apache.org/jira/browse/AIRFLOW-57 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler > Affects Versions: 1.7.0 > Reporter: Bence Nagy > Priority: Minor > Labels: easyfix, newbie > > Currently the following config variables exists for controlling parallel > execution limits: > {code} > # The amount of parallelism as a setting to the executor. This defines > # the max number of task instances that should run simultaneously > # on this airflow installation > parallelism = 32 > # The number of task instances allowed to run concurrently by the scheduler > dag_concurrency = 16 > # When not using pools, tasks are run in the "default pool", > # whose size is guided by this config element > non_pooled_task_slot_count = 128 > # The maximum number of active DAG runs per DAG > max_active_runs_per_dag = 16 > {code} > Let's go through these one by one: > {{parallelism}}: not a very descriptive name, considering that all the above > settings are for parallelism. The description says it sets the maximum task > instances for the airflow installation, which is a bit ambiguous — if I have > two hosts running airflow workers, I'd have airflow installed on two hosts, > so that should be two installations, but based on context 'per installation' > here means 'per Airflow state database'. I'd name this {{max_active_tasks}}. > {{dag_concurrency}}: Despite the name based on the comment this is actually > the task concurrency, and it's per worker. I'd name this > {{max_active_tasks_for_worker}} ({{per_worker}} would suggest that it's a > global setting for workers, but I think you can have workers with different > values set for this). > {{non_pooled_task_slot_count}}: This is a weird one. I'm going to pass on > suggesting a name for it because I just can't think of any reason this config > variable should exist. We already have a global task instance limit, and we > have pools to limit access to certain resources — in what case would someone > want to limit access to everything other than certain resources? So, yeah, > skipping this one. In case this was needed only due to how pools are > implemented, I'd suggest setting the limit to {{sys.maxsize}} and just > removing the config variable. > {{max_active_runs_per_dag}}: This one's kinda alright, but since it seems to > be just a default value for the matching {{DAG}} kwarg, it might be nice to > reflect that in the name, something like {{default_max_active_runs_for_dags}} > So let's move on to the {{DAG}} kwargs: > {{concurrency}}: Again, having a general name like this, coupled with the > fact that concurrency is used for something different elsewhere makes this > pretty confusing. I'd call this {{max_active_tasks}}. > {{max_active_runs}}: This one sounds alright to me. > So. If people agree that this is something that should be fixed, I think it'd > be nice to get this in the 1.7.1 release, especially considering that it > should be really easy to make the change backwards compatible. -- This message was sent by Atlassian Jira (v8.3.4#803005)