nadflinn opened a new pull request #8695: URL: https://github.com/apache/airflow/pull/8695
There is some debate about whether Celery autoscale actually works, as discussed in the comments on this PR: https://github.com/apache/airflow/pull/3989#issuecomment-535882666

This is also related to the discussion in this issue: https://github.com/apache/airflow/issues/8480

I ran into this problem with Airflow as well (autoscale not working). After looking at the Celery code, I think the cause is that growth in the worker process count depends on the number of tasks the worker has claimed, which is capped by the `prefetch_count`. If the `prefetch_count` isn't above the worker process count, the number of worker processes won't budge. It is a catch-22: the worker cannot claim more tasks until it has more processes, and it will not add processes until it has claimed more tasks.

Airflow runs into this problem because `worker_prefetch_multiplier` is set to 1 and `task_acks_late` is set to True (setting the latter to False also bumps the `prefetch_count`). The issue can be worked around by setting `worker_prefetch_multiplier` to an int greater than 1. In this PR I included a note in the config about the implications of doing so, along with a link to the relevant documentation.

Currently in Airflow `worker_prefetch_multiplier` is set to 1, so a worker can't prefetch and lay claim to more tasks than it has worker processes. In theory, setting it to 2 can get you into trouble. Suppose worker A has 6 processes and has grabbed 10 tasks, and the 6 tasks it is working on are long running, so the other 4 tasks are blocked. Meanwhile worker B has just finished processing its own 6 tasks and is available to work on the 4 that are backed up on worker A, but A has already claimed them. If you are running a single worker, though, this shouldn't be a problem.

This PR makes `worker_prefetch_multiplier` configurable so that users can get autoscale working if they decide that a `worker_prefetch_multiplier` greater than 1 won't be an issue for their use case.

I also [opened up a Celery PR](https://github.com/celery/celery/pull/6069) with a suggested fix for this issue.
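To make the catch-22 concrete, here is a toy model of the behaviour described above. It is only a sketch under simplifying assumptions, not Celery's actual autoscaler code: it assumes the worker may reserve at most `processes * prefetch_multiplier` tasks and grows its pool only when the number of reserved tasks exceeds the current process count. The function name `simulate_autoscale` and its parameters are hypothetical and exist only for this illustration.

```python
def simulate_autoscale(min_procs, max_procs, prefetch_multiplier, queued_tasks,
                       max_steps=20):
    """Toy model of the autoscale behaviour described in this PR.

    Assumptions (simplified, not taken from Celery's source):
      * the worker can reserve at most procs * prefetch_multiplier tasks;
      * the pool grows only when reserved tasks exceed the process count,
        and never beyond max_procs.
    Returns the process count once it stops changing.
    """
    procs = min_procs
    for _ in range(max_steps):
        prefetch_count = procs * prefetch_multiplier
        # The worker cannot claim more tasks than the prefetch limit allows.
        reserved = min(queued_tasks, prefetch_count)
        # Scale toward the number of reserved tasks, capped at the maximum.
        target = min(reserved, max_procs)
        if target <= procs:
            break  # nothing to grow into: the pool stays where it is
        procs = target
    return procs


if __name__ == "__main__":
    # Multiplier 1: reserved tasks never exceed the process count.
    print(simulate_autoscale(2, 8, 1, queued_tasks=100))
    # Multiplier 2: each round of prefetching leaves room to grow.
    print(simulate_autoscale(2, 8, 2, queued_tasks=100))
```

In this model a multiplier of 1 leaves the pool stuck at its minimum size no matter how many tasks are queued, while a multiplier of 2 lets it climb to the maximum, which matches the workaround proposed in this PR.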
---

Make sure to mark the boxes below before creating PR: [x]

- [X] Description above provides context of the change
- [X] Unit tests coverage for changes (not needed for documentation changes)
- [X] Target Github ISSUE in description if exists
- [X] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
- [X] Relevant documentation is updated including usage instructions.
- [X] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).

---

In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.