nadflinn opened a new pull request #8695:
URL: https://github.com/apache/airflow/pull/8695


   There is some debate about whether Celery autoscale actually works, as 
discussed in the comments to this PR:
   https://github.com/apache/airflow/pull/3989#issuecomment-535882666
   
   This is also related to the discussion in this issue:
   https://github.com/apache/airflow/issues/8480
   
   I ran into this issue as well with Airflow (autoscale not working) and had a 
look at the Celery code. I think the issue is that growth of the worker process 
count depends on the number of tasks the worker has claimed, known as the 
prefetch_count.  If the prefetch_count isn't above the worker process count, 
then the number of worker processes won't budge, which is a catch-22. Airflow 
runs into this problem because `worker_prefetch_multiplier` is set to 1 (and 
`task_acks_late` is set to True; setting this to False also bumps the 
prefetch_count).
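
To make the catch-22 concrete, here is a toy model of the scaling decision as described above. It is a simplified sketch, not Celery's actual code: the function names, the single-step decision, and the assumption that the prefetch limit equals the current process count times `worker_prefetch_multiplier` are all illustrative, and the effect of `task_acks_late` is ignored.

```python
def prefetch_limit(processes, prefetch_multiplier):
    # Assumed model: a worker may claim at most processes * multiplier tasks.
    return processes * prefetch_multiplier


def next_process_count(processes, claimed, max_processes):
    # Assumed model: the pool only grows when the number of claimed tasks
    # exceeds the current process count, up to the autoscale maximum.
    target = min(claimed, max_processes)
    return max(processes, target)


def simulate(queued, min_processes, max_processes, prefetch_multiplier):
    # One scaling decision, starting from the autoscale minimum.
    limit = prefetch_limit(min_processes, prefetch_multiplier)
    claimed = min(queued, limit)
    return next_process_count(min_processes, claimed, max_processes)


# 20 queued tasks, autoscale between 3 and 10 processes.
stuck = simulate(queued=20, min_processes=3, max_processes=10, prefetch_multiplier=1)
grows = simulate(queued=20, min_processes=3, max_processes=10, prefetch_multiplier=2)
print(stuck, grows)
```

In this model a multiplier of 1 means the worker can never claim more tasks than it has processes, so the pool stays at its minimum no matter how deep the queue is; a multiplier of 2 lets the claimed count exceed the process count and the pool grows.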
   
   This issue can be worked around by setting `worker_prefetch_multiplier` to 
an int greater than 1.  In this PR I included a note about the implications of 
this in the config and a link to the relevant documentation.  Currently in 
Airflow `worker_prefetch_multiplier` is set to 1, so a worker can't prefetch 
and lay claim to more tasks than it has worker processes.  In theory, setting 
this to 2 can get you into trouble. Suppose worker A has 6 processes and has 
grabbed 10 tasks, and the 6 tasks it is working on are long running, so the 
other 4 tasks are blocked.  Meanwhile worker B has just finished processing its 
own 6 tasks and is available to work on the 4 that are backed up on worker A, 
but A has already claimed those tasks.  If you are running one worker, though, 
this shouldn't be a problem.
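
The worker A / worker B scenario above comes down to simple arithmetic, sketched below. The helper names are hypothetical and the model assumes, as in the description, that a worker can claim up to its process count times `worker_prefetch_multiplier` and that every running task is long running.

```python
def claim_limit(processes, prefetch_multiplier):
    # Assumed model: maximum number of tasks one worker may claim at once.
    return processes * prefetch_multiplier


def stranded_tasks(available, processes, prefetch_multiplier):
    # Tasks a worker has claimed but cannot start because all of its
    # processes are busy; no other worker can pick these up.
    claimed = min(available, claim_limit(processes, prefetch_multiplier))
    return max(0, claimed - processes)


# Worker A: 6 processes, 10 tasks available to it.
print(stranded_tasks(available=10, processes=6, prefetch_multiplier=2))
print(stranded_tasks(available=10, processes=6, prefetch_multiplier=1))
```

With a multiplier of 2, worker A holds 4 tasks it cannot start; with a multiplier of 1 it claims only 6, and the remaining 4 stay on the queue where worker B can take them.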
   
   This PR makes `worker_prefetch_multiplier` configurable so that users can 
get autoscale working if they decide that, for their use case, a 
`worker_prefetch_multiplier` greater than 1 won't be an issue.
   
   I also [opened up a Celery PR](https://github.com/celery/celery/pull/6069) 
with a suggested fix for this issue.
   
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [X] Description above provides context of the change
   - [X] Unit tests coverage for changes (not needed for documentation changes)
   - [X] Target Github ISSUE in description if exists
   - [X] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [X] Relevant documentation is updated including usage instructions.
   - [X] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

