eladkal closed issue #39088: Many running task instances are cleared by the new
scheduler when an old scheduler is terminated and its health check server is
periodically requested
URL: https://github.com/apache/airflow/issues/39088
--
This is an automated message from the Apache Git Service.
tanvn commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2094155454
@RNHTTR
Hi, I created a PR to fix this issue:
https://github.com/apache/airflow/pull/39406
PTAL at your convenience.
tanvn commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2093473579
I think I found the cause of this issue.
The logic of `adopt_or_reset_orphaned_tasks` is called twice within a very
short time because of the use of `run_with_db_retries`
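A minimal, hypothetical Python sketch of the failure mode described in this comment: when a retry wrapper re-invokes a block after a transient error, any side effect that the failed attempt already performed (and that was not rolled back) runs again almost immediately. `run_with_retries`, `TransientDBError`, and `adopt_or_reset_like` are illustrative stand-ins, not Airflow's `run_with_db_retries` or `adopt_or_reset_orphaned_tasks`; whether Airflow's real code path behaves exactly this way is the commenter's hypothesis, which this sketch does not confirm.

```python
class TransientDBError(Exception):
    """Hypothetical stand-in for a retryable database error."""


def run_with_retries(fn, attempts=3):
    """Call fn until it succeeds or attempts are exhausted.

    Simplified, hypothetical stand-in; not Airflow's run_with_db_retries.
    """
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except TransientDBError as exc:
            last_exc = exc  # swallow the error and retry immediately
    raise last_exc


calls = []


def adopt_or_reset_like():
    # Side effect happens first: record that orphaned tasks were "reset".
    calls.append("reset orphaned tasks")
    # Fail only on the first attempt, e.g. a simulated lock timeout.
    if len(calls) == 1:
        raise TransientDBError("simulated lock timeout")
    return "ok"


result = run_with_retries(adopt_or_reset_like)
print(result, len(calls))  # prints: ok 2
```

The side effect runs twice even though the caller sees a single successful call. Making the retried body idempotent, or keeping non-transactional side effects outside the retried block, is one general way to avoid this; the sketch does not show the approach taken in the linked PR.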
tanvn commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2072822564
@RNHTTR
Tested and reproduced the issue on Airflow 2.8.4. Below are the error logs
from the scheduler and one terminated pod:
```
2024-04-23T15:56:15.805+]
```
RNHTTR commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2072601414
@tanvn Can you share the full traceback?
tanvn commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2072363288
@paramjeet01
I use pip to install from the source code locally:
https://github.com/apache/airflow/blob/4ae85d754e9f8a65d461e86eb6111d3b9974a065/INSTALL#L73
on a docker
paramjeet01 commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2071929213
> @dstandish
>
> ~I would like to build a Docker image from the source code (which I am
> going to add some modifications for testing) on my Macbook (Apple M2), is there
paramjeet01 commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2068007395
@tanvn @RNHTTR, could this issue be related to the one I raised:
https://github.com/apache/airflow/issues/39096
We are running 8 schedulers and we see this issue
RNHTTR commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2065334748
Thanks for the extra details! I've assigned you to the Issue.
tanvn commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2063824238
@dstandish
I would like to build a Docker image from the source code (to which I am going
to add some modifications for testing) on my MacBook (Apple M2). Is there any
document
tanvn commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2063329564
**Just tested with Airflow 2.8.4 and confirmed that the issue still
happens.**
```
[2024-04-18T08:29:32.060+] {kubernetes_executor.py:661} INFO - attempting to
```
tanvn commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2063070213
I managed to reproduce this issue in my development environment by creating
a DAG with about 100 tasks. When there are about 15 running tasks, I run a Helm
command to upgrade the
tanvn commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2062863762
@dstandish
Thanks! Yes, I believe I can reproduce it in my dev environment, and I want to
build a Docker image from the source code. If there is a document, I would
appreciate it if you
RNHTTR commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2062447790
@tanvn Can you check the [audit
logs](https://airflow.apache.org/docs/apache-airflow/stable/security/audit_logs.html)
for a task with the following log?
```
[2024-04-17,
```
dstandish commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2061970202
Hi @tanvn
It seems you've made some progress with your repro scenario. What I would
recommend is to continue digging in. Add more logging to try to see exactly
what
tanvn commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2061708972
And @dstandish
it seems that this issue is related to the health check server of the
scheduler, so I wonder if you might have any ideas about this.
tanvn commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2061362205
I also checked the log of the scheduler:
```
[2024-04-17T08:14:19.661+] {kubernetes_executor.py:749} INFO - attempting to adopt pod pod-3dbaxxx
```
tanvn commented on issue #39088:
URL: https://github.com/apache/airflow/issues/39088#issuecomment-2061264796
@potiuk @ephraimbuddy
Hi, do you have any clue about this?