[GitHub] [airflow] potiuk commented on pull request #20847: Added other OpenTelemetry Instrumentations
potiuk commented on pull request #20847: URL: https://github.com/apache/airflow/pull/20847#issuecomment-1011887017 Nice. @Melodie97 - while you are at it - maybe, while preparing for the demo next week, you could also take a look at the beginning of this: https://github.com/apache/airflow/issues/20778 That might be a good fit now that we are getting this working, since it will simply make the changes you've made more "pluggable" and "optional". I will make some comments in your PR and give you some guidelines for that :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] Melodie97 commented on issue #20774: Expand POC with all possible standard metrics available in Open Telemetry
Melodie97 commented on issue #20774: URL: https://github.com/apache/airflow/issues/20774#issuecomment-1011885124 Ok, I'm done with them all
[GitHub] [airflow] Melodie97 opened a new pull request #20847: Added other OpenTelemetry Instrumentations
Melodie97 opened a new pull request #20847: URL: https://github.com/apache/airflow/pull/20847 --- **^ Add meaningful description above** Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information. In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
[GitHub] [airflow] potiuk commented on a change in pull request #20843: Fix flaky templatized call
potiuk commented on a change in pull request #20843: URL: https://github.com/apache/airflow/pull/20843#discussion_r783700863

## File path: tests/sensors/test_python.py

```
@@ -85,7 +85,7 @@ def test_python_callable_arguments_are_templatized(self):
         ds_templated = DEFAULT_DATE.date().isoformat()
         # 2 calls: first: at start, second: before timeout
-        assert 2 == len(recorded_calls)
+        assert 1 <= len(recorded_calls)
```

Review comment: The thing here is that this is a "timing" test: we test it with the "run" method and expect a timeout. In a perfect world we could likely use freezegun and simulate the exact time passing, but IMHO that is totally not worth it if we can 100% fix the flakiness with this three-character change - trading a bit of perfection for test simplicity.
[GitHub] [airflow] potiuk commented on a change in pull request #20843: Fix flaky templatized call
potiuk commented on a change in pull request #20843: URL: https://github.com/apache/airflow/pull/20843#discussion_r783698559

## File path: tests/sensors/test_python.py

```
@@ -85,7 +85,7 @@ def test_python_callable_arguments_are_templatized(self):
         ds_templated = DEFAULT_DATE.date().isoformat()
         # 2 calls: first: at start, second: before timeout
-        assert 2 == len(recorded_calls)
+        assert 1 <= len(recorded_calls)
```

Review comment: It's a sensor, and the number of calls depends on how fast the whole system is overall (sometimes we will not get the second call if the system is very busy with other parallel tests). We know that the first call happens for sure (it is immediate), and most of the time the second one does too. The assert is actually correct IMHO ("expect at least 1 call"). We only check the first call anyway - and this is the gist of it, as we really want to see whether the Jinja template works in the sensor call. If you have an idea on how to fix it better, I am all ears :)
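The "at least one call" reasoning discussed in that review can be sketched as a standalone check. This is a minimal illustration with hypothetical names (`recorded_calls`, `fake_sensor_callable`), not the actual Airflow test in `tests/sensors/test_python.py`:

```python
# Timing-tolerant assertion pattern for a flaky sensor test: the first
# poke happens immediately and is deterministic; a second poke before the
# timeout may or may not happen depending on machine load, so the test
# only asserts on what is guaranteed.
recorded_calls = []

def fake_sensor_callable(ds):
    # Record the (templated) argument each time the sensor pokes.
    recorded_calls.append(ds)
    return False  # never succeeds, so the sensor would run until timeout

# Simulate the one guaranteed poke at start.
fake_sensor_callable("2016-01-01")

# Assert only the deterministic part: at least one call, with the
# correctly templated value in the first call.
assert 1 <= len(recorded_calls)
assert recorded_calls[0] == "2016-01-01"
```

The design choice here is to avoid asserting an exact call count that depends on scheduling speed, while still verifying the thing the test actually cares about (that templating reached the callable).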
[GitHub] [airflow] potiuk commented on issue #17279: Logout airflow on Web UI does not work using OAuth2
potiuk commented on issue #17279: URL: https://github.com/apache/airflow/issues/17279#issuecomment-1011872225 cc: @kaxil
[GitHub] [airflow] potiuk edited a comment on issue #17279: Logout airflow on Web UI does not work using OAuth2
potiuk edited a comment on issue #17279: URL: https://github.com/apache/airflow/issues/17279#issuecomment-1011871954 Flask AppBuilder constraints are updated already - both main and the 2.2 branch - so we could fix it in main and cherry-pick to 2.2.4.
[GitHub] [airflow] blag edited a comment on issue #17279: Logout airflow on Web UI does not work using OAuth2
blag edited a comment on issue #17279: URL: https://github.com/apache/airflow/issues/17279#issuecomment-991370837 Now that dpgaspar/Flask-AppBuilder#1749 is merged, this is one step closer to being fixed. TODO as of now: * [x] Wait for a new release of FAB that includes that fix - [3.4.1 on Dec 13th, 2021](https://github.com/dpgaspar/Flask-AppBuilder/pull/1759) * [x] Update Airflow constraints files * [ ] Check if this bug is fixed in Airflow
[airflow] branch main updated (8dc68d4 -> c49d6ec)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git. from 8dc68d4 Doc: Added an enum param example (#20841) add c49d6ec static code check doc fix (#20844) No new revisions were added by this update. Summary of changes: STATIC_CODE_CHECKS.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[GitHub] [airflow] potiuk closed issue #20823: Static check docs mistakes in example
potiuk closed issue #20823: URL: https://github.com/apache/airflow/issues/20823
[GitHub] [airflow] potiuk merged pull request #20844: static code check doc fix
potiuk merged pull request #20844: URL: https://github.com/apache/airflow/pull/20844
[GitHub] [airflow] Bowrna opened a new pull request #20844: static code check doc fix
Bowrna opened a new pull request #20844: URL: https://github.com/apache/airflow/pull/20844 Fixes the static code check docs issue in the current Breeze environment. closes: https://github.com/apache/airflow/issues/20823
[GitHub] [airflow-client-python] feluelle commented on issue #20: get_tasks api is broken
feluelle commented on issue #20: URL: https://github.com/apache/airflow-client-python/issues/20#issuecomment-1011859928 @msumit do you have any idea why this happens? I am still running into this problem on the latest version (2.2.0).
[GitHub] [airflow] potiuk edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
potiuk edited a comment on issue #19957: URL: https://github.com/apache/airflow/issues/19957#issuecomment-1011857056 I actually spent some time a few days ago looking at the mini-scheduler code, but I could not really find a flaw there. The fact that it did not help you indicates that my hypothesis was unfounded, unfortunately, and maybe the reason was different (and the fact that it worked for @stablum was mainly a coincidence or some side effect of that change). @dwiajik - it might also be that your case is a bit different - could you please report (maybe create a gist with) a few examples of the logs of your deadlocks? Ideally, send us the logs of the failing scheduler and the corresponding logs of the Postgres server from the same time - it will be much easier to investigate if we see a few examples, and the server logs should tell us exactly which two queries deadlocked, which should help us a lot. What we really need is something in /var/lib/pgsql/data/pg_log/*.log; there should be entries at the time the deadlock happens that look like this:

```
ERROR:  deadlock detected
DETAIL:  Process 21535 waits for AccessExclusiveLock on relation 342640 of database 41454; blocked by process 21506.
         Process 21506 waits for AccessExclusiveLock on relation 342637 of database 41454; blocked by process 21535.
HINT:  See server log for query details.
CONTEXT:  SQL statement "UPDATE ..."
```

We need ideally those and some logs around it if possible.
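A quick way to pull the entries described above out of the server logs is a small scan script. This is a sketch under stated assumptions: the PostgreSQL log directory path varies by packaging (the `pg_log` path above is one common default), and the five-line context window is an arbitrary choice:

```python
import glob
import re

def find_deadlock_entries(log_glob):
    """Collect 'deadlock detected' errors (with a few lines of context)
    from PostgreSQL server logs matched by `log_glob`, e.g.
    "/var/lib/pgsql/data/pg_log/*.log" (path varies by distribution)."""
    hits = []
    for path in sorted(glob.glob(log_glob)):
        with open(path, errors="replace") as f:
            lines = f.readlines()
        for i, line in enumerate(lines):
            if re.search(r"deadlock detected", line):
                # The DETAIL/HINT/CONTEXT lines follow the ERROR line,
                # so keep a small window after the match.
                hits.append((path, "".join(lines[i : i + 5])))
    return hits
```

Pasting the returned excerpts into a gist would give exactly the "which two queries deadlocked" information the comment asks for.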
[GitHub] [airflow] github-actions[bot] commented on pull request #20349: Fix Scheduler crash when executing task instances of missing DAG
github-actions[bot] commented on pull request #20349: URL: https://github.com/apache/airflow/pull/20349#issuecomment-1011850115 The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
[GitHub] [airflow] uranusjr commented on a change in pull request #20795: Fix remaining mypy issues in "core" Airflow
uranusjr commented on a change in pull request #20795: URL: https://github.com/apache/airflow/pull/20795#discussion_r783675997

## File path: airflow/models/taskmixin.py

```
@@ -114,6 +115,14 @@ class DAGNode(DependencyMixin, metaclass=ABCMeta):
     def node_id(self) -> str:
         raise NotImplementedError()

+    @property
+    def label(self) -> Optional[str]:
+        tg: Optional["TaskGroup"] = getattr(self, 'task_group', None)
+        if tg and tg.node_id and tg.prefix_group_id:
+            # "task_group_id.task_id" -> "task_id"
+            return self.node_id[len(tg.node_id) + 1 :]
```

Review comment:
```suggestion
            return self.node_id.split(".", 1)[-1]
```
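The two approaches in that review thread can be compared in isolation. This is a standalone sketch with illustrative id strings, not the actual `DAGNode` code; note that the two expressions agree for a single-level group id but differ when the group id itself contains a dot (nested task groups):

```python
# Two ways to strip a task-group prefix from a dotted node id.
node_id = "task_group_id.task_id"
tg_node_id = "task_group_id"

# Original approach: slice off the group id plus the "." separator.
label_by_slice = node_id[len(tg_node_id) + 1 :]

# Suggested approach: split on the first dot and keep the remainder.
label_by_split = node_id.split(".", 1)[-1]

assert label_by_slice == label_by_split == "task_id"

# For a nested group id the behaviours diverge: slicing removes the whole
# prefix, while splitting once removes only the first component.
nested = "g1.g2.task"
assert nested[len("g1.g2") + 1 :] == "task"
assert nested.split(".", 1)[-1] == "g2.task"  # inner group prefix kept
```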
[GitHub] [airflow] potiuk commented on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5
potiuk commented on pull request #20245: URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011849274 https://github.com/apache/airflow/blob/constraints-2.2.3/constraints-3.7.txt#L485
[GitHub] [airflow] potiuk commented on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5
potiuk commented on pull request #20245: URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011848901 @NadimYounes - Indeed, the 2.2.3 constraints were prepared before the snowflake release was yanked. It should be fixed now.
[airflow] tag constraints-2.2.3 updated (62d490d -> 8a96274)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a change to tag constraints-2.2.3 in repository https://gitbox.apache.org/repos/asf/airflow.git. *** WARNING: tag constraints-2.2.3 was modified! *** from 62d490d (commit) to 8a96274 (commit) from 62d490d Revert constraints for yanked `apache-airflow-providers-amazon==2.5.0` add 8a96274 Fix constraints to yanked snowflake release No new revisions were added by this update. Summary of changes: constraints-3.6.txt | 2 +- constraints-3.7.txt | 2 +- constraints-3.8.txt | 2 +- constraints-3.9.txt | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-)
[GitHub] [airflow] uranusjr commented on a change in pull request #20843: Fix flaky templatized call
uranusjr commented on a change in pull request #20843: URL: https://github.com/apache/airflow/pull/20843#discussion_r783674157

## File path: tests/sensors/test_python.py

```
@@ -85,7 +85,7 @@ def test_python_callable_arguments_are_templatized(self):
         ds_templated = DEFAULT_DATE.date().isoformat()
         # 2 calls: first: at start, second: before timeout
-        assert 2 == len(recorded_calls)
+        assert 1 <= len(recorded_calls)
```

Review comment: Why do we have the second call at all? Can we not have it?
[airflow] 01/01: Fix constraints to yanked snowflake release
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a commit to branch constraints-2-2-3-fixed in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 8a962740270a9ecf6b4f910173321b183e7adf4a
Author: Jarek Potiuk
AuthorDate: Thu Jan 13 07:53:36 2022 +0100

    Fix constraints to yanked snowflake release
---
 constraints-3.6.txt | 2 +-
 constraints-3.7.txt | 2 +-
 constraints-3.8.txt | 2 +-
 constraints-3.9.txt | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

```
diff --git a/constraints-3.6.txt b/constraints-3.6.txt
index d5c11d4..7a3bc62 100644
--- a/constraints-3.6.txt
+++ b/constraints-3.6.txt
@@ -485,7 +485,7 @@ snakebite-py3==3.0.5
 sniffio==1.2.0
 snowballstemmer==2.2.0
 snowflake-connector-python==2.7.1
-snowflake-sqlalchemy==1.2.5
+snowflake-sqlalchemy==1.2.4
 sortedcontainers==2.4.0
 soupsieve==2.3.1
 sphinx-airflow-theme==0.0.6
diff --git a/constraints-3.7.txt b/constraints-3.7.txt
index 44dfd47..f83f1af 100644
--- a/constraints-3.7.txt
+++ b/constraints-3.7.txt
@@ -482,7 +482,7 @@ snakebite-py3==3.0.5
 sniffio==1.2.0
 snowballstemmer==2.2.0
 snowflake-connector-python==2.7.1
-snowflake-sqlalchemy==1.2.5
+snowflake-sqlalchemy==1.2.4
 sortedcontainers==2.4.0
 soupsieve==2.3.1
 sphinx-airflow-theme==0.0.6
diff --git a/constraints-3.8.txt b/constraints-3.8.txt
index 67e8d5f..eab89b4 100644
--- a/constraints-3.8.txt
+++ b/constraints-3.8.txt
@@ -480,7 +480,7 @@ snakebite-py3==3.0.5
 sniffio==1.2.0
 snowballstemmer==2.2.0
 snowflake-connector-python==2.7.1
-snowflake-sqlalchemy==1.2.5
+snowflake-sqlalchemy==1.2.4
 sortedcontainers==2.4.0
 soupsieve==2.3.1
 sphinx-airflow-theme==0.0.6
diff --git a/constraints-3.9.txt b/constraints-3.9.txt
index aba077a..ea90cad 100644
--- a/constraints-3.9.txt
+++ b/constraints-3.9.txt
@@ -475,7 +475,7 @@ snakebite-py3==3.0.5
 sniffio==1.2.0
 snowballstemmer==2.2.0
 snowflake-connector-python==2.7.1
-snowflake-sqlalchemy==1.2.5
+snowflake-sqlalchemy==1.2.4
 sortedcontainers==2.4.0
 soupsieve==2.3.1
 sphinx-airflow-theme==0.0.6
```
[airflow] branch constraints-2-2-3-fixed created (now 8a96274)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a change to branch constraints-2-2-3-fixed in repository https://gitbox.apache.org/repos/asf/airflow.git. at 8a96274 Fix constraints to yanked snowflake release This branch includes the following new commits: new 8a96274 Fix constraints to yanked snowflake release The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[GitHub] [airflow] uranusjr commented on a change in pull request #20843: Fix flaky templatized call
uranusjr commented on a change in pull request #20843: URL: https://github.com/apache/airflow/pull/20843#discussion_r783673639

## File path: tests/sensors/test_python.py

```
@@ -85,7 +85,7 @@ def test_python_callable_arguments_are_templatized(self):
         ds_templated = DEFAULT_DATE.date().isoformat()
         # 2 calls: first: at start, second: before timeout
-        assert 2 == len(recorded_calls)
+        assert 1 <= len(recorded_calls)
```

Review comment: This seems to defeat the purpose of this assertion. Why is the second call not recorded?
[GitHub] [airflow] uranusjr commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data
uranusjr commented on a change in pull request #20838: URL: https://github.com/apache/airflow/pull/20838#discussion_r783672138

## File path: airflow/utils/metastore_cleanup.py

```
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+orm_model: the table
+recency_column: date column to filter by
+keep_last: whether the last record should be kept even if it's older than clean_before_timestamp
+keep_last_filters:
+keep_last_group_by: if keeping the last record, can keep the last record for each group
+"""
+
+from pendulum import DateTime
+from sqlalchemy.orm import Query
+
+from airflow import settings
+from airflow.configuration import conf
+from airflow.models import (
+    DagModel,
+    DagRun,
+    ImportError,
+    Log,
+    RenderedTaskInstanceFields,
+    SlaMiss,
+    TaskFail,
+    TaskInstance,
+    XCom,
+)
+
+try:
+    from airflow.jobs import BaseJob
+except Exception as e:
+    from airflow.jobs.base_job import BaseJob
+
+import logging
+
+from sqlalchemy import and_, func
+
+from airflow.models import TaskReschedule
+from airflow.utils import timezone
+
+now = timezone.utcnow
+
+
+objects = {
```

Review comment: This structured data should use a more structured type than a dict of dicts. A dict of namedtuples would be a big improvement.
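The reviewer's suggestion (named tuples instead of a dict of dicts) could look roughly like the sketch below. The field names come from the comment above; the string values are stand-ins for the real SQLAlchemy models and columns, and `CleanupConfig` is a hypothetical name, not the merged Airflow code:

```python
from typing import Any, List, NamedTuple, Optional

class CleanupConfig(NamedTuple):
    """One row of the cleanup table: which ORM model to purge, which date
    column decides recency, and whether/how to keep the last record."""
    orm_model: Any
    recency_column: str
    keep_last: bool = False
    keep_last_filters: Optional[List[str]] = None
    keep_last_group_by: Optional[str] = None

# Strings stand in for model classes and column objects here.
objects = {
    "log": CleanupConfig(orm_model="Log", recency_column="dttm"),
    "dag_run": CleanupConfig(
        orm_model="DagRun",
        recency_column="execution_date",
        keep_last=True,
        keep_last_filters=["external_trigger IS FALSE"],
        keep_last_group_by="dag_id",
    ),
}

# Fields are now accessed by name, with defaults instead of repeated None/False.
assert objects["log"].keep_last is False
assert objects["dag_run"].keep_last_group_by == "dag_id"
```

Compared with a dict of dicts, this gives attribute access, per-field defaults, and a single place where the shape of an entry is documented.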
[GitHub] [airflow] uranusjr commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data
uranusjr commented on a change in pull request #20838: URL: https://github.com/apache/airflow/pull/20838#discussion_r783671395

## File path: airflow/utils/metastore_cleanup.py

```
+objects = {
+    'job': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': BaseJob,
+        'recency_column': BaseJob.latest_heartbeat,
+    },
+    'dag': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': DagModel,
+        'recency_column': DagModel.last_parsed_time,
+    },
+    'dag_run': {
+        'keep_last': True,
+        "keep_last_filters": [DagRun.external_trigger.is_(False)],
+        "keep_last_group_by": DagRun.dag_id,
+        'orm_model': DagRun,
+        'recency_column': DagRun.execution_date,
+    },
+    'import_error': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': ImportError,
+        'recency_column': ImportError.timestamp,
+    },
+    'log': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': Log,
+        'recency_column': Log.dttm,
+    },
+    'rendered_task_instance_fields': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': RenderedTaskInstanceFields,
+        'recency_column': RenderedTaskInstanceFields.execution_date,
+    },
+    'sla_miss': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': SlaMiss,
+        'recency_column': SlaMiss.execution_date,
+    },
+    'task_fail': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': TaskFail,
+        'recency_column': TaskFail.execution_date,
+    },
+    'task_instance': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': TaskInstance,
+        'recency_column': TaskInstance.execution_date,
```

Review comment: This should work; there is existing code doing it. The association proxy does not always work like real columns, but it does work for this specific usage.
[GitHub] [airflow] ephraimbuddy commented on a change in pull request #20349: Fix Scheduler crash when executing task instances of missing DAG
ephraimbuddy commented on a change in pull request #20349: URL: https://github.com/apache/airflow/pull/20349#discussion_r783670984

## File path: airflow/jobs/scheduler_job.py

## @@ -403,6 +403,15 @@ def _executable_task_instances_to_queued(self, max_tis: int, session: Session =
 # Many dags don't have a task_concurrency, so where we can avoid loading the full
 # serialized DAG the better.
 serialized_dag = self.dagbag.get_dag(dag_id, session=session)
+# If the dag is missing, fail the task and continue to the next task.
+if not serialized_dag:
+    self.log.error(
+        "DAG '%s' for task instance %s not found in serialized_dag table",
+        dag_id,
+        task_instance,
+    )
+    task_instance.set_state(State.FAILED, session=session)

Review comment: I will opt for setting all scheduled tasks to None. I doubt that failing the DAG here will really fail it while there are tasks being executed in the executor; there is a bug where a DAG marked as failed comes up again as running: https://github.com/apache/airflow/issues/16078. So I propose setting all scheduled tasks to None. The scheduler will no longer move task instances to scheduled when the DAG can no longer be found, and it makes sense to set them to None instead of failing them, since the task instances won't have logs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
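The alternative proposed in that comment, resetting rather than failing scheduled task instances whose DAG has disappeared from the `serialized_dag` table, can be sketched roughly as follows. Plain dicts stand in for ORM task instances; this is an illustration of the idea, not the scheduler's actual code:

```python
def reset_orphaned_scheduled_tis(task_instances, known_dag_ids):
    """Set state to None for scheduled TIs whose serialized DAG is gone.

    Failing them would be misleading: the tasks never ran, so no logs exist.
    With state None the scheduler simply stops advancing them.
    """
    reset = []
    for ti in task_instances:
        if ti["state"] == "scheduled" and ti["dag_id"] not in known_dag_ids:
            ti["state"] = None
            reset.append(ti)
    return reset

tis = [
    {"dag_id": "gone", "state": "scheduled"},
    {"dag_id": "alive", "state": "scheduled"},
    {"dag_id": "gone", "state": "running"},
]
orphaned = reset_orphaned_scheduled_tis(tis, known_dag_ids={"alive"})
# Only the scheduled TI of the missing DAG is reset; running TIs are untouched.
```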
[airflow] branch main updated (83b1e36 -> 8dc68d4)
This is an automated email from the ASF dual-hosted git repository. kaxilnaik pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git. from 83b1e36 Speedup liveness probe for scheduler and triggerer (#20833) add 8dc68d4 Doc: Added an enum param example (#20841) No new revisions were added by this update. Summary of changes: docs/apache-airflow/concepts/params.rst | 3 +++ 1 file changed, 3 insertions(+)
[GitHub] [airflow] kaxil merged pull request #20841: added an enum param example
kaxil merged pull request #20841: URL: https://github.com/apache/airflow/pull/20841 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #20795: Fix remaining mypy issues in "core" Airflow
potiuk commented on pull request #20795: URL: https://github.com/apache/airflow/pull/20795#issuecomment-1011843784 Flaky test fixed in https://github.com/apache/airflow/pull/20843 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #20843: Fix flaky templatized call
potiuk commented on pull request #20843: URL: https://github.com/apache/airflow/pull/20843#issuecomment-1011843612 Fixes flaky test (example in #20795) https://github.com/apache/airflow/runs/4793994282?check_suite_focus=true -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] uranusjr commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data
uranusjr commented on a change in pull request #20838: URL: https://github.com/apache/airflow/pull/20838#discussion_r783669005

## File path: airflow/cli/commands/maintenance_command.py

## @@ -0,0 +1,31 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Maintenance sub-commands"""
+from airflow.utils.metastore_cleanup import run_cleanup
+
+
+def cleanup(args):
+    """Purges old records in metastore database"""
+    print(args.dry_run)
+    kwargs = dict(
+        dry_run=args.dry_run, clean_before_timestamp=args.clean_before_timestamp, verbose=args.verbose
+    )
+    if args.tables:
+        kwargs.update(
+            table_names=args.tables,
+        )
+    run_cleanup(**kwargs)

Review comment: Since you do `if table_name` in `run_cleanup`, this is equivalent to

```python
run_cleanup(
    dry_run=args.dry_run,
    clean_before_timestamp=args.clean_before_timestamp,
    verbose=args.verbose,
    table_names=args.tables,
)
```

With an appropriate docstring in `run_cleanup` (to clarify that passing an empty list is the same as not passing the argument), this would be more readable and maintainable than the current implementation, IMO. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk opened a new pull request #20843: Fix flaky templatized call
potiuk opened a new pull request #20843: URL: https://github.com/apache/airflow/pull/20843 --- **^ Add meaningful description above** Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information. In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] kaxil commented on a change in pull request #19769: Handle stuck queued tasks in Celery
kaxil commented on a change in pull request #19769: URL: https://github.com/apache/airflow/pull/19769#discussion_r783667980

## File path: airflow/executors/celery_executor.py

## @@ -377,6 +385,49 @@ def _check_for_stalled_adopted_tasks(self):
 for key in timedout_keys:
     self.change_state(key, State.FAILED)
+
+@provide_session
+def _clear_stuck_queued_tasks(self, session: Session = NEW_SESSION) -> None:
+    """
+    Tasks can get stuck in queued state in DB while still not in
+    worker. This happens when the worker is autoscaled down and
+    the task is queued but has not been picked up by any worker prior to the scaling.
+
+    In such situation, we update the task instance state to scheduled so that
+    it can be queued again. We chose to use task_adoption_timeout to decide
+    """
+    if not isinstance(app.backend, DatabaseBackend):
+        # We only want to do this for database backends where

Review comment: Oh yes, at least in that case the comment should be clear with a proper TODO, or it should just be solved for all the other backends in this PR too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ephraimbuddy commented on a change in pull request #19769: Handle stuck queued tasks in Celery
ephraimbuddy commented on a change in pull request #19769: URL: https://github.com/apache/airflow/pull/19769#discussion_r783662283

## File path: airflow/executors/celery_executor.py

## @@ -377,6 +385,49 @@ def _check_for_stalled_adopted_tasks(self):
 for key in timedout_keys:
     self.change_state(key, State.FAILED)
+
+@provide_session
+def _clear_stuck_queued_tasks(self, session: Session = NEW_SESSION) -> None:
+    """
+    Tasks can get stuck in queued state in DB while still not in
+    worker. This happens when the worker is autoscaled down and
+    the task is queued but has not been picked up by any worker prior to the scaling.
+
+    In such situation, we update the task instance state to scheduled so that
+    it can be queued again. We chose to use task_adoption_timeout to decide
+    """
+    if not isinstance(app.backend, DatabaseBackend):
+        # We only want to do this for database backends where

Review comment: I can't figure out how it could be solved for Redis. Do you have reproduction steps for Redis, like in #19699? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
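For context, the mechanism under review — requeuing tasks that have sat in the queued state longer than `task_adoption_timeout` — boils down to a time-based filter over task instances. A rough sketch with assumed field names (plain dicts, not the PR's actual SQLAlchemy query):

```python
from datetime import datetime, timedelta

def find_stuck_queued_keys(tis, now, adoption_timeout):
    """Keys of TIs queued longer than the timeout; candidates to flip back to 'scheduled'."""
    cutoff = now - adoption_timeout
    return [ti["key"] for ti in tis if ti["state"] == "queued" and ti["queued_at"] < cutoff]

now = datetime(2022, 1, 13, 12, 0)
tis = [
    {"key": "t1", "state": "queued", "queued_at": now - timedelta(minutes=20)},
    {"key": "t2", "state": "queued", "queued_at": now - timedelta(minutes=2)},
    {"key": "t3", "state": "running", "queued_at": now - timedelta(minutes=30)},
]
stuck = find_stuck_queued_keys(tis, now, timedelta(minutes=10))
# Only t1 qualifies: queued, and queued for longer than the timeout.
```

The executor would additionally need to confirm the Celery backend has no record of the task (the `DatabaseBackend` check in the diff) before resetting the state.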
[GitHub] [airflow] uranusjr commented on a change in pull request #18575: Resync DAG during next parse if error in ``sync_to_db``
uranusjr commented on a change in pull request #18575: URL: https://github.com/apache/airflow/pull/18575#discussion_r783661606

## File path: airflow/models/dagbag.py

## @@ -584,22 +584,35 @@ def _serialize_dag_capturing_errors(dag, session):
 We can't place them directly in import_errors, as this may be retried, and work the next time
 """
-if dag.is_subdag:
-    return []
 try:
     # We can't use bulk_write_to_db as we want to capture each error individually
     dag_was_updated = SerializedDagModel.write_dag(
         dag,
         min_update_interval=settings.MIN_SERIALIZED_DAG_UPDATE_INTERVAL,
         session=session,
     )
-    if dag_was_updated:
-        self._sync_perm_for_dag(dag, session=session)
+    return dag_was_updated, []

Review comment: Probably more readable to put this in an `else` block after all the exception handlers? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
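The restructuring being suggested is the standard `try`/`except`/`else` idiom: the success path goes in `else`, so it sits visually after all the error handlers instead of inside the `try` body. A toy illustration of the shape (stand-in exception type, not Airflow's actual ones):

```python
def serialize_capturing_errors(write_dag, dag):
    """Return (dag_was_updated, errors); failures are captured, success path in `else`."""
    try:
        dag_was_updated = write_dag(dag)
    except ValueError as err:  # stand-in for the real serialization/DB exceptions
        return False, [f"{dag}: {err}"]
    else:
        # Runs only if no exception was raised -- clearly separate from the handlers.
        return dag_was_updated, []

def boom(dag):
    raise ValueError("boom")

ok = serialize_capturing_errors(lambda dag: True, "my_dag")
bad = serialize_capturing_errors(boom, "my_dag")
# ok  -> (True, [])
# bad -> (False, ["my_dag: boom"])
```

The `else` block also guarantees that an exception raised by the success-path code itself is not accidentally swallowed by the handlers above it.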
[GitHub] [airflow] github-actions[bot] commented on pull request #20841: added an enum param example
github-actions[bot] commented on pull request #20841: URL: https://github.com/apache/airflow/pull/20841#issuecomment-1011831582 The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest main or amend the last commit of the PR, and push it with --force-with-lease. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] uranusjr commented on a change in pull request #20286: Add TaskMap and TaskInstance.map_id
uranusjr commented on a change in pull request #20286: URL: https://github.com/apache/airflow/pull/20286#discussion_r783658006

## File path: airflow/models/taskinstance.py

## @@ -2128,6 +2138,14 @@ def set_duration(self) -> None:
 self.duration = None
 self.log.debug("Task Duration set to %s", self.duration)
+
+def _record_task_map_for_downstreams(self, value: Any, *, session: Session) -> None:
+    if not self.task.has_mapped_dependants():
+        return
+    if not isinstance(value, collections.abc.Collection) or isinstance(value, (bytes, str)):
+        self.log.info("Failing %s for unmappable XCom push %r", self.key, value)
+        raise UnmappableXComPushed(value)

Review comment: Changed both logging and exception to only include the variable type instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
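The check in the diff accepts only non-string collections as mappable XCom values (logging just the type, as the review settles on, also avoids dumping a potentially huge value into the logs). The predicate itself, isolated as a sketch rather than the Airflow source:

```python
import collections.abc

def is_mappable_xcom(value):
    """True only for values a downstream .map() can expand over.

    str and bytes are technically Collections, but mapping over individual
    characters/bytes is almost never what the user meant, so they are excluded.
    """
    return isinstance(value, collections.abc.Collection) and not isinstance(value, (bytes, str))
```

So `is_mappable_xcom([1, 2])` and `is_mappable_xcom({"a": 1})` hold, while strings, bytes, and scalars are rejected.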
[GitHub] [airflow] uranusjr commented on a change in pull request #20286: Add TaskMap and TaskInstance.map_id
uranusjr commented on a change in pull request #20286: URL: https://github.com/apache/airflow/pull/20286#discussion_r783657603

## File path: airflow/models/baseoperator.py

## @@ -1632,6 +1632,33 @@ def defer(
 def map(self, **kwargs) -> "MappedOperator":
     return MappedOperator.from_operator(self, kwargs)
+
+def has_mapped_dependants(self) -> bool:
+    """Whether any downstream dependencies depend on this task for mapping."""
+    from airflow.utils.task_group import MappedTaskGroup, TaskGroup
+
+    if not self.has_dag():
+        return False
+
+    def _walk_group(group: TaskGroup) -> Iterable[Tuple[str, DAGNode]]:
+        """Recursively walk children in a task group.
+
+        This yields all direct children (including both tasks and task
+        groups), and all children of any task groups.
+        """
+        for key, child in group.children.items():
+            yield key, child
+            if isinstance(child, TaskGroup):
+                yield from _walk_group(child)
+
+    for key, child in _walk_group(self.dag.task_group):
+        if key == self.task_id:
+            continue
+        if not isinstance(child, (MappedOperator, MappedTaskGroup)):
+            continue
+        if self.task_id in child.upstream_task_ids:
+            return True
+    return False

Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475121#comment-17475121 ] ASF GitHub Bot commented on AIRFLOW-5071: - ghostbody edited a comment on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804 We reviewed the code and found that in `local_task_job.py` the parent process has a `heartbeat_callback` that checks the state and the child-process return code of the `task_instance`. However, these lines may hide a bug: ![image](https://user-images.githubusercontent.com/8371330/149270821-45da67da-186e-409b-8f3e-072fe8e0491c.png) ![image](https://user-images.githubusercontent.com/8371330/149271933-4ae6c8d1-defc-45c6-ba21-89a46016c3d2.png) **The raw task command writing back the task instance's state (e.g. success) does not mean the child process has finished (returned).** So in this heartbeat callback there may be a race condition where the task state has already been written back while the child process has not yet returned. In that scenario, the local task will kill the child process by mistake, and the scheduler will then detect this and report "task instance X finished (success) although the task says its queued. Was the task killed externally?" Here is a simple schematic diagram: ![image](https://user-images.githubusercontent.com/8371330/149273573-45700f32-079b-4b22-8dba-d6a1ce37a243.png) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Thousand os Executor reports task instance X finished (success) although the > task says its queued. Was the task killed externally? 
> -- > > Key: AIRFLOW-5071 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5071 > Project: Apache Airflow > Issue Type: Bug > Components: DAG, scheduler >Affects Versions: 1.10.3 >Reporter: msempere >Priority: Critical > Fix For: 1.10.12 > > Attachments: image-2020-01-27-18-10-29-124.png, > image-2020-07-08-07-58-42-972.png > > > I'm opening this issue because since I update to 1.10.3 I'm seeing thousands > of daily messages like the following in the logs: > > ``` > {{__init__.py:1580}} ERROR - Executor reports task instance 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says > its queued. Was the task killed externally? > {{jobs.py:1484}} ERROR - Executor reports task instance 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says > its queued. Was the task killed externally? > ``` > -And looks like this is triggering also thousand of daily emails because the > flag to send email in case of failure is set to True.- > I have Airflow setup to use Celery and Redis as a backend queue service. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [airflow] MatrixManAtYrService opened a new pull request #20841: added an enum param example
MatrixManAtYrService opened a new pull request #20841: URL: https://github.com/apache/airflow/pull/20841 More examples make it easier to compare our docs with the json-schema docs and figure out how they work together (I tested the code shown here).
[GitHub] [airflow] uranusjr commented on a change in pull request #20743: Serialize mapped tasks and task groups
uranusjr commented on a change in pull request #20743: URL: https://github.com/apache/airflow/pull/20743#discussion_r783623735 ## File path: airflow/models/baseoperator.py ## @@ -207,6 +207,7 @@ def apply_defaults(self: "BaseOperator", *args: Any, **kwargs: Any) -> Any: result = func(self, **kwargs, default_args=default_args) # Store the args passed to init -- we need them to support task.map serialization! +kwargs.pop('task_id', None) Review comment: How about storing `task_id` separately instead and not in `partial_kwargs`?
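The suggestion can be sketched as follows (a hypothetical helper, not Airflow's actual `apply_defaults` internals): record `task_id` on its own key so the stored kwargs never need the `pop`.

```python
# Illustrative sketch: keep task_id out of the stored partial kwargs and
# track it separately, instead of popping it back out later.
def store_partial(task_id, **kwargs):
    partial_kwargs = dict(kwargs)  # everything except task_id
    return {"task_id": task_id, "partial_kwargs": partial_kwargs}

record = store_partial("extract", retries=2, queue="default")
print(record["task_id"])         # extract
print(record["partial_kwargs"])  # {'retries': 2, 'queue': 'default'}
```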
[GitHub] [airflow] nirutgupta edited a comment on issue #20779: Logs of tasks when running is not available with kubernetesexecutor on webserver UI
nirutgupta edited a comment on issue #20779: URL: https://github.com/apache/airflow/issues/20779#issuecomment-1011786738 A k8s task handler.py we can use: https://gist.github.com/szeevs/938ad3cf96e732d4b1b55a74015aed5b log_config.py will look like:

```
from copy import deepcopy
import os
import sys

from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG
from airflow.configuration import conf

sys.path.append('/opt/airflow/custom_log')

BASE_LOG_FOLDER: str = conf.get('logging', 'BASE_LOG_FOLDER')
FILENAME_TEMPLATE: str = conf.get('logging', 'LOG_FILENAME_TEMPLATE')

LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
LOGGING_CONFIG['handlers']['k8stask'] = {
    'class': 'k8s_task_handler.KubernetesTaskHandler',
    'formatter': 'airflow',
    'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
    'filename_template': FILENAME_TEMPLATE,
}
LOGGING_CONFIG['loggers']['airflow.task']['handlers'].append('k8stask')
```

Make sure, while deploying, to configure this env:

```
- name: AIRFLOW__LOGGING__LOGGING_CONFIG_CLASS
  value: "log_config.LOGGING_CONFIG"
```

This will do the job. :)
[GitHub] [airflow] nirutgupta commented on issue #20779: Logs of tasks when running is not available with kubernetesexecutor on webserver UI
nirutgupta commented on issue #20779: URL: https://github.com/apache/airflow/issues/20779#issuecomment-1011785432 I see a good working solution for this at https://szeevs.medium.com/handling-airflow-logs-with-kubernetes-executor-25c11ea831e4 All we need to do is make this k8s_task_handler file available under utils/log/ and configure the log_config.py file to link the airflow.task logger to this new handler as well.
[GitHub] [airflow] uranusjr commented on a change in pull request #20349: Fix Scheduler crash when executing task instances of missing DAG
uranusjr commented on a change in pull request #20349: URL: https://github.com/apache/airflow/pull/20349#discussion_r783617068 ## File path: tests/jobs/test_scheduler_job.py ## @@ -645,6 +645,36 @@ def test_find_executable_task_instances_in_default_pool(self, dag_maker): session.rollback() session.close()

```python
def test_executable_task_instances_to_queued_fails_task_for_missing_dag_in_dagbag(
    self, dag_maker, session
):
    """Check that task instances of missing DAGs are failed"""
    dag_id = 'SchedulerJobTest.test_find_executable_task_instances_not_in_dagbag'
    task_id_1 = 'dummy'
    task_id_2 = 'dummydummy'

    with dag_maker(dag_id=dag_id, session=session, default_args={"max_active_tis_per_dag": 1}):
        DummyOperator(task_id=task_id_1)
        DummyOperator(task_id=task_id_2)

    self.scheduler_job = SchedulerJob(subdir=os.devnull)
    self.scheduler_job.dagbag = mock.MagicMock()
    self.scheduler_job.dagbag.get_dag.return_value = None

    dr = dag_maker.create_dagrun(state=DagRunState.RUNNING)

    tis = dr.task_instances
    for ti in tis:
        ti.state = State.SCHEDULED
        session.merge(ti)
    session.flush()
    res = self.scheduler_job._executable_task_instances_to_queued(max_tis=32, session=session)
    session.flush()
    assert 0 == len(res)
    tis = dr.get_task_instances(session=session)
    for ti in tis:
        assert ti.state == State.FAILED
```

Review comment: Or

```python
assert [TaskInstanceState.FAILED, TaskInstanceState.FAILED] == [ti.state for ti in tis]
```
[GitHub] [airflow] uranusjr commented on a change in pull request #20349: Fix Scheduler crash when executing task instances of missing DAG
uranusjr commented on a change in pull request #20349: URL: https://github.com/apache/airflow/pull/20349#discussion_r783616795 ## File path: airflow/jobs/scheduler_job.py ## @@ -403,6 +403,15 @@ def _executable_task_instances_to_queued(self, max_tis: int, session: Session = # Many dags don't have a task_concurrency, so where we can avoid loading the full # serialized DAG the better. serialized_dag = self.dagbag.get_dag(dag_id, session=session)

```python
# If the dag is missing, fail the task and continue to the next task.
if not serialized_dag:
    self.log.error(
        "DAG '%s' for task instance %s not found in serialized_dag table",
        dag_id,
        task_instance,
    )
    task_instance.set_state(State.FAILED, session=session)
```

Review comment: > there could be downstream tasks which still attempt to execute, which will then be marked failed by the same check

I think this is a good thing in this case. This code is reached because the DAG declaring those tasks is gone, so it doesn't make sense to execute those tasks IMO.
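A minimal sketch of the behavior under discussion (simplified dicts instead of ORM objects; names are illustrative, not the real `SchedulerJob` code): any task instance whose DAG can no longer be found is failed and skipped rather than queued, which is why downstream tasks of the missing DAG end up failed by the same check.

```python
def queue_executable_tis(task_instances, serialized_dags):
    """Queue TIs whose DAG still exists; fail the rest."""
    queued = []
    for ti in task_instances:
        if serialized_dags.get(ti["dag_id"]) is None:
            ti["state"] = "failed"  # DAG is gone: fail instead of crashing
            continue
        queued.append(ti)
    return queued

tis = [
    {"dag_id": "missing_dag", "task_id": "t1", "state": "scheduled"},
    {"dag_id": "missing_dag", "task_id": "t2", "state": "scheduled"},
]
print(queue_executable_tis(tis, serialized_dags={}))  # [] -- nothing queued
print([ti["state"] for ti in tis])  # ['failed', 'failed']
```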
[airflow] branch constraints-main updated: Updating constraints. Build id:1690466917
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a commit to branch constraints-main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/constraints-main by this push: new 32bc5e2 Updating constraints. Build id:1690466917 32bc5e2 is described below commit 32bc5e2d4a8f7b6747e4253577429664dc4e7162 Author: Automated GitHub Actions commit AuthorDate: Thu Jan 13 01:54:30 2022 + Updating constraints. Build id:1690466917 This update in constraints is automatically committed by the CI 'constraints-push' step based on HEAD of 'refs/heads/main' in 'apache/airflow' with commit sha 83b1e364eb3949288161035e38d8a96329579e52. All tests passed in this build so we determined we can push the updated constraints. See https://github.com/apache/airflow/blob/main/README.md#installing-from-pypi for details. --- constraints-3.7.txt | 2 +- constraints-3.8.txt | 2 +- constraints-3.9.txt | 2 +- constraints-no-providers-3.7.txt | 2 +- constraints-no-providers-3.8.txt | 2 +- constraints-no-providers-3.9.txt | 2 +- constraints-source-providers-3.7.txt | 6 +++--- constraints-source-providers-3.8.txt | 4 ++-- constraints-source-providers-3.9.txt | 6 +++--- 9 files changed, 14 insertions(+), 14 deletions(-) diff --git a/constraints-3.7.txt b/constraints-3.7.txt index 61ef132..523dcd3 100644 --- a/constraints-3.7.txt +++ b/constraints-3.7.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:29:32Z +# This constraints file was automatically generated on 2022-01-13T01:51:09Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. 
diff --git a/constraints-3.8.txt b/constraints-3.8.txt index 0f54717..343ac3a 100644 --- a/constraints-3.8.txt +++ b/constraints-3.8.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:29:26Z +# This constraints file was automatically generated on 2022-01-13T01:51:07Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. diff --git a/constraints-3.9.txt b/constraints-3.9.txt index ef99351..ca55fdd 100644 --- a/constraints-3.9.txt +++ b/constraints-3.9.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:29:26Z +# This constraints file was automatically generated on 2022-01-13T01:51:06Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. diff --git a/constraints-no-providers-3.7.txt b/constraints-no-providers-3.7.txt index db3daaa..9b28766 100644 --- a/constraints-no-providers-3.7.txt +++ b/constraints-no-providers-3.7.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:33:53Z +# This constraints file was automatically generated on 2022-01-13T01:54:24Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install just the 'bare' 'apache-airflow' package build from the HEAD of # the branch, without installing any of the providers. 
diff --git a/constraints-no-providers-3.8.txt b/constraints-no-providers-3.8.txt index ce0943e..1d4d314 100644 --- a/constraints-no-providers-3.8.txt +++ b/constraints-no-providers-3.8.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:33:50Z +# This constraints file was automatically generated on 2022-01-13T01:54:23Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install just the 'bare' 'apache-airflow' package build from the HEAD of # the branch, without installing any of the providers. diff --git a/constraints-no-providers-3.9.txt b/constraints-no-providers-3.9.txt index e4c012e..ce1e6e6 100644 --- a/constraints-no-providers-3.9.txt +++ b/constraints-no-providers-3.9.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:33:58Z +# This constraints file was automatically generated on 2022-01-13T01:54:25Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install just the 'bare' 'apache-airflow' package build from the HEAD of # the branch, without installing any of the providers. diff --git
[airflow] branch main updated (fe5aba2 -> 83b1e36)
This is an automated email from the ASF dual-hosted git repository. dstandish pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git. from fe5aba2 Fix grammatical error in Variable.get docstring (#20837) add 83b1e36 Speedup liveness probe for scheduler and triggerer (#20833) No new revisions were added by this update. Summary of changes: chart/templates/scheduler/scheduler-deployment.yaml | 1 + chart/templates/triggerer/triggerer-deployment.yaml | 1 + 2 files changed, 2 insertions(+)
[GitHub] [airflow] dstandish commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data
dstandish commented on a change in pull request #20838: URL: https://github.com/apache/airflow/pull/20838#discussion_r783602645 ## File path: airflow/utils/metastore_cleanup.py ## @@ -0,0 +1,289 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +""" +orm_model: the table +recency_column: date column to filter by +keep_last: whether the last record should be kept even if it's older than clean_before_timestamp +keep_last_filters: +keep_last_group_by: if keeping the last record, can keep the last record for each group +""" + + +from pendulum import DateTime +from sqlalchemy.orm import Query + +from airflow import settings +from airflow.configuration import conf +from airflow.models import ( +DagModel, +DagRun, +ImportError, +Log, +RenderedTaskInstanceFields, +SlaMiss, +TaskFail, +TaskInstance, +XCom, +) + +try: +from airflow.jobs import BaseJob +except Exception as e: +from airflow.jobs.base_job import BaseJob Review comment: thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [airflow] dstandish commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data
dstandish commented on a change in pull request #20838: URL: https://github.com/apache/airflow/pull/20838#discussion_r783602610 ## File path: airflow/cli/cli_parser.py ## @@ -378,6 +383,27 @@ def _check(value): ARG_CONF = Arg(('-c', '--conf'), help="JSON string that gets pickled into the DagRun's conf attribute") ARG_EXEC_DATE = Arg(("-e", "--exec-date"), help="The execution date of the DAG", type=parsedate) +# maintenance +ARG_MAINTENANCE_TABLES = Arg( +("-t", "--tables"), +help="Table names to perform maintenance on (use comma-separated list)", Review comment: good idea
[GitHub] [airflow] dstandish commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data
dstandish commented on a change in pull request #20838: URL: https://github.com/apache/airflow/pull/20838#discussion_r783602548 ## File path: airflow/utils/metastore_cleanup.py ## @@ -0,0 +1,289 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +""" +orm_model: the table +recency_column: date column to filter by +keep_last: whether the last record should be kept even if it's older than clean_before_timestamp +keep_last_filters: +keep_last_group_by: if keeping the last record, can keep the last record for each group +""" + + +from pendulum import DateTime +from sqlalchemy.orm import Query + +from airflow import settings +from airflow.configuration import conf +from airflow.models import ( +DagModel, +DagRun, +ImportError, +Log, +RenderedTaskInstanceFields, +SlaMiss, +TaskFail, +TaskInstance, +XCom, +) + +try: +from airflow.jobs import BaseJob +except Exception as e: +from airflow.jobs.base_job import BaseJob + +import logging + +from sqlalchemy import and_, func + +from airflow.models import TaskReschedule +from airflow.utils import timezone + +now = timezone.utcnow + + +objects = { +'job': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': BaseJob, +'recency_column': 
BaseJob.latest_heartbeat, +}, +'dag': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': DagModel, +'recency_column': DagModel.last_parsed_time, +}, +'dag_run': { +'keep_last': True, +"keep_last_filters": [DagRun.external_trigger.is_(False)], +"keep_last_group_by": DagRun.dag_id, +'orm_model': DagRun, +'recency_column': DagRun.execution_date, +}, +'import_error': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': ImportError, +'recency_column': ImportError.timestamp, +}, +'log': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': Log, +'recency_column': Log.dttm, +}, +'rendered_task_instance_fields': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': RenderedTaskInstanceFields, +'recency_column': RenderedTaskInstanceFields.execution_date, +}, +'sla_miss': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': SlaMiss, +'recency_column': SlaMiss.execution_date, +}, +'task_fail': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': TaskFail, +'recency_column': TaskFail.execution_date, +}, +'task_instance': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': TaskInstance, +'recency_column': TaskInstance.execution_date, Review comment: tell me about the unsureness? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
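The configuration pattern documented above (an `orm_model`, a `recency_column`, and an optional `keep_last` grouping per table) can be sketched without SQLAlchemy. This is a hedged illustration of the idea, not the PR's implementation: select rows older than a `clean_before` timestamp, optionally keeping the newest such row per group.

```python
from datetime import datetime

# Illustrative purge config, mirroring the shape of one `objects` entry
# (plain dicts stand in for ORM models; names are assumptions).
config = {
    "recency_column": "execution_date",
    "keep_last": True,
    "keep_last_group_by": "dag_id",
}

def rows_to_delete(rows, config, clean_before):
    col = config["recency_column"]
    old = [r for r in rows if r[col] < clean_before]
    if not config["keep_last"]:
        return old
    # Keep the newest old row within each group (e.g. the last run per DAG).
    newest = {}
    for r in old:
        g = r[config["keep_last_group_by"]]
        if g not in newest or r[col] > newest[g][col]:
            newest[g] = r
    return [r for r in old if r is not newest[r[config["keep_last_group_by"]]]]

rows = [
    {"dag_id": "etl", "execution_date": datetime(2021, 1, 1)},
    {"dag_id": "etl", "execution_date": datetime(2021, 6, 1)},
    {"dag_id": "etl", "execution_date": datetime(2022, 1, 12)},
]
doomed = rows_to_delete(rows, config, datetime(2022, 1, 1))
print([r["execution_date"].year for r in doomed])  # [2021] -- newest old row is kept
```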
[airflow] branch constraints-main updated: Updating constraints. Build id:1690466917
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a commit to branch constraints-main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/constraints-main by this push: new 32bc5e2 Updating constraints. Build id:1690466917 32bc5e2 is described below commit 32bc5e2d4a8f7b6747e4253577429664dc4e7162 Author: Automated GitHub Actions commit AuthorDate: Thu Jan 13 01:54:30 2022 + Updating constraints. Build id:1690466917 This update in constraints is automatically committed by the CI 'constraints-push' step based on HEAD of 'refs/heads/main' in 'apache/airflow' with commit sha 83b1e364eb3949288161035e38d8a96329579e52. All tests passed in this build so we determined we can push the updated constraints. See https://github.com/apache/airflow/blob/main/README.md#installing-from-pypi for details. --- constraints-3.7.txt | 2 +- constraints-3.8.txt | 2 +- constraints-3.9.txt | 2 +- constraints-no-providers-3.7.txt | 2 +- constraints-no-providers-3.8.txt | 2 +- constraints-no-providers-3.9.txt | 2 +- constraints-source-providers-3.7.txt | 6 +++--- constraints-source-providers-3.8.txt | 4 ++-- constraints-source-providers-3.9.txt | 6 +++--- 9 files changed, 14 insertions(+), 14 deletions(-) diff --git a/constraints-3.7.txt b/constraints-3.7.txt index 61ef132..523dcd3 100644 --- a/constraints-3.7.txt +++ b/constraints-3.7.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:29:32Z +# This constraints file was automatically generated on 2022-01-13T01:51:09Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. 
diff --git a/constraints-3.8.txt b/constraints-3.8.txt index 0f54717..343ac3a 100644 --- a/constraints-3.8.txt +++ b/constraints-3.8.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:29:26Z +# This constraints file was automatically generated on 2022-01-13T01:51:07Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. diff --git a/constraints-3.9.txt b/constraints-3.9.txt index ef99351..ca55fdd 100644 --- a/constraints-3.9.txt +++ b/constraints-3.9.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:29:26Z +# This constraints file was automatically generated on 2022-01-13T01:51:06Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. diff --git a/constraints-no-providers-3.7.txt b/constraints-no-providers-3.7.txt index db3daaa..9b28766 100644 --- a/constraints-no-providers-3.7.txt +++ b/constraints-no-providers-3.7.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:33:53Z +# This constraints file was automatically generated on 2022-01-13T01:54:24Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install just the 'bare' 'apache-airflow' package build from the HEAD of # the branch, without installing any of the providers. 
diff --git a/constraints-no-providers-3.8.txt b/constraints-no-providers-3.8.txt index ce0943e..1d4d314 100644 --- a/constraints-no-providers-3.8.txt +++ b/constraints-no-providers-3.8.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:33:50Z +# This constraints file was automatically generated on 2022-01-13T01:54:23Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install just the 'bare' 'apache-airflow' package build from the HEAD of # the branch, without installing any of the providers. diff --git a/constraints-no-providers-3.9.txt b/constraints-no-providers-3.9.txt index e4c012e..ce1e6e6 100644 --- a/constraints-no-providers-3.9.txt +++ b/constraints-no-providers-3.9.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:33:58Z +# This constraints file was automatically generated on 2022-01-13T01:54:25Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install just the 'bare' 'apache-airflow' package build from the HEAD of # the branch, without installing any of the providers. diff --git
[airflow] branch main updated (fe5aba2 -> 83b1e36)
This is an automated email from the ASF dual-hosted git repository. dstandish pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git. from fe5aba2 Fix grammatical error in Variable.get docstring (#20837) add 83b1e36 Speedup liveness probe for scheduler and triggerer (#20833) No new revisions were added by this update. Summary of changes: chart/templates/scheduler/scheduler-deployment.yaml | 1 + chart/templates/triggerer/triggerer-deployment.yaml | 1 + 2 files changed, 2 insertions(+)
[airflow] branch constraints-main updated: Updating constraints. Build id:1690466917
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a commit to branch constraints-main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/constraints-main by this push: new 32bc5e2 Updating constraints. Build id:1690466917 32bc5e2 is described below commit 32bc5e2d4a8f7b6747e4253577429664dc4e7162 Author: Automated GitHub Actions commit AuthorDate: Thu Jan 13 01:54:30 2022 + Updating constraints. Build id:1690466917 This update in constraints is automatically committed by the CI 'constraints-push' step based on HEAD of 'refs/heads/main' in 'apache/airflow' with commit sha 83b1e364eb3949288161035e38d8a96329579e52. All tests passed in this build so we determined we can push the updated constraints. See https://github.com/apache/airflow/blob/main/README.md#installing-from-pypi for details. --- constraints-3.7.txt | 2 +- constraints-3.8.txt | 2 +- constraints-3.9.txt | 2 +- constraints-no-providers-3.7.txt | 2 +- constraints-no-providers-3.8.txt | 2 +- constraints-no-providers-3.9.txt | 2 +- constraints-source-providers-3.7.txt | 6 +++--- constraints-source-providers-3.8.txt | 4 ++-- constraints-source-providers-3.9.txt | 6 +++--- 9 files changed, 14 insertions(+), 14 deletions(-) diff --git a/constraints-3.7.txt b/constraints-3.7.txt index 61ef132..523dcd3 100644 --- a/constraints-3.7.txt +++ b/constraints-3.7.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:29:32Z +# This constraints file was automatically generated on 2022-01-13T01:51:09Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. 
diff --git a/constraints-3.8.txt b/constraints-3.8.txt index 0f54717..343ac3a 100644 --- a/constraints-3.8.txt +++ b/constraints-3.8.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:29:26Z +# This constraints file was automatically generated on 2022-01-13T01:51:07Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. diff --git a/constraints-3.9.txt b/constraints-3.9.txt index ef99351..ca55fdd 100644 --- a/constraints-3.9.txt +++ b/constraints-3.9.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:29:26Z +# This constraints file was automatically generated on 2022-01-13T01:51:06Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. diff --git a/constraints-no-providers-3.7.txt b/constraints-no-providers-3.7.txt index db3daaa..9b28766 100644 --- a/constraints-no-providers-3.7.txt +++ b/constraints-no-providers-3.7.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:33:53Z +# This constraints file was automatically generated on 2022-01-13T01:54:24Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install just the 'bare' 'apache-airflow' package build from the HEAD of # the branch, without installing any of the providers. 
diff --git a/constraints-no-providers-3.8.txt b/constraints-no-providers-3.8.txt index ce0943e..1d4d314 100644 --- a/constraints-no-providers-3.8.txt +++ b/constraints-no-providers-3.8.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:33:50Z +# This constraints file was automatically generated on 2022-01-13T01:54:23Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install just the 'bare' 'apache-airflow' package build from the HEAD of # the branch, without installing any of the providers. diff --git a/constraints-no-providers-3.9.txt b/constraints-no-providers-3.9.txt index e4c012e..ce1e6e6 100644 --- a/constraints-no-providers-3.9.txt +++ b/constraints-no-providers-3.9.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2022-01-12T22:33:58Z +# This constraints file was automatically generated on 2022-01-13T01:54:25Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install just the 'bare' 'apache-airflow' package build from the HEAD of # the branch, without installing any of the providers. diff --git
[airflow] branch main updated (fe5aba2 -> 83b1e36)
dstandish pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git. from fe5aba2 Fix grammatical error in Variable.get docstring (#20837) add 83b1e36 Speedup liveness probe for scheduler and triggerer (#20833) No new revisions were added by this update. Summary of changes: chart/templates/scheduler/scheduler-deployment.yaml | 1 + chart/templates/triggerer/triggerer-deployment.yaml | 1 + 2 files changed, 2 insertions(+)
[GitHub] [airflow] dwiajik edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
dwiajik edited a comment on issue #19957: URL: https://github.com/apache/airflow/issues/19957#issuecomment-1011683020 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] dwiajik commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
dwiajik commented on issue #19957: URL: https://github.com/apache/airflow/issues/19957#issuecomment-1011683020 I am experiencing the same problem and have set `schedule_after_task_execution` to `False`. The issue still persists. Do you have any suggestions? Thanks
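For reference, the workaround discussed in this thread is the scheduler-level option shown below as an `airflow.cfg` fragment (the equivalent environment variable would be `AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION`); this is a sketch of where the setting lives, not a claim that it resolves the deadlock in every case:

```ini
[scheduler]
# Disables the "mini scheduler" pass that runs inside the task process right
# after each task finishes; suggested earlier in this thread as a workaround
# for the psycopg2.errors.DeadlockDetected crashes.
schedule_after_task_execution = False
```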
[GitHub] [airflow] jedcunningham opened a new pull request #18575: Resync DAG during next parse if error in ``sync_to_db``
jedcunningham opened a new pull request #18575: URL: https://github.com/apache/airflow/pull/18575 If there is an issue syncing DAG specific permissions, force syncing to happen again the next time the DAG is parsed so errors will be added to ``import_errors`` and shown in the UI. The easiest example of this is the use of a non-existent role in ``access_control``. Related #17166
[GitHub] [airflow] NadimYounes edited a comment on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5
NadimYounes edited a comment on pull request #20245: URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011631448 @mik-laj Not sure I am following. If I try installing airflow with the snowflake provider using the command below:

```
pip install apache-airflow[snowflake]==2.2.3 -c https://raw.githubusercontent.com/apache/airflow/constraints-2.2.3/constraints-3.7.txt
```

It still results in the known issues with SqlAlchemy imports. Am I missing something?
[GitHub] [airflow] sergxm2 edited a comment on issue #19222: none_failed_min_one_success trigger rule not working with BranchPythonOperator in certain cases.
sergxm2 edited a comment on issue #19222: URL: https://github.com/apache/airflow/issues/19222#issuecomment-1011610462 Seeing the same issue with BranchPythonOperator / branching and the final task (i.e. task6) being incorrectly skipped instead of being called. This is observed in 2.2.x but not in 2.1.x and not in 2.0.x. To be specific, this is unrelated to returning an "empty" task ID, as we're seeing this happen even when the task ID is returned.
[GitHub] [airflow] boring-cyborg[bot] commented on issue #20839: Cannot edit custom fields on provider connections
boring-cyborg[bot] commented on issue #20839: URL: https://github.com/apache/airflow/issues/20839#issuecomment-1011613591 Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] mike-mcdonald opened a new issue #20839: Cannot edit custom fields on provider connections
mike-mcdonald opened a new issue #20839: URL: https://github.com/apache/airflow/issues/20839 ### Apache Airflow version 2.2.3 (latest released) ### What happened Connections from providers are not saving edited values in any custom connection forms. You can work around the issue by changing connection type to something like HTTP, and modifying the extra field's JSON. ### What you expected to happen _No response_ ### How to reproduce Using the official docker compose deployment, add a new connection of type 'Azure Data Explorer' and fill out one of the custom connection fields (e.g., "Tenant ID"). Save the record. Edit the record and enter a new value for the same field or any other field that is defined in the associated Hook's `get_connection_form_widgets` function. Save the record. Edit again. The changes were not saved. ### Operating System Windows 10, using docker ### Versions of Apache Airflow Providers _No response_ ### Deployment Docker-Compose ### Deployment details _No response_ ### Anything else _No response_ ### Are you willing to submit PR? - [X] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
[GitHub] [airflow] dstandish merged pull request #20833: Speedup liveness probe for scheduler and triggerer
dstandish merged pull request #20833: URL: https://github.com/apache/airflow/pull/20833
[GitHub] [airflow] mik-laj commented on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5
mik-laj commented on pull request #20245: URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011599004 @NadimYounes This release has been yanked, so it shouldn't be installed automatically anymore.
[GitHub] [airflow] danmactough commented on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing
danmactough commented on issue #13542: URL: https://github.com/apache/airflow/issues/13542#issuecomment-1011598836 Airflow 2.0.2+e494306fb01f3a026e7e2832ca94902e96b526fa (MWAA on AWS) This happens to us a LOT: a DAG will be running, task instances will be marked as "queued", but nothing gets moved to "running". When this happened today (the first time today), I was able to track down the following error in the scheduler logs: ![2022-01-12 at 7 16 PM](https://user-images.githubusercontent.com/357481/149243393-0f0b5b91-d1f7-4a51-8a43-3eab644a49e7.png) At some point after the scheduler had that exception, I tried to clear the state of the queued task instances to get them to run. That resulted in the following logs: ![2022-01-12 at 7 18 PM](https://user-images.githubusercontent.com/357481/149243535-3ebfd0b1-31af-43aa-99e2-7ee5aa1dbaff.png) This corresponds to this [section of code](https://github.com/apache/airflow/blob/2.0.2/airflow/executors/base_executor.py#L73-L85): ![2022-01-12 at 10 38 AM](https://user-images.githubusercontent.com/357481/149171972-e9824366-6e85-4c2e-a00c-5ee66d466de8.png) My conclusion is that when the scheduler experienced that error, it entered a pathological state: it was running but had bad state in memory. Specifically, the queued task instances were in the `queued_tasks` or `running` in-memory cache, and thus any attempts to re-queue those tasks would fail as long as that scheduler process was running because the tasks would appear to already have been queued and/or running. Both caches use the [`TaskInstanceKey`](https://github.com/apache/airflow/blob/10023fdd65fa78033e7125d3d8103b63c127056e/airflow/models/taskinstance.py#L224-L230), which is made up of `dag_id` (which we can't change), `task_id` (which we can't change), `execution_date` (nope, can't change), and `try_number` (we can change this!!).
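The key structure described in that comment can be sketched with a stand-in NamedTuple (this mirrors the fields of `airflow.models.taskinstance.TaskInstanceKey` listed above; it is not the real Airflow class):

```python
from datetime import datetime
from typing import NamedTuple


# Stand-in mirroring TaskInstanceKey: a NamedTuple of
# (dag_id, task_id, execution_date, try_number).
class TaskInstanceKey(NamedTuple):
    dag_id: str
    task_id: str
    execution_date: datetime
    try_number: int


stale = TaskInstanceKey("my_dag", "my_task", datetime(2022, 1, 12), 1)

# The executor's queued_tasks/running caches are keyed on the full tuple...
cache = {stale}

# ...so bumping try_number yields a key that no longer collides with the
# stale cached entry, which is why the workaround below can re-queue tasks.
bumped = stale._replace(try_number=stale.try_number + 1)
print(bumped in cache)  # False
```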
So to work around this, I created a utility DAG that will find all task instances in a "queued" or "None" state and increment the `try_number` field. The DAG runs as a single `PythonOperator` with the following callable:

```python
@provide_session
def unstick_dag_callable(dag_run, session, **kwargs):
    dag_id = dag_run.conf.get("dag_id")
    if not dag_id:
        raise AssertionError("dag_id was not provided")
    execution_date = dag_run.conf.get("execution_date")
    if not execution_date:
        raise AssertionError("execution_date was not provided")
    execution_date = parse(execution_date)
    filter = [
        or_(TaskInstance.state == State.QUEUED, TaskInstance.state == State.NONE),
        TaskInstance.dag_id == dag_id,
        TaskInstance.execution_date == execution_date,
    ]
    print(
        (
            f"DAG id: {dag_id}, Execution Date: {execution_date}, State: "
            f"""{dag_run.conf.get("state", f"{State.QUEUED} or {State.NONE}")}, """
            f"Filter: {[str(f) for f in filter]}"
        )
    )
    tis = session.query(TaskInstance).filter(*filter).all()
    dr = (
        session.query(DagRun)
        .filter(DagRun.dag_id == dag_id, DagRun.execution_date == execution_date)
        .first()
    )
    dagrun = (
        dict(
            id=dr.id,
            dag_id=dr.dag_id,
            execution_date=dr.execution_date,
            start_date=dr.start_date,
            end_date=dr.end_date,
            _state=dr._state,
            run_id=dr.run_id,
            creating_job_id=dr.creating_job_id,
            external_trigger=dr.external_trigger,
            run_type=dr.run_type,
            conf=dr.conf,
            last_scheduling_decision=dr.last_scheduling_decision,
            dag_hash=dr.dag_hash,
        )
        if dr
        else {}
    )
    print(f"Updating {len(tis)} task instances")
    print("Here are the task instances we're going to update")
    # Print no more than 100 tis so we don't lock up the session too long
    for ti in tis[:100]:
        pprint(
            dict(
                task_id=ti.task_id,
                job_id=ti.job_id,
                key=ti.key,
                dag_id=ti.dag_id,
                execution_date=ti.execution_date,
                state=ti.state,
                dag_run={**dagrun},
            )
        )
    if len(tis) > 100:
        print("Output truncated after 100 task instances")
    for ti in tis:
        ti.try_number = ti.next_try_number
        ti.state = State.NONE
        session.merge(ti)
    if dag_run.conf.get("activate_dag_runs", True):
        dr.state = State.RUNNING
        dr.start_date = timezone.utcnow()
    print("Done")
```

Moments after I shipped this DAG, another DAG got stuck, and I had a chance to see if
[GitHub] [airflow] NadimYounes edited a comment on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5
NadimYounes edited a comment on pull request #20245: URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011580548 @potiuk @mik-laj Can we also fix the constraints for airflow `2.2.3`? It looks like `snowflake-sqlalchemy` is set to `1.2.5` [here](https://github.com/apache/airflow/blob/62d490d4da17e35d4ddcd4ee38902a8a4e9bbfff/constraints-3.7.txt#L485).
[GitHub] [airflow] SamWheating commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data
SamWheating commented on a change in pull request #20838: URL: https://github.com/apache/airflow/pull/20838#discussion_r783539524 ## File path: airflow/utils/metastore_cleanup.py ## @@ -0,0 +1,289 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +""" +orm_model: the table +recency_column: date column to filter by +keep_last: whether the last record should be kept even if it's older than clean_before_timestamp +keep_last_filters: +keep_last_group_by: if keeping the last record, can keep the last record for each group +""" + + +from pendulum import DateTime +from sqlalchemy.orm import Query + +from airflow import settings +from airflow.configuration import conf +from airflow.models import ( +DagModel, +DagRun, +ImportError, +Log, +RenderedTaskInstanceFields, +SlaMiss, +TaskFail, +TaskInstance, +XCom, +) + +try: +from airflow.jobs import BaseJob +except Exception as e: +from airflow.jobs.base_job import BaseJob + +import logging + +from sqlalchemy import and_, func + +from airflow.models import TaskReschedule +from airflow.utils import timezone + +now = timezone.utcnow + + +objects = { +'job': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': BaseJob, +'recency_column': 
BaseJob.latest_heartbeat, +}, +'dag': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': DagModel, +'recency_column': DagModel.last_parsed_time, +}, +'dag_run': { +'keep_last': True, +"keep_last_filters": [DagRun.external_trigger.is_(False)], +"keep_last_group_by": DagRun.dag_id, +'orm_model': DagRun, +'recency_column': DagRun.execution_date, +}, +'import_error': { Review comment: Is there any reason to expire old Import Errors? They get removed constantly by the File Processor as files are re-processed: https://github.com/apache/airflow/blob/3ccb79423e8966305bb762200b53134dd2b349ec/airflow/dag_processing/processor.py#L539-L543 And they get removed when the related file no longer exists: https://github.com/apache/airflow/blob/3ccb79423e8966305bb762200b53134dd2b349ec/airflow/dag_processing/processor.py#L539-L543 So I wouldn't expect there to be any dangling records here.
[GitHub] [airflow] github-actions[bot] closed pull request #18575: Resync DAG during next parse if error in ``sync_to_db``
github-actions[bot] closed pull request #18575: URL: https://github.com/apache/airflow/pull/18575
[GitHub] [airflow] NadimYounes removed a comment on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5
NadimYounes removed a comment on pull request #20245: URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011561961 https://github.com/apache/airflow/pull/20245#issuecomment-992387108 Can we please fix the constraints for 2.2.3?
[GitHub] [airflow] NadimYounes commented on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5
NadimYounes commented on pull request #20245: URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011561961 https://github.com/apache/airflow/pull/20245#issuecomment-992387108 Can we please fix the constraints for 2.2.3?
[GitHub] [airflow] mik-laj commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data
mik-laj commented on a change in pull request #20838: URL: https://github.com/apache/airflow/pull/20838#discussion_r783528233 ## File path: airflow/utils/metastore_cleanup.py ## @@ -0,0 +1,289 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +""" +orm_model: the table +recency_column: date column to filter by +keep_last: whether the last record should be kept even if it's older than clean_before_timestamp +keep_last_filters: +keep_last_group_by: if keeping the last record, can keep the last record for each group +""" + + +from pendulum import DateTime +from sqlalchemy.orm import Query + +from airflow import settings +from airflow.configuration import conf +from airflow.models import ( +DagModel, +DagRun, +ImportError, +Log, +RenderedTaskInstanceFields, +SlaMiss, +TaskFail, +TaskInstance, +XCom, +) + +try: +from airflow.jobs import BaseJob +except Exception as e: +from airflow.jobs.base_job import BaseJob + +import logging + +from sqlalchemy import and_, func + +from airflow.models import TaskReschedule +from airflow.utils import timezone + +now = timezone.utcnow + + +objects = { +'job': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': BaseJob, +'recency_column': 
BaseJob.latest_heartbeat, +}, +'dag': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': DagModel, +'recency_column': DagModel.last_parsed_time, +}, +'dag_run': { +'keep_last': True, +"keep_last_filters": [DagRun.external_trigger.is_(False)], +"keep_last_group_by": DagRun.dag_id, +'orm_model': DagRun, +'recency_column': DagRun.execution_date, +}, +'import_error': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': ImportError, +'recency_column': ImportError.timestamp, +}, +'log': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': Log, +'recency_column': Log.dttm, +}, +'rendered_task_instance_fields': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': RenderedTaskInstanceFields, +'recency_column': RenderedTaskInstanceFields.execution_date, +}, +'sla_miss': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': SlaMiss, +'recency_column': SlaMiss.execution_date, +}, +'task_fail': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': TaskFail, +'recency_column': TaskFail.execution_date, +}, +'task_instance': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': TaskInstance, +'recency_column': TaskInstance.execution_date, +}, +'task_reschedule': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': TaskReschedule, +'recency_column': TaskReschedule.execution_date, +}, +'xcom': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': XCom, +'recency_column': XCom.execution_date, +}, +} + + +airflow_executor = str(conf.get("core", "executor")) +if airflow_executor == "CeleryExecutor": +from celery.backends.database.models import Task, TaskSet + +print("Including Celery Modules") +try: +objects.update( +**{ 
+'task': { +'keep_last': False, +'keep_last_filters': None, +'keep_last_group_by': None, +'orm_model': Task, +'recency_column': Task.date_done, +}, +'task_set': { +'keep_last': False, +'keep_last_filters':
[GitHub] [airflow] mik-laj commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data
mik-laj commented on a change in pull request #20838: URL: https://github.com/apache/airflow/pull/20838#discussion_r783528077

## File path: airflow/utils/metastore_cleanup.py

@@ -0,0 +1,289 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+orm_model: the table
+recency_column: date column to filter by
+keep_last: whether the last record should be kept even if it's older than clean_before_timestamp
+keep_last_filters:
+keep_last_group_by: if keeping the last record, can keep the last record for each group
+"""
+
+
+from pendulum import DateTime
+from sqlalchemy.orm import Query
+
+from airflow import settings
+from airflow.configuration import conf
+from airflow.models import (
+    DagModel,
+    DagRun,
+    ImportError,
+    Log,
+    RenderedTaskInstanceFields,
+    SlaMiss,
+    TaskFail,
+    TaskInstance,
+    XCom,
+)
+
+try:
+    from airflow.jobs import BaseJob
+except Exception as e:
+    from airflow.jobs.base_job import BaseJob
+
+import logging
+
+from sqlalchemy import and_, func
+
+from airflow.models import TaskReschedule
+from airflow.utils import timezone
+
+now = timezone.utcnow
+
+
+objects = {
+    'job': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': BaseJob,
+        'recency_column': BaseJob.latest_heartbeat,
+    },
+    'dag': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': DagModel,
+        'recency_column': DagModel.last_parsed_time,
+    },
+    'dag_run': {
+        'keep_last': True,
+        "keep_last_filters": [DagRun.external_trigger.is_(False)],
+        "keep_last_group_by": DagRun.dag_id,
+        'orm_model': DagRun,
+        'recency_column': DagRun.execution_date,
+    },
+    'import_error': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': ImportError,
+        'recency_column': ImportError.timestamp,
+    },
+    'log': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': Log,
+        'recency_column': Log.dttm,
+    },
+    'rendered_task_instance_fields': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': RenderedTaskInstanceFields,
+        'recency_column': RenderedTaskInstanceFields.execution_date,
+    },
+    'sla_miss': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': SlaMiss,
+        'recency_column': SlaMiss.execution_date,
+    },
+    'task_fail': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': TaskFail,
+        'recency_column': TaskFail.execution_date,
+    },
+    'task_instance': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': TaskInstance,
+        'recency_column': TaskInstance.execution_date,

Review comment: I'm not sure about it. It is an association proxy.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
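The per-table config above drives a generic purge: delete rows older than a cutoff, except that when `keep_last` is set the newest matching row (optionally per `keep_last_group_by` group) survives. A minimal, pure-Python sketch of that selection logic follows; `CONFIG`, `rows_to_purge`, and the dict-based "rows" are illustrative stand-ins for the PR's SQLAlchemy models, not its actual API.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified stand-in for the per-table config above: each
# entry names a recency field and whether the newest old record (per group)
# should be kept rather than purged.
CONFIG = {
    'dag_run': {
        'recency_field': 'execution_date',
        'keep_last': True,
        'keep_last_group_by': 'dag_id',
    },
}


def rows_to_purge(rows, table, clean_before):
    cfg = CONFIG[table]
    field = cfg['recency_field']
    old = [r for r in rows if r[field] < clean_before]
    if cfg['keep_last'] and cfg['keep_last_group_by']:
        # Find the newest old row within each group and spare it.
        newest = {}
        for r in old:
            key = r[cfg['keep_last_group_by']]
            if key not in newest or r[field] > newest[key][field]:
                newest[key] = r
        old = [r for r in old if newest[r[cfg['keep_last_group_by']]] is not r]
    return old


now = datetime(2022, 1, 13)
rows = [
    {'dag_id': 'a', 'execution_date': now - timedelta(days=d)}
    for d in (1, 40, 50)
]
# With a 30-day cutoff, the 40- and 50-day-old runs are "old", but the
# 40-day-old one is the newest of its group and is kept.
purged = rows_to_purge(rows, 'dag_run', now - timedelta(days=30))
```

In the real implementation the same shape would translate to a `GROUP BY`/`MAX(recency_column)` subquery rather than an in-memory scan.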
[GitHub] [airflow] mik-laj commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data
mik-laj commented on a change in pull request #20838: URL: https://github.com/apache/airflow/pull/20838#discussion_r783523781

## File path: airflow/cli/cli_parser.py

@@ -1054,6 +1080,14 @@ class GroupCommand(NamedTuple):
         args=(ARG_CLEAR_ONLY,),
     ),
 )
+MAINTENANCE_COMMANDS = (
+    ActionCommand(
+        name='cleanup',

Review comment: In the future, we can add other commands that will clean other things, e.g. logs.
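The point of grouping maintenance commands is that new subcommands (e.g. a future log cleaner) slot in next to the table cleanup. A small illustrative sketch with plain `argparse` nested subparsers (Airflow's own `GroupCommand`/`ActionCommand` machinery differs; names here are assumed):

```python
import argparse

# Illustrative command group that can grow new subcommands over time.
parser = argparse.ArgumentParser(prog='airflow')
sub = parser.add_subparsers(dest='group')
maintenance = sub.add_parser('maintenance').add_subparsers(dest='command')
maintenance.add_parser('cleanup-tables')
# later: maintenance.add_parser('cleanup-logs')

args = parser.parse_args(['maintenance', 'cleanup-tables'])
```

Each new cleaner then only needs one more `add_parser` call under the same group.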
[GitHub] [airflow] mik-laj commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data
mik-laj commented on a change in pull request #20838: URL: https://github.com/apache/airflow/pull/20838#discussion_r783523446

## File path: airflow/cli/cli_parser.py

@@ -1054,6 +1080,14 @@ class GroupCommand(NamedTuple):
         args=(ARG_CLEAR_ONLY,),
     ),
 )
+MAINTENANCE_COMMANDS = (
+    ActionCommand(
+        name='cleanup',

Review comment:
```suggestion
        name='cleanup-tables',
```
[GitHub] [airflow] mik-laj commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data
mik-laj commented on a change in pull request #20838: URL: https://github.com/apache/airflow/pull/20838#discussion_r783520216

## File path: airflow/cli/cli_parser.py

@@ -378,6 +383,27 @@ def _check(value):
 ARG_CONF = Arg(('-c', '--conf'), help="JSON string that gets pickled into the DagRun's conf attribute")
 ARG_EXEC_DATE = Arg(("-e", "--exec-date"), help="The execution date of the DAG", type=parsedate)
+# maintenance
+ARG_MAINTENANCE_TABLES = Arg(
+    ("-t", "--tables"),
+    help="Table names to perform maintenance on (use comma-separated list)",

Review comment: Can you add a list of accepted values?
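One way to act on this review comment is to enumerate the accepted table names in the help string and validate each name after splitting the comma-separated value. A sketch with plain `argparse` (the flag shape mirrors the diff; the table names come from the config dict in the PR, and the validation step is an assumption, not the PR's code):

```python
import argparse

# Accepted table names, taken from the cleanup config keys in the PR.
ACCEPTED_TABLES = ['job', 'dag', 'dag_run', 'import_error', 'log',
                   'rendered_task_instance_fields', 'sla_miss',
                   'task_fail', 'task_instance']

parser = argparse.ArgumentParser()
parser.add_argument(
    '-t', '--tables',
    help="Comma-separated table names to clean. Accepted: " + ', '.join(ACCEPTED_TABLES),
)
args = parser.parse_args(['--tables', 'log,dag_run'])

# Split and validate the comma-separated list.
tables = args.tables.split(',')
invalid = [t for t in tables if t not in ACCEPTED_TABLES]
```

`choices=` cannot be used directly here because the single argument value holds several names, hence the post-split check.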
[GitHub] [airflow] mik-laj commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data
mik-laj commented on a change in pull request #20838: URL: https://github.com/apache/airflow/pull/20838#discussion_r783519744

## File path: airflow/utils/metastore_cleanup.py

@@ -0,0 +1,289 @@
+from pendulum import DateTime
+from sqlalchemy.orm import Query
+
+from airflow import settings
+from airflow.configuration import conf
+from airflow.models import (
+    DagModel,
+    DagRun,
+    ImportError,
+    Log,
+    RenderedTaskInstanceFields,
+    SlaMiss,
+    TaskFail,
+    TaskInstance,
+    XCom,
+)
+
+try:
+    from airflow.jobs import BaseJob
+except Exception as e:
+    from airflow.jobs.base_job import BaseJob

Review comment: It is not needed, because we don't maintain compatibility with Airflow 1.10.
[GitHub] [airflow] dstandish opened a new pull request #20838: Add `maintenance cleanup` CLI command for purging old data
dstandish opened a new pull request #20838: URL: https://github.com/apache/airflow/pull/20838 Must supply a "purge before date". A table list can optionally be provided. A dry run will only print the number of rows meeting the criteria. If it is not a dry run, the user will be required to confirm before deleting. *Note:* this is a draft, and tests still need to be added.
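The dry-run/confirm flow the PR describes is a small state machine worth pinning down: count first, report and stop on dry run, otherwise delete only after an explicit yes. A minimal sketch (function and parameter names are illustrative, not the PR's actual API):

```python
# Illustrative dry-run / confirm-before-delete flow. Returns the number
# of rows actually deleted (0 on dry run or when the user declines).
def cleanup(count_rows, delete_rows, dry_run,
            confirm=lambda: input('Delete? [y/N] ').strip().lower() == 'y'):
    n = count_rows()
    if dry_run:
        print(f"{n} rows meet the criteria (dry run, nothing deleted)")
        return 0
    if not confirm():
        print("Aborted, nothing deleted")
        return 0
    return delete_rows()


# Dry run: only reports the count, returns 0.
cleanup(lambda: 42, lambda: 42, dry_run=True)
```

Injecting `confirm` as a callable keeps the prompt out of the core logic, which also makes the flow testable without a TTY.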
[GitHub] [airflow] SamWheating edited a comment on issue #20832: Unable to specify Python version for AwsGlueJobOperator
SamWheating edited a comment on issue #20832: URL: https://github.com/apache/airflow/issues/20832#issuecomment-1011511506 Also, for what it's worth, I think that your `Command` block is invalid, as the `Command.Name` you're using (`abalone-preprocess`) must be one of `glueetl`, `pythonshell` or `gluestreaming`. https://docs.aws.amazon.com/glue/latest/webapi/API_JobCommand.html
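Since the linked API reference restricts `Command.Name` to those three values, a small client-side guard would catch such typos before the request is sent. This validator is illustrative (not AWS or Airflow code), using the allowed names from the docs above:

```python
# Allowed job command names per the AWS Glue JobCommand API reference.
VALID_COMMAND_NAMES = {'glueetl', 'pythonshell', 'gluestreaming'}


def validate_command(command: dict) -> None:
    """Raise ValueError if the Command block uses an unsupported Name."""
    name = command.get('Name')
    if name not in VALID_COMMAND_NAMES:
        raise ValueError(
            f"Command.Name {name!r} is invalid; "
            f"must be one of {sorted(VALID_COMMAND_NAMES)}"
        )


# A valid block passes silently; 'abalone-preprocess' would raise.
validate_command({'Name': 'glueetl', 'ScriptLocation': 's3://bucket/script.py'})
```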
[GitHub] [airflow] SamWheating edited a comment on issue #20832: Unable to specify Python version for AwsGlueJobOperator
SamWheating edited a comment on issue #20832: URL: https://github.com/apache/airflow/issues/20832#issuecomment-1011502821 I think that this is because the GlueHook is pretty opinionated and hardcodes the value of `Command` when running the `glue_client.create_job` command: https://github.com/apache/airflow/blob/2ab2ae8849bf6d80a700b1b74cef37eb187161ad/airflow/providers/amazon/aws/hooks/glue.py#L181-L225 So when you provide `Command` in `create_job_kwargs`, it ends up being supplied twice to that function (although I suspect that this would be a `TypeError`, not a `KeyError` 🤔). Anyway, thoughts on just making the `Command.Name` and `Command.PythonVersion` arguments configurable in the `GlueJobOperator`? If y'all think that this is a satisfactory fix, feel free to assign this issue to me and I can put up a quick PR.
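The proposed fix amounts to building the `Command` dict from operator-level parameters instead of hardcoding it in the hook, so the same code path can never receive `Command` twice. A hedged sketch of that shape (the function and parameter names are illustrative, not the provider's actual API):

```python
# Illustrative helper: assemble the Glue Command block from configurable
# parameters rather than hardcoding Name/PythonVersion in the hook.
def build_glue_command(script_location: str,
                       job_type: str = 'glueetl',
                       python_version: str = '3') -> dict:
    command = {'Name': job_type, 'ScriptLocation': script_location}
    # Per the JobCommand docs, PythonVersion applies to Python-based jobs.
    if job_type in ('glueetl', 'pythonshell'):
        command['PythonVersion'] = python_version
    return command


# The operator would pass this single dict to glue_client.create_job(...),
# so user-supplied kwargs never collide with a hardcoded Command.
cmd = build_glue_command('s3://bucket/etl.py',
                         job_type='pythonshell',
                         python_version='3.9')
```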