[GitHub] [airflow] potiuk commented on pull request #20847: Added other Opentelemery Instrumentations

2022-01-12 Thread GitBox


potiuk commented on pull request #20847:
URL: https://github.com/apache/airflow/pull/20847#issuecomment-1011887017


   Nice. @Melodie97 - while you are at it, and while preparing for the demo 
next week, maybe you could also take a look at the beginning of this: 
   
   https://github.com/apache/airflow/issues/20778
   
   Which might be a good thing to do now that we have it working, as it will 
simply be making the changes that you've made more "pluggable" and "optional".
   
   I will make some comments in your PR and give you some guidelines for that :)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] Melodie97 commented on issue #20774: Expand POC with all possible standard metrics available in Open Telemetry

2022-01-12 Thread GitBox


Melodie97 commented on issue #20774:
URL: https://github.com/apache/airflow/issues/20774#issuecomment-1011885124


   OK, I'm done with them all.






[GitHub] [airflow] Melodie97 opened a new pull request #20847: Added other Opentelemery Instrumentations

2022-01-12 Thread GitBox


Melodie97 opened a new pull request #20847:
URL: https://github.com/apache/airflow/pull/20847


   
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   






[GitHub] [airflow] potiuk commented on a change in pull request #20843: Fix flaky templatized call

2022-01-12 Thread GitBox


potiuk commented on a change in pull request #20843:
URL: https://github.com/apache/airflow/pull/20843#discussion_r783700863



##
File path: tests/sensors/test_python.py
##
@@ -85,7 +85,7 @@ def test_python_callable_arguments_are_templatized(self):
 
 ds_templated = DEFAULT_DATE.date().isoformat()
 # 2 calls: first: at start, second: before timeout
-assert 2 == len(recorded_calls)
+assert 1 <= len(recorded_calls)

Review comment:
   The thing here is that this is a "timing" test: we run it with the "run" 
method and expect a timeout. 
   
   In a perfect world, we could likely use freezegun and simulate the exact 
time passing, but IMHO it is totally not worth it if we can 100% fix it with 
this three-character change. Trading perfection for test simplicity. 








[GitHub] [airflow] potiuk commented on a change in pull request #20843: Fix flaky templatized call

2022-01-12 Thread GitBox


potiuk commented on a change in pull request #20843:
URL: https://github.com/apache/airflow/pull/20843#discussion_r783698559



##
File path: tests/sensors/test_python.py
##
@@ -85,7 +85,7 @@ def test_python_callable_arguments_are_templatized(self):
 
 ds_templated = DEFAULT_DATE.date().isoformat()
 # 2 calls: first: at start, second: before timeout
-assert 2 == len(recorded_calls)
+assert 1 <= len(recorded_calls)

Review comment:
   It's a sensor, and the number of calls depends on how fast the whole system 
is (sometimes we will not get the second one if the system is very busy with 
other parallel tests). 
   
   We know that the first call happens for sure (it is immediate), and most of 
the time the second one does too. The assert is actually correct IMHO ("expect 
at least 1 call"). We only check the first call anyway (and this is the gist of 
it, as we really want to see if the Jinja template works in the sensor call). 
   
   If you have an idea how to fix it better, I am all ears :)
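The timing behaviour described above can be reproduced in miniature. This is a standalone sketch with made-up intervals, not the actual Airflow sensor code: `poke`, `run_sensor`, and the templated value are illustrative.

```python
# A loaded machine may fit one or two pokes into the timeout window, so
# "at least one call" is the only safe assertion; the first call is the
# one whose (rendered) argument the real test inspects.
import time

recorded_calls = []

def poke(templated_arg):
    recorded_calls.append(templated_arg)
    return False  # never succeeds, so the loop runs until timeout

def run_sensor(poke_interval, timeout, templated_arg):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        poke(templated_arg)        # the first call happens immediately
        time.sleep(poke_interval)  # later calls depend on system load

run_sensor(poke_interval=0.01, timeout=0.05, templated_arg="2022-01-12")

assert 1 <= len(recorded_calls)
assert recorded_calls[0] == "2022-01-12"
```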









[GitHub] [airflow] potiuk commented on issue #17279: Logout airflow on Web UI does not work using OAuth2

2022-01-12 Thread GitBox


potiuk commented on issue #17279:
URL: https://github.com/apache/airflow/issues/17279#issuecomment-1011872225


   cc: @kaxil 






[GitHub] [airflow] potiuk edited a comment on issue #17279: Logout airflow on Web UI does not work using OAuth2

2022-01-12 Thread GitBox


potiuk edited a comment on issue #17279:
URL: https://github.com/apache/airflow/issues/17279#issuecomment-1011871954


   Flask AppBuilder constraints are already updated - both the main and 2.2 
branches. So we can fix it in main and cherry-pick to 2.2.4.







[GitHub] [airflow] blag edited a comment on issue #17279: Logout airflow on Web UI does not work using OAuth2

2022-01-12 Thread GitBox


blag edited a comment on issue #17279:
URL: https://github.com/apache/airflow/issues/17279#issuecomment-991370837


   Now that dpgaspar/Flask-AppBuilder#1749 is merged, this is one step closer 
to being fixed.
   
   TODO as of now:
   
   * [x] Wait for a new release of FAB that includes that fix - [3.4.1 on Dec 
13th, 2021](https://github.com/dpgaspar/Flask-AppBuilder/pull/1759)
   * [x] Update Airflow constraints files
   * [ ] Check if this bug is fixed in Airflow






[airflow] branch main updated (8dc68d4 -> c49d6ec)

2022-01-12 Thread potiuk
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git.


from 8dc68d4  Doc: Added an enum param example (#20841)
 add c49d6ec  static code check doc fix (#20844)

No new revisions were added by this update.

Summary of changes:
 STATIC_CODE_CHECKS.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[GitHub] [airflow] potiuk closed issue #20823: Static check docs mistakes in example

2022-01-12 Thread GitBox


potiuk closed issue #20823:
URL: https://github.com/apache/airflow/issues/20823


   






[GitHub] [airflow] potiuk merged pull request #20844: static code check doc fix

2022-01-12 Thread GitBox


potiuk merged pull request #20844:
URL: https://github.com/apache/airflow/pull/20844


   






[GitHub] [airflow] Bowrna opened a new pull request #20844: static code check doc fix

2022-01-12 Thread GitBox


Bowrna opened a new pull request #20844:
URL: https://github.com/apache/airflow/pull/20844


   Fixes a static code check docs issue in the current Breeze environment.
   closes: https://github.com/apache/airflow/issues/20823
   
   






[GitHub] [airflow-client-python] feluelle commented on issue #20: get_tasks api is broken

2022-01-12 Thread GitBox


feluelle commented on issue #20:
URL: 
https://github.com/apache/airflow-client-python/issues/20#issuecomment-1011859928


   @msumit do you have any idea why this happens? I am still running into this 
problem on the latest version (2.2.0).






[GitHub] [airflow] potiuk edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

2022-01-12 Thread GitBox


potiuk edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1011857056


   I actually spent some time a few days ago looking at the mini-scheduler code, 
but I could not really find a flaw there. The fact that it did not help you 
indicates that my hypothesis was unfounded, unfortunately, and maybe the reason 
was different (and the fact that it worked for @stablum was mainly a 
coincidence or some side effect of that change).
   
   @dwiajik - it might also be that your case is a bit different - could you 
please report (maybe create a gist with a few examples of) some of the logs of 
your deadlocks? Ideally, if you could send us the logs of the failing scheduler 
and the corresponding logs of the Postgres server from the same time - I believe 
it will be much easier to investigate if we see a few examples, and the server 
logs should tell us exactly which two queries deadlocked, which should help us a 
lot.
   
   What we really need is something in /var/lib/pgsql/data/pg_log/*.log; there 
should be entries at the time when the deadlock happens that look like this:
   
   ```
   ERROR:  deadlock detected
   DETAIL:  Process 21535 waits for AccessExclusiveLock on relation 342640 of 
database 41454; blocked by process 21506.
   Process 21506 waits for AccessExclusiveLock on relation 342637 of database 
41454; blocked by process 21535.
   HINT:  See server log for query details.
   CONTEXT:  SQL statement "UPDATE ..."
   ```
   
   Ideally, we need those and some logs around them, if possible.
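For completeness, the server-side settings that control this logging are sketched below. The parameter names are real PostgreSQL settings; the values and the config file path are illustrative and vary per installation (deadlock reports themselves are always logged, `log_lock_waits` adds entries for long lock waits too).

```
# postgresql.conf (path varies by distribution)
deadlock_timeout = 1s   # wait this long on a lock before running deadlock detection
log_lock_waits = on     # also log any lock wait that exceeds deadlock_timeout
```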
   







[GitHub] [airflow] github-actions[bot] commented on pull request #20349: Fix Scheduler crash when executing task instances of missing DAG

2022-01-12 Thread GitBox


github-actions[bot] commented on pull request #20349:
URL: https://github.com/apache/airflow/pull/20349#issuecomment-1011850115


   The PR most likely needs to run full matrix of tests because it modifies 
parts of the core of Airflow. However, committers might decide to merge it 
quickly and take the risk. If they don't merge it quickly - please rebase it to 
the latest main at your convenience, or amend the last commit of the PR, and 
push it with --force-with-lease.






[GitHub] [airflow] uranusjr commented on a change in pull request #20795: Fix remaining mypy issues in "core" Airflow

2022-01-12 Thread GitBox


uranusjr commented on a change in pull request #20795:
URL: https://github.com/apache/airflow/pull/20795#discussion_r783675997



##
File path: airflow/models/taskmixin.py
##
@@ -114,6 +115,14 @@ class DAGNode(DependencyMixin, metaclass=ABCMeta):
 def node_id(self) -> str:
 raise NotImplementedError()
 
+@property
+def label(self) -> Optional[str]:
+tg: Optional["TaskGroup"] = getattr(self, 'task_group', None)
+if tg and tg.node_id and tg.prefix_group_id:
+# "task_group_id.task_id" -> "task_id"
+return self.node_id[len(tg.node_id) + 1 :]

Review comment:
   ```suggestion
   return self.node_id.split(".", 1)[-1]
   ```
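As a side note on the suggestion above: the two spellings agree for a single-level group but differ for nested groups. A quick standalone check (the ids here are illustrative, not real DAG objects):

```python
# Compare stripping a known group prefix by slicing vs. splitting on ".".
node_id = "task_group_id.task_id"
group_id = "task_group_id"

sliced = node_id[len(group_id) + 1 :]  # remove the whole known prefix
split_ = node_id.split(".", 1)[-1]     # drop everything up to the first "."
assert sliced == split_ == "task_id"

# For nested groups the two are NOT equivalent: the slice removes the full
# dotted prefix, while the split removes only the first segment.
nested_id = "outer.inner.task_id"
nested_group = "outer.inner"
assert nested_id[len(nested_group) + 1 :] == "task_id"
assert nested_id.split(".", 1)[-1] == "inner.task_id"
```

Which behaviour is wanted depends on whether `label` should be the bare task id or the id relative to the outermost group.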








[GitHub] [airflow] potiuk commented on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5

2022-01-12 Thread GitBox


potiuk commented on pull request #20245:
URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011849274


   
https://github.com/apache/airflow/blob/constraints-2.2.3/constraints-3.7.txt#L485






[GitHub] [airflow] potiuk commented on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5

2022-01-12 Thread GitBox


potiuk commented on pull request #20245:
URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011848901


   @NadimYounes - indeed, the 2.2.3 constraints were prepared before the 
snowflake release was yanked. It should be fixed now.






[airflow] tag constraints-2.2.3 updated (62d490d -> 8a96274)

2022-01-12 Thread potiuk
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a change to tag constraints-2.2.3
in repository https://gitbox.apache.org/repos/asf/airflow.git.


*** WARNING: tag constraints-2.2.3 was modified! ***

from 62d490d  (commit)
  to 8a96274  (commit)
from 62d490d  Revert constraints for yanked 
`apache-airflow-providers-amazon==2.5.0`
 add 8a96274  Fix constraints to yanked snowflake release

No new revisions were added by this update.

Summary of changes:
 constraints-3.6.txt | 2 +-
 constraints-3.7.txt | 2 +-
 constraints-3.8.txt | 2 +-
 constraints-3.9.txt | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)


[GitHub] [airflow] uranusjr commented on a change in pull request #20843: Fix flaky templatized call

2022-01-12 Thread GitBox


uranusjr commented on a change in pull request #20843:
URL: https://github.com/apache/airflow/pull/20843#discussion_r783674157



##
File path: tests/sensors/test_python.py
##
@@ -85,7 +85,7 @@ def test_python_callable_arguments_are_templatized(self):
 
 ds_templated = DEFAULT_DATE.date().isoformat()
 # 2 calls: first: at start, second: before timeout
-assert 2 == len(recorded_calls)
+assert 1 <= len(recorded_calls)

Review comment:
   Why do we have the second call at all? Can we not have it?








[airflow] 01/01: Fix constraints to yanked snowflake release

2022-01-12 Thread potiuk
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch constraints-2-2-3-fixed
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 8a962740270a9ecf6b4f910173321b183e7adf4a
Author: Jarek Potiuk 
AuthorDate: Thu Jan 13 07:53:36 2022 +0100

Fix constraints to yanked snowflake release
---
 constraints-3.6.txt | 2 +-
 constraints-3.7.txt | 2 +-
 constraints-3.8.txt | 2 +-
 constraints-3.9.txt | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/constraints-3.6.txt b/constraints-3.6.txt
index d5c11d4..7a3bc62 100644
--- a/constraints-3.6.txt
+++ b/constraints-3.6.txt
@@ -485,7 +485,7 @@ snakebite-py3==3.0.5
 sniffio==1.2.0
 snowballstemmer==2.2.0
 snowflake-connector-python==2.7.1
-snowflake-sqlalchemy==1.2.5
+snowflake-sqlalchemy==1.2.4
 sortedcontainers==2.4.0
 soupsieve==2.3.1
 sphinx-airflow-theme==0.0.6
diff --git a/constraints-3.7.txt b/constraints-3.7.txt
index 44dfd47..f83f1af 100644
--- a/constraints-3.7.txt
+++ b/constraints-3.7.txt
@@ -482,7 +482,7 @@ snakebite-py3==3.0.5
 sniffio==1.2.0
 snowballstemmer==2.2.0
 snowflake-connector-python==2.7.1
-snowflake-sqlalchemy==1.2.5
+snowflake-sqlalchemy==1.2.4
 sortedcontainers==2.4.0
 soupsieve==2.3.1
 sphinx-airflow-theme==0.0.6
diff --git a/constraints-3.8.txt b/constraints-3.8.txt
index 67e8d5f..eab89b4 100644
--- a/constraints-3.8.txt
+++ b/constraints-3.8.txt
@@ -480,7 +480,7 @@ snakebite-py3==3.0.5
 sniffio==1.2.0
 snowballstemmer==2.2.0
 snowflake-connector-python==2.7.1
-snowflake-sqlalchemy==1.2.5
+snowflake-sqlalchemy==1.2.4
 sortedcontainers==2.4.0
 soupsieve==2.3.1
 sphinx-airflow-theme==0.0.6
diff --git a/constraints-3.9.txt b/constraints-3.9.txt
index aba077a..ea90cad 100644
--- a/constraints-3.9.txt
+++ b/constraints-3.9.txt
@@ -475,7 +475,7 @@ snakebite-py3==3.0.5
 sniffio==1.2.0
 snowballstemmer==2.2.0
 snowflake-connector-python==2.7.1
-snowflake-sqlalchemy==1.2.5
+snowflake-sqlalchemy==1.2.4
 sortedcontainers==2.4.0
 soupsieve==2.3.1
 sphinx-airflow-theme==0.0.6


[airflow] branch constraints-2-2-3-fixed created (now 8a96274)

2022-01-12 Thread potiuk
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a change to branch constraints-2-2-3-fixed
in repository https://gitbox.apache.org/repos/asf/airflow.git.


  at 8a96274  Fix constraints to yanked snowflake release

This branch includes the following new commits:

 new 8a96274  Fix constraints to yanked snowflake release

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[GitHub] [airflow] uranusjr commented on a change in pull request #20843: Fix flaky templatized call

2022-01-12 Thread GitBox


uranusjr commented on a change in pull request #20843:
URL: https://github.com/apache/airflow/pull/20843#discussion_r783673639



##
File path: tests/sensors/test_python.py
##
@@ -85,7 +85,7 @@ def test_python_callable_arguments_are_templatized(self):
 
 ds_templated = DEFAULT_DATE.date().isoformat()
 # 2 calls: first: at start, second: before timeout
-assert 2 == len(recorded_calls)
+assert 1 <= len(recorded_calls)

Review comment:
   This seems to defeat the purpose of this assertion. Why is the second 
call not recorded?








[GitHub] [airflow] uranusjr commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


uranusjr commented on a change in pull request #20838:
URL: https://github.com/apache/airflow/pull/20838#discussion_r783672138



##
File path: airflow/utils/metastore_cleanup.py
##
@@ -0,0 +1,289 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+orm_model: the table
+recency_column: date column to filter by
+keep_last: whether the last record should be kept even if it's older than 
clean_before_timestamp
+keep_last_filters:
+keep_last_group_by: if keeping the last record, can keep the last record for 
each group
+"""
+
+
+from pendulum import DateTime
+from sqlalchemy.orm import Query
+
+from airflow import settings
+from airflow.configuration import conf
+from airflow.models import (
+DagModel,
+DagRun,
+ImportError,
+Log,
+RenderedTaskInstanceFields,
+SlaMiss,
+TaskFail,
+TaskInstance,
+XCom,
+)
+
+try:
+from airflow.jobs import BaseJob
+except Exception as e:
+from airflow.jobs.base_job import BaseJob
+
+import logging
+
+from sqlalchemy import and_, func
+
+from airflow.models import TaskReschedule
+from airflow.utils import timezone
+
+now = timezone.utcnow
+
+
+objects = {

Review comment:
   This structured data should use a more structured type than a dict of 
dicts. A dict of namedtuples would be a big improvement.
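A minimal sketch of the suggested shape. The field names mirror the keys used in the PR's dict; the string values stand in for the real ORM models and columns, which would be imported from `airflow.models` in the actual code.

```python
# Replace the dict-of-dicts with a dict of typed namedtuples: fields get
# defaults, attribute access replaces string-keyed lookups, and typos in
# field names fail loudly at definition time.
from typing import Any, List, NamedTuple, Optional

class CleanupConfig(NamedTuple):
    orm_model: Any                            # the table's ORM model
    recency_column: Any                       # date column to filter by
    keep_last: bool = False                   # keep newest row past the cutoff?
    keep_last_filters: Optional[List[Any]] = None
    keep_last_group_by: Optional[Any] = None  # group for per-group "last" rows

objects = {
    "job": CleanupConfig(orm_model="BaseJob", recency_column="latest_heartbeat"),
    "dag_run": CleanupConfig(
        orm_model="DagRun",
        recency_column="execution_date",
        keep_last=True,
        keep_last_group_by="dag_id",
    ),
}

assert objects["job"].keep_last is False
assert objects["dag_run"].keep_last_group_by == "dag_id"
```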








[GitHub] [airflow] uranusjr commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


uranusjr commented on a change in pull request #20838:
URL: https://github.com/apache/airflow/pull/20838#discussion_r783671395



##
File path: airflow/utils/metastore_cleanup.py
##
@@ -0,0 +1,289 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+orm_model: the table
+recency_column: date column to filter by
+keep_last: whether the last record should be kept even if it's older than clean_before_timestamp
+keep_last_filters:
+keep_last_group_by: if keeping the last record, can keep the last record for each group
+"""
+
+
+from pendulum import DateTime
+from sqlalchemy.orm import Query
+
+from airflow import settings
+from airflow.configuration import conf
+from airflow.models import (
+    DagModel,
+    DagRun,
+    ImportError,
+    Log,
+    RenderedTaskInstanceFields,
+    SlaMiss,
+    TaskFail,
+    TaskInstance,
+    XCom,
+)
+
+try:
+    from airflow.jobs import BaseJob
+except Exception as e:
+    from airflow.jobs.base_job import BaseJob
+
+import logging
+
+from sqlalchemy import and_, func
+
+from airflow.models import TaskReschedule
+from airflow.utils import timezone
+
+now = timezone.utcnow
+
+
+objects = {
+    'job': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': BaseJob,
+        'recency_column': BaseJob.latest_heartbeat,
+    },
+    'dag': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': DagModel,
+        'recency_column': DagModel.last_parsed_time,
+    },
+    'dag_run': {
+        'keep_last': True,
+        "keep_last_filters": [DagRun.external_trigger.is_(False)],
+        "keep_last_group_by": DagRun.dag_id,
+        'orm_model': DagRun,
+        'recency_column': DagRun.execution_date,
+    },
+    'import_error': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': ImportError,
+        'recency_column': ImportError.timestamp,
+    },
+    'log': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': Log,
+        'recency_column': Log.dttm,
+    },
+    'rendered_task_instance_fields': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': RenderedTaskInstanceFields,
+        'recency_column': RenderedTaskInstanceFields.execution_date,
+    },
+    'sla_miss': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': SlaMiss,
+        'recency_column': SlaMiss.execution_date,
+    },
+    'task_fail': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': TaskFail,
+        'recency_column': TaskFail.execution_date,
+    },
+    'task_instance': {
+        'keep_last': False,
+        'keep_last_filters': None,
+        'keep_last_group_by': None,
+        'orm_model': TaskInstance,
+        'recency_column': TaskInstance.execution_date,

Review comment:
   This should work; there is existing code doing it. The association proxy does not always work like real columns, but it does work for this specific usage.








[GitHub] [airflow] ephraimbuddy commented on a change in pull request #20349: Fix Scheduler crash when executing task instances of missing DAG

2022-01-12 Thread GitBox


ephraimbuddy commented on a change in pull request #20349:
URL: https://github.com/apache/airflow/pull/20349#discussion_r783670984



##
File path: airflow/jobs/scheduler_job.py
##
@@ -403,6 +403,15 @@ def _executable_task_instances_to_queued(self, max_tis: int, session: Session =
                 # Many dags don't have a task_concurrency, so where we can avoid loading the full
                 # serialized DAG the better.
                 serialized_dag = self.dagbag.get_dag(dag_id, session=session)
+                # If the dag is missing, fail the task and continue to the next task.
+                if not serialized_dag:
+                    self.log.error(
+                        "DAG '%s' for task instance %s not found in serialized_dag table",
+                        dag_id,
+                        task_instance,
+                    )
+                    task_instance.set_state(State.FAILED, session=session)

Review comment:
   I will opt for setting all scheduled tasks to None. I doubt that failing the DAG here will really fail it when there are tasks being executed in the executor. There's a bug where, when you mark a DAG as failed, it comes up again as running: https://github.com/apache/airflow/issues/16078
   
   So I propose setting all scheduled tasks to None. The scheduler will no longer move the task instances to scheduled when the DAG can no longer be found, and it makes sense to set them to None instead of failing them, since the task instances won't have logs.
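   
   A minimal sketch of that proposal, with stand-in classes rather than Airflow's real `TaskInstance`/session machinery (the helper name is hypothetical):
   
   ```python
   from dataclasses import dataclass
   from typing import List, Optional
   
   @dataclass
   class FakeTaskInstance:
       """Stand-in for Airflow's TaskInstance; only the fields this sketch needs."""
       task_id: str
       dag_id: str
       state: Optional[str]
   
   def reset_tasks_for_missing_dag(task_instances: List[FakeTaskInstance], missing_dag_id: str) -> None:
       """Set scheduled tasks of a vanished DAG back to no-state instead of FAILED."""
       for ti in task_instances:
           if ti.dag_id == missing_dag_id and ti.state == "scheduled":
               ti.state = None  # no logs exist yet, so None is more honest than FAILED
   
   tis = [
       FakeTaskInstance("t1", "gone_dag", "scheduled"),
       FakeTaskInstance("t2", "live_dag", "scheduled"),
   ]
   reset_tasks_for_missing_dag(tis, "gone_dag")
   assert tis[0].state is None and tis[1].state == "scheduled"
   ```
   
   The scheduler would then simply skip these no-state instances until the DAG reappears or they are cleaned up.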








[airflow] branch main updated (83b1e36 -> 8dc68d4)

2022-01-12 Thread kaxilnaik
This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git.


from 83b1e36  Speedup liveness probe for scheduler and triggerer (#20833)
 add 8dc68d4  Doc: Added an enum param example (#20841)

No new revisions were added by this update.

Summary of changes:
 docs/apache-airflow/concepts/params.rst | 3 +++
 1 file changed, 3 insertions(+)


[GitHub] [airflow] kaxil merged pull request #20841: added an enum param example

2022-01-12 Thread GitBox


kaxil merged pull request #20841:
URL: https://github.com/apache/airflow/pull/20841


   






[GitHub] [airflow] potiuk commented on pull request #20795: Fix remaining mypy issues in "core" Airflow

2022-01-12 Thread GitBox


potiuk commented on pull request #20795:
URL: https://github.com/apache/airflow/pull/20795#issuecomment-1011843784


   Flaky test fixed in https://github.com/apache/airflow/pull/20843






[GitHub] [airflow] potiuk commented on pull request #20843: Fix flaky templatized call

2022-01-12 Thread GitBox


potiuk commented on pull request #20843:
URL: https://github.com/apache/airflow/pull/20843#issuecomment-1011843612


   Fixes flaky test (example in #20795) 
https://github.com/apache/airflow/runs/4793994282?check_suite_focus=true






[GitHub] [airflow] uranusjr commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


uranusjr commented on a change in pull request #20838:
URL: https://github.com/apache/airflow/pull/20838#discussion_r783669005



##
File path: airflow/cli/commands/maintenance_command.py
##
@@ -0,0 +1,31 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Maintenance sub-commands"""
+from airflow.utils.metastore_cleanup import run_cleanup
+
+
+def cleanup(args):
+    """Purges old records in metastore database"""
+    print(args.dry_run)
+    kwargs = dict(
+        dry_run=args.dry_run, clean_before_timestamp=args.clean_before_timestamp, verbose=args.verbose
+    )
+    if args.tables:
+        kwargs.update(
+            table_names=args.tables,
+        )
+    run_cleanup(**kwargs)

Review comment:
   Since you do `if table_name` in `run_cleanup`, this is equivalent to
   
   ```python
   run_cleanup(
   dry_run=args.dry_run,
   clean_before_timestamp=args.clean_before_timestamp,
   verbose=args.verbose,
   table_names=args.tables,
   )
   ```
   
   With an appropriate docstring in `run_cleanup` (to clarify that passing an empty list is the same as not passing the argument), this would be more readable and maintainable than the current implementation IMO.
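   
   To illustrate the equivalence being suggested, here is a hedged sketch in which `run_cleanup` is a simplified stand-in (it just returns the effective table list instead of purging anything):
   
   ```python
   def run_cleanup(*, dry_run, clean_before_timestamp, verbose, table_names=None):
       """Sketch of the suggested signature.
   
       Passing ``table_names=None`` (or an empty list) means "all tables",
       so the caller can forward ``args.tables`` unconditionally.
       """
       effective = list(table_names) if table_names else ["job", "dag_run", "log"]
       return effective  # the real function would purge these tables
   
   # Forwarding an empty selection behaves the same as omitting the argument.
   assert run_cleanup(dry_run=True, clean_before_timestamp=None, verbose=False) == \
          run_cleanup(dry_run=True, clean_before_timestamp=None, verbose=False, table_names=[])
   ```
   
   With this, the CLI handler collapses to a single `run_cleanup(...)` call with no conditional `kwargs` building.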








[GitHub] [airflow] potiuk opened a new pull request #20843: Fix flaky templatized call

2022-01-12 Thread GitBox


potiuk opened a new pull request #20843:
URL: https://github.com/apache/airflow/pull/20843


   
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   






[GitHub] [airflow] kaxil commented on a change in pull request #19769: Handle stuck queued tasks in Celery

2022-01-12 Thread GitBox


kaxil commented on a change in pull request #19769:
URL: https://github.com/apache/airflow/pull/19769#discussion_r783667980



##
File path: airflow/executors/celery_executor.py
##
@@ -377,6 +385,49 @@ def _check_for_stalled_adopted_tasks(self):
 for key in timedout_keys:
 self.change_state(key, State.FAILED)
 
+    @provide_session
+    def _clear_stuck_queued_tasks(self, session: Session = NEW_SESSION) -> None:
+        """
+        Tasks can get stuck in queued state in DB while still not in
+        worker. This happens when the worker is autoscaled down and
+        the task is queued but has not been picked up by any worker prior to the scaling.
+
+        In such situation, we update the task instance state to scheduled so that
+        it can be queued again. We chose to use task_adoption_timeout to decide
+        """
+        if not isinstance(app.backend, DatabaseBackend):
+            # We only want to do this for database backends where

Review comment:
   Oh yes, at least in that case the comment should be clear with a proper TODO, or it should just be solved in this PR too for all the other backends.
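   
   The recovery described in the docstring above amounts to a timeout query; a minimal sketch with plain tuples standing in for task instances (the helper name and tuple layout are illustrative, not Airflow's API):
   
   ```python
   from datetime import datetime, timedelta
   
   def find_stuck_queued(task_instances, now, adoption_timeout):
       """Return tasks queued longer than the adoption timeout.
   
       Each item is a (task_id, state, queued_at) tuple; the real code would
       query the metadata DB and reset matches to 'scheduled' so they re-queue.
       """
       cutoff = now - adoption_timeout
       return [t for t in task_instances if t[1] == "queued" and t[2] < cutoff]
   
   now = datetime(2022, 1, 12, 12, 0)
   tis = [
       ("t1", "queued", datetime(2022, 1, 12, 10, 0)),   # stuck for 2 hours
       ("t2", "queued", datetime(2022, 1, 12, 11, 55)),  # recently queued
   ]
   stuck = find_stuck_queued(tis, now, timedelta(minutes=30))
   assert [t[0] for t in stuck] == ["t1"]
   ```
   
   Reusing `task_adoption_timeout` as the cutoff keeps the behaviour consistent with the existing stalled-adoption check.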








[GitHub] [airflow] ephraimbuddy commented on a change in pull request #19769: Handle stuck queued tasks in Celery

2022-01-12 Thread GitBox


ephraimbuddy commented on a change in pull request #19769:
URL: https://github.com/apache/airflow/pull/19769#discussion_r783662283



##
File path: airflow/executors/celery_executor.py
##
@@ -377,6 +385,49 @@ def _check_for_stalled_adopted_tasks(self):
 for key in timedout_keys:
 self.change_state(key, State.FAILED)
 
+    @provide_session
+    def _clear_stuck_queued_tasks(self, session: Session = NEW_SESSION) -> None:
+        """
+        Tasks can get stuck in queued state in DB while still not in
+        worker. This happens when the worker is autoscaled down and
+        the task is queued but has not been picked up by any worker prior to the scaling.
+
+        In such situation, we update the task instance state to scheduled so that
+        it can be queued again. We chose to use task_adoption_timeout to decide
+        """
+        if not isinstance(app.backend, DatabaseBackend):
+            # We only want to do this for database backends where

Review comment:
   I can't figure out how it could be solved in Redis. Do you have reproduction steps for Redis, just like in #19699?








[GitHub] [airflow] uranusjr commented on a change in pull request #18575: Resync DAG during next parse if error in ``sync_to_db``

2022-01-12 Thread GitBox


uranusjr commented on a change in pull request #18575:
URL: https://github.com/apache/airflow/pull/18575#discussion_r783661606



##
File path: airflow/models/dagbag.py
##
@@ -584,22 +584,35 @@ def _serialize_dag_capturing_errors(dag, session):
 
             We can't place them directly in import_errors, as this may be retried, and work the next time
             """
-            if dag.is_subdag:
-                return []
             try:
                 # We can't use bulk_write_to_db as we want to capture each error individually
                 dag_was_updated = SerializedDagModel.write_dag(
                     dag,
                     min_update_interval=settings.MIN_SERIALIZED_DAG_UPDATE_INTERVAL,
                     session=session,
                 )
-                if dag_was_updated:
-                    self._sync_perm_for_dag(dag, session=session)
+                return dag_was_updated, []

Review comment:
   Probably more readable to put this in an `else` block after all the 
exception handlers?
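   
   For reference, the try/except/else shape being suggested looks roughly like this (stand-in names, not the PR's actual code):
   
   ```python
   def serialize_dag(dag, writer):
       """Illustrates putting the success path in an ``else`` block."""
       try:
           dag_was_updated = writer(dag)
       except ValueError as err:
           # error path returns the captured message, mirroring import_errors handling
           return False, [str(err)]
       else:
           # success path sits after all handlers, so the happy flow reads top-down
           return dag_was_updated, []
   
   assert serialize_dag("d1", lambda d: True) == (True, [])
   def boom(d): raise ValueError("bad dag")
   assert serialize_dag("d1", boom) == (False, ["bad dag"])
   ```
   
   The `else` clause only runs when no exception was raised, which keeps the return statement out of the guarded region.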








[GitHub] [airflow] github-actions[bot] commented on pull request #20841: added an enum param example

2022-01-12 Thread GitBox


github-actions[bot] commented on pull request #20841:
URL: https://github.com/apache/airflow/pull/20841#issuecomment-1011831582


   The PR is likely ready to be merged. No tests are needed as no important 
environment files, nor python files were modified by it. However, committers 
might decide that full test matrix is needed and add the 'full tests needed' 
label. Then you should rebase it to the latest main or amend the last commit of 
the PR, and push it with --force-with-lease.






[GitHub] [airflow] uranusjr commented on a change in pull request #20286: Add TaskMap and TaskInstance.map_id

2022-01-12 Thread GitBox


uranusjr commented on a change in pull request #20286:
URL: https://github.com/apache/airflow/pull/20286#discussion_r783658006



##
File path: airflow/models/taskinstance.py
##
@@ -2128,6 +2138,14 @@ def set_duration(self) -> None:
 self.duration = None
 self.log.debug("Task Duration set to %s", self.duration)
 
+    def _record_task_map_for_downstreams(self, value: Any, *, session: Session) -> None:
+        if not self.task.has_mapped_dependants():
+            return
+        if not isinstance(value, collections.abc.Collection) or isinstance(value, (bytes, str)):
+            self.log.info("Failing %s for unmappable XCom push %r", self.key, value)
+            raise UnmappableXComPushed(value)

Review comment:
   Changed both logging and exception to only include the variable type 
instead.








[GitHub] [airflow] uranusjr commented on a change in pull request #20286: Add TaskMap and TaskInstance.map_id

2022-01-12 Thread GitBox


uranusjr commented on a change in pull request #20286:
URL: https://github.com/apache/airflow/pull/20286#discussion_r783657603



##
File path: airflow/models/baseoperator.py
##
@@ -1632,6 +1632,33 @@ def defer(
 def map(self, **kwargs) -> "MappedOperator":
 return MappedOperator.from_operator(self, kwargs)
 
+    def has_mapped_dependants(self) -> bool:
+        """Whether any downstream dependencies depend on this task for mapping."""
+        from airflow.utils.task_group import MappedTaskGroup, TaskGroup
+
+        if not self.has_dag():
+            return False
+
+        def _walk_group(group: TaskGroup) -> Iterable[Tuple[str, DAGNode]]:
+            """Recursively walk children in a task group.
+
+            This yields all direct children (including both tasks and task
+            groups), and all children of any task groups.
+            """
+            for key, child in group.children.items():
+                yield key, child
+                if isinstance(child, TaskGroup):
+                    yield from _walk_group(child)
+
+        for key, child in _walk_group(self.dag.task_group):
+            if key == self.task_id:
+                continue
+            if not isinstance(child, (MappedOperator, MappedTaskGroup)):
+                continue
+            if self.task_id in child.upstream_task_ids:
+                return True
+        return False

Review comment:
   Done








[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-01-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475121#comment-17475121
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

ghostbody edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804


   We reviewed the code and found that in `local_task_job.py`, the parent process has a `heartbeat_callback` that checks the state and the child-process return code of the `task_instance`.
   
   However, these lines may conceal a bug:
   
   
![image](https://user-images.githubusercontent.com/8371330/149270821-45da67da-186e-409b-8f3e-072fe8e0491c.png)
   
   
![image](https://user-images.githubusercontent.com/8371330/149271933-4ae6c8d1-defc-45c6-ba21-89a46016c3d2.png)
   
   
   **The raw task command writing back the task instance's state (like success) doesn't mean the child process has finished (returned).**
   
   So, in this heartbeat callback, there may be a race condition where the task state is written back while the child process has not yet returned.
   
   In this scenario, the local task will kill the child process by mistake. The scheduler will then detect this and report "task instance X finished (success) although the task says its queued. Was the task killed externally?"
   
   this is a simple schematic diagram:
   
   
![image](https://user-images.githubusercontent.com/8371330/149273573-45700f32-079b-4b22-8dba-d6a1ce37a243.png)
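   
   The suspected race can be sketched as follows (hypothetical logic mirroring the diagram, not Airflow's actual heartbeat code):
   
   ```python
   from dataclasses import dataclass
   from typing import Optional
   
   @dataclass
   class ChildProcess:
       returncode: Optional[int]  # None while the child is still running
   
   def heartbeat_check(ti_state: str, child: ChildProcess) -> str:
       """The DB can already read 'success' while the child has not returned.
   
       Acting on the state alone takes the mistaken kill path described above.
       """
       if ti_state == "success" and child.returncode is None:
           return "kill child"  # race: state written back before the child exits
       return "ok"
   
   assert heartbeat_check("success", ChildProcess(returncode=None)) == "kill child"
   assert heartbeat_check("success", ChildProcess(returncode=0)) == "ok"
   ```
   
   A fix would presumably wait for the child's return code before treating the state as authoritative.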
   
   
   




> Thousand os Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> --
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler
>Affects Versions: 1.10.3
>Reporter: msempere
>Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I update to 1.10.3 I'm seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And looks like this is triggering also thousand of daily emails because the 
> flag to send email in case of failure is set to True.-
> I have Airflow setup to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [airflow] ghostbody edited a comment on issue #10790: Copy of [AIRFLOW-5071] JIRA: Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the

2022-01-12 Thread GitBox


ghostbody edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804


   We reviewed the code and found that in `local_task_job.py`, the parent process has a `heartbeat_callback` that checks the state and the child-process return code of the `task_instance`.
   
   However, these lines may conceal a bug:
   
   
![image](https://user-images.githubusercontent.com/8371330/149270821-45da67da-186e-409b-8f3e-072fe8e0491c.png)
   
   
![image](https://user-images.githubusercontent.com/8371330/149271933-4ae6c8d1-defc-45c6-ba21-89a46016c3d2.png)
   
   
   **The raw task command writing back the task instance's state (like success) doesn't mean the child process has finished (returned).**
   
   So, in this heartbeat callback, there may be a race condition where the task state is written back while the child process has not yet returned.
   
   In this scenario, the local task will kill the child process by mistake. The scheduler will then detect this and report "task instance X finished (success) although the task says its queued. Was the task killed externally?"
   
   this is a simple schematic diagram:
   
   
![image](https://user-images.githubusercontent.com/8371330/149273573-45700f32-079b-4b22-8dba-d6a1ce37a243.png)
   
   
   






[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-01-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475117#comment-17475117
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

ghostbody edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804


   We reviewed the code and found that in `local_task_job.py`, the parent process has a `heartbeat_callback` that checks the state and the child-process return code of the `task_instance`.
   
   However, these lines may conceal a bug:
   
   
![image](https://user-images.githubusercontent.com/8371330/149270821-45da67da-186e-409b-8f3e-072fe8e0491c.png)
   
   
![image](https://user-images.githubusercontent.com/8371330/149271933-4ae6c8d1-defc-45c6-ba21-89a46016c3d2.png)
   
   
   **The raw task command writing back the task instance's state (like success) doesn't mean the child process has finished (returned).**
   
   So, in this heartbeat callback, there may be a race condition where the task state is written back while the child process has not yet returned.
   
   In this scenario, the local task will kill the child process by mistake. The scheduler will then detect this and report "task instance X finished (success) although the task says its queued. Was the task killed externally?"
   




> Thousand os Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> --
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler
>Affects Versions: 1.10.3
>Reporter: msempere
>Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I update to 1.10.3 I'm seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And looks like this is triggering also thousand of daily emails because the 
> flag to send email in case of failure is set to True.-
> I have Airflow setup to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [airflow] ghostbody edited a comment on issue #10790: Copy of [AIRFLOW-5071] JIRA: Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the

2022-01-12 Thread GitBox


ghostbody edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804


   We reviewed the code and found that in `local_task_job.py`, the parent process has a `heartbeat_callback` that checks the state and the child-process return code of the `task_instance`.
   
   However, these lines may conceal a bug:
   
   
![image](https://user-images.githubusercontent.com/8371330/149270821-45da67da-186e-409b-8f3e-072fe8e0491c.png)
   
   
![image](https://user-images.githubusercontent.com/8371330/149271933-4ae6c8d1-defc-45c6-ba21-89a46016c3d2.png)
   
   
   **The raw task command writing back the task instance's state (like success) doesn't mean the child process has finished (returned).**
   
   So, in this heartbeat callback, there may be a race condition where the task state is written back while the child process has not yet returned.
   
   In this scenario, the local task will kill the child process by mistake. The scheduler will then detect this and report "task instance X finished (success) although the task says its queued. Was the task killed externally?"
   






[jira] [Commented] (AIRFLOW-5071) Thousand os Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-01-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475116#comment-17475116
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

ghostbody edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804


   We reviewed the code and found that in `local_task_job.py`, the parent process has a `heartbeat_callback` that checks the state and the child-process return code of the `task_instance`.
   
   However, these lines may conceal a bug:
   
   
![image](https://user-images.githubusercontent.com/8371330/149270821-45da67da-186e-409b-8f3e-072fe8e0491c.png)
   
   
![image](https://user-images.githubusercontent.com/8371330/149271933-4ae6c8d1-defc-45c6-ba21-89a46016c3d2.png)
   
   
   **The raw task command writing back the task instance's state (like success) doesn't mean the child process has finished (returned).**
   
   So, in this heartbeat callback, there may be a race condition where the task state is written back while the child process has not yet returned.
   
   In this scenario, the local task will kill the child process by mistake.
   




> Thousand os Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> --
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, scheduler
>Affects Versions: 1.10.3
>Reporter: msempere
>Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I update to 1.10.3 I'm seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance  2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails because the 
> flag to send email in case of failure is set to True.-
> I have Airflow set up to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [airflow] ghostbody edited a comment on issue #10790: Copy of [AIRFLOW-5071] JIRA: Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the

2022-01-12 Thread GitBox


ghostbody edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804


   we reviewed the code and found that in `local_task_job.py`, the parent 
process has a `heartbeat_callback` that checks the state and the child-process 
return code of the `task_instance`.
   
   However, these lines may hide a bug:
   
   
![image](https://user-images.githubusercontent.com/8371330/149270821-45da67da-186e-409b-8f3e-072fe8e0491c.png)
   
   
![image](https://user-images.githubusercontent.com/8371330/149271933-4ae6c8d1-defc-45c6-ba21-89a46016c3d2.png)
   
   
   **The raw task command writing back the task instance's state (e.g. success) 
doesn't mean the child process has finished (returned).**
   
   So, in this heartbeat callback, there may be a race condition: the task 
state is already written back while the child process has not yet returned.
   
   In this scenario, the local task kills the child process by mistake.
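   The race window can be sketched in isolation. This is a minimal, hypothetical 
simplification: `heartbeat_check`, its states, and its return values are 
illustrative only, not Airflow's actual `local_task_job.py` implementation.

   ```python
   # Minimal, hypothetical model of the race described above; names and
   # return values are illustrative, not Airflow's actual code.
   def heartbeat_check(ti_state, child_returncode):
       """Decide what the parent's heartbeat callback would do.

       ti_state: state the raw task command wrote back (e.g. 'success').
       child_returncode: None while the child process has not exited yet.
       """
       if child_returncode is not None:
           return "finalize"            # child really finished
       if ti_state in ("success", "failed"):
           # Race window: the DB row is already terminal, but the child
           # process has not returned, so the callback kills it by mistake.
           return "kill_child"
       return "keep_running"

   assert heartbeat_check("success", None) == "kill_child"  # the mistaken kill
   assert heartbeat_check("running", None) == "keep_running"
   assert heartbeat_check("success", 0) == "finalize"
   ```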
   






[jira] [Commented] (AIRFLOW-5071) Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

2022-01-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475115#comment-17475115
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-

ghostbody commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804


   we reviewed the code and found that in `local_task_job.py`, the parent 
process has a `heartbeat_callback` that checks the state and the child-process 
return code of the `task_instance`.
   
   However, these lines may hide a bug:
   
   
![image](https://user-images.githubusercontent.com/8371330/149270821-45da67da-186e-409b-8f3e-072fe8e0491c.png)
   
   
![image](https://user-images.githubusercontent.com/8371330/149271933-4ae6c8d1-defc-45c6-ba21-89a46016c3d2.png)
   
   
   **The raw task command writing back the task instance's state (e.g. success) 
doesn't mean the child process has finished (returned).**
   
   So, in this heartbeat callback, there may be a race condition: the task 
state is already written back while the child process has not yet returned.
   






[GitHub] [airflow] ghostbody commented on issue #10790: Copy of [AIRFLOW-5071] JIRA: Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task ki

2022-01-12 Thread GitBox


ghostbody commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1011812804


   we reviewed the code and found that in `local_task_job.py`, the parent 
process has a `heartbeat_callback` that checks the state and the child-process 
return code of the `task_instance`.
   
   However, these lines may hide a bug:
   
   
![image](https://user-images.githubusercontent.com/8371330/149270821-45da67da-186e-409b-8f3e-072fe8e0491c.png)
   
   
![image](https://user-images.githubusercontent.com/8371330/149271933-4ae6c8d1-defc-45c6-ba21-89a46016c3d2.png)
   
   
   **The raw task command writing back the task instance's state (e.g. success) 
doesn't mean the child process has finished (returned).**
   
   So, in this heartbeat callback, there may be a race condition: the task 
state is already written back while the child process has not yet returned.
   






[GitHub] [airflow] MatrixManAtYrService opened a new pull request #20841: added an enum param example

2022-01-12 Thread GitBox


MatrixManAtYrService opened a new pull request #20841:
URL: https://github.com/apache/airflow/pull/20841


   More examples makes it easier to compare our docs with the json-schema docs 
and figure out how they work together (I tested the code shown here).
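   The PR body doesn't reproduce the example itself; as an illustration of what 
a json-schema `enum` constraint checks, here is a stdlib-only sketch. The 
`validate_enum` helper is hypothetical, not Airflow's `Param` API.

   ```python
   # Stdlib-only illustration of the json-schema "enum" keyword that Airflow
   # Params build on. validate_enum is a hypothetical helper, not Airflow code.
   def validate_enum(value, schema):
       # "enum": the instance is valid if it equals one of the listed values.
       if "enum" in schema and value not in schema["enum"]:
           raise ValueError(f"{value!r} is not one of {schema['enum']}")
       return value

   schema = {"type": "string", "enum": ["dev", "staging", "prod"]}
   assert validate_enum("prod", schema) == "prod"
   try:
       validate_enum("qa", schema)
   except ValueError as err:
       assert "qa" in str(err)
   ```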






[GitHub] [airflow] uranusjr commented on a change in pull request #20743: Serialize mapped tasks and task groups

2022-01-12 Thread GitBox


uranusjr commented on a change in pull request #20743:
URL: https://github.com/apache/airflow/pull/20743#discussion_r783623735



##
File path: airflow/models/baseoperator.py
##
@@ -207,6 +207,7 @@ def apply_defaults(self: "BaseOperator", *args: Any, 
**kwargs: Any) -> Any:
 
 result = func(self, **kwargs, default_args=default_args)
 # Store the args passed to init -- we need them to support 
task.map serialzation!
+kwargs.pop('task_id', None)

Review comment:
   How about storing `task_id` separately instead and not in 
`partial_kwargs`?








[GitHub] [airflow] nirutgupta edited a comment on issue #20779: Logs of tasks when running is not available with kubernetesexecutor on webserver UI

2022-01-12 Thread GitBox


nirutgupta edited a comment on issue #20779:
URL: https://github.com/apache/airflow/issues/20779#issuecomment-1011786738


   A k8s task handler (k8s_task_handler.py) we can use: 
https://gist.github.com/szeevs/938ad3cf96e732d4b1b55a74015aed5b
   log_config.py will look like this:
   ```
   from copy import deepcopy
   from airflow.config_templates.airflow_local_settings import 
DEFAULT_LOGGING_CONFIG
   from airflow.configuration import conf
   
   import os
   import sys
   
   sys.path.append('/opt/airflow/custom_log')
   
   BASE_LOG_FOLDER: str = conf.get('logging', 'BASE_LOG_FOLDER')
   FILENAME_TEMPLATE: str = conf.get('logging', 'LOG_FILENAME_TEMPLATE')
   
   LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
   LOGGING_CONFIG['handlers']['k8stask'] = {
   'class': 'k8s_task_handler.KubernetesTaskHandler',
   'formatter': 'airflow',
   'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
   'filename_template': FILENAME_TEMPLATE
   }
   
   LOGGING_CONFIG['loggers']['airflow.task']['handlers'].append('k8stask')
   ```
   
   Make sure to configure this env while deploying:
 - name: AIRFLOW__LOGGING__LOGGING_CONFIG_CLASS
   value: "log_config.LOGGING_CONFIG"
   
   This will do the job. :) 






[GitHub] [airflow] nirutgupta edited a comment on issue #20779: Logs of tasks when running is not available with kubernetesexecutor on webserver UI

2022-01-12 Thread GitBox


nirutgupta edited a comment on issue #20779:
URL: https://github.com/apache/airflow/issues/20779#issuecomment-1011786738


   A k8s task handler (k8s_task_handler.py) we can use: 
https://gist.github.com/szeevs/938ad3cf96e732d4b1b55a74015aed5b
   log_config.py will look like this:
   ```
   from copy import deepcopy
   from airflow.config_templates.airflow_local_settings import 
DEFAULT_LOGGING_CONFIG
   from airflow.configuration import conf
   
   import os
   import sys
   
   sys.path.append('/opt/airflow/custom_log')
   
   BASE_LOG_FOLDER: str = conf.get('logging', 'BASE_LOG_FOLDER')
   FILENAME_TEMPLATE: str = conf.get('logging', 'LOG_FILENAME_TEMPLATE')
   
   LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
   LOGGING_CONFIG['handlers']['k8stask'] = {
   'class': 'k8s_task_handler.KubernetesTaskHandler',
   'formatter': 'airflow',
   'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
   'filename_template': FILENAME_TEMPLATE
   }
   
   LOGGING_CONFIG['loggers']['airflow.task']['handlers'].append('k8stask')
   ```
   
   Make sure to configure this env while deploying:
 - name: AIRFLOW__LOGGING__LOGGING_CONFIG_CLASS
   value: "log_config.LOGGING_CONFIG"
   
   This will do the job. :) 






[GitHub] [airflow] nirutgupta commented on issue #20779: Logs of tasks when running is not available with kubernetesexecutor on webserver UI

2022-01-12 Thread GitBox


nirutgupta commented on issue #20779:
URL: https://github.com/apache/airflow/issues/20779#issuecomment-1011786738


   A k8s task handler (k8s_task_handler.py) we can use: 
https://gist.github.com/szeevs/938ad3cf96e732d4b1b55a74015aed5b
   log_config.py will look like this:
   ```
   from copy import deepcopy
   from airflow.config_templates.airflow_local_settings import 
DEFAULT_LOGGING_CONFIG
   from airflow.configuration import conf
   
   import os
   import sys
   
   sys.path.append('/opt/airflow/custom_log')
   
   BASE_LOG_FOLDER: str = conf.get('logging', 'BASE_LOG_FOLDER')
   FILENAME_TEMPLATE: str = conf.get('logging', 'LOG_FILENAME_TEMPLATE')
   
   LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
   LOGGING_CONFIG['handlers']['k8stask'] = {
   'class': 'k8s_task_handler.KubernetesTaskHandler',
   'formatter': 'airflow',
   'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
   'filename_template': FILENAME_TEMPLATE
   }
   
   LOGGING_CONFIG['loggers']['airflow.task']['handlers'].append('k8stask')
   ```
   
   Make sure to configure this env while deploying:
 - name: AIRFLOW__LOGGING__LOGGING_CONFIG_CLASS
   value: "log_config.LOGGING_CONFIG"
   
   This will do the job. :) 






[GitHub] [airflow] nirutgupta commented on issue #20779: Logs of tasks when running is not available with kubernetesexecutor on webserver UI

2022-01-12 Thread GitBox


nirutgupta commented on issue #20779:
URL: https://github.com/apache/airflow/issues/20779#issuecomment-1011785432


   I see a good working solution for this at 
https://szeevs.medium.com/handling-airflow-logs-with-kubernetes-executor-25c11ea831e4
   All we need to do is make this k8s_task_handler file available under 
utils/log/ and configure the log_config.py file to link the airflow.task logger 
to this new handler as well. 






[GitHub] [airflow] uranusjr commented on a change in pull request #20349: Fix Scheduler crash when executing task instances of missing DAG

2022-01-12 Thread GitBox


uranusjr commented on a change in pull request #20349:
URL: https://github.com/apache/airflow/pull/20349#discussion_r783617068



##
File path: tests/jobs/test_scheduler_job.py
##
@@ -645,6 +645,36 @@ def 
test_find_executable_task_instances_in_default_pool(self, dag_maker):
 session.rollback()
 session.close()
 
+def 
test_executable_task_instances_to_queued_fails_task_for_missing_dag_in_dagbag(
+self, dag_maker, session
+):
+"""Check that task instances of missing DAGs are failed"""
+dag_id = 
'SchedulerJobTest.test_find_executable_task_instances_not_in_dagbag'
+task_id_1 = 'dummy'
+task_id_2 = 'dummydummy'
+
+with dag_maker(dag_id=dag_id, session=session, 
default_args={"max_active_tis_per_dag": 1}):
+DummyOperator(task_id=task_id_1)
+DummyOperator(task_id=task_id_2)
+
+self.scheduler_job = SchedulerJob(subdir=os.devnull)
+self.scheduler_job.dagbag = mock.MagicMock()
+self.scheduler_job.dagbag.get_dag.return_value = None
+
+dr = dag_maker.create_dagrun(state=DagRunState.RUNNING)
+
+tis = dr.task_instances
+for ti in tis:
+ti.state = State.SCHEDULED
+session.merge(ti)
+session.flush()
+res = 
self.scheduler_job._executable_task_instances_to_queued(max_tis=32, 
session=session)
+session.flush()
+assert 0 == len(res)
+tis = dr.get_task_instances(session=session)
+for ti in tis:
+assert ti.state == State.FAILED

Review comment:
   Or
   
   ```python
   assert [TaskInstanceState.FAILED, TaskInstanceState.FAILED] == [ti.state for 
ti in tis]
   ```








[GitHub] [airflow] uranusjr commented on a change in pull request #20349: Fix Scheduler crash when executing task instances of missing DAG

2022-01-12 Thread GitBox


uranusjr commented on a change in pull request #20349:
URL: https://github.com/apache/airflow/pull/20349#discussion_r783616795



##
File path: airflow/jobs/scheduler_job.py
##
@@ -403,6 +403,15 @@ def _executable_task_instances_to_queued(self, max_tis: 
int, session: Session =
 # Many dags don't have a task_concurrency, so where we can 
avoid loading the full
 # serialized DAG the better.
 serialized_dag = self.dagbag.get_dag(dag_id, 
session=session)
+# If the dag is missing, fail the task and continue to the 
next task.
+if not serialized_dag:
+self.log.error(
+"DAG '%s' for task instance %s not found in 
serialized_dag table",
+dag_id,
+task_instance,
+)
+task_instance.set_state(State.FAILED, session=session)

Review comment:
   > there could be downstream tasks which still attempt to execute, which 
will then be marked failed by the same check
   
   I think this is a good thing in this case. This code is reached because the 
DAG declaring those tasks is gone, so it doesn’t make sense to execute those 
tasks IMO.
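   The behavior argued for here can be sketched in isolation (hypothetical 
names, plain dicts instead of ORM objects, not the scheduler's actual code): 
task instances whose DAG can no longer be resolved are failed instead of 
queued, so the scheduler no longer crashes on them.
   
   ```python
   # Hedged sketch of the guard discussed above: fail task instances whose
   # DAG is missing, queue the rest. Hypothetical, not Airflow's code.
   def triage(task_instances, get_dag):
       """Partition task instances into (queueable, failed) by DAG presence."""
       queueable, failed = [], []
       for ti in task_instances:
           if get_dag(ti["dag_id"]) is None:
               ti["state"] = "failed"   # DAG is gone: fail instead of crashing
               failed.append(ti)
           else:
               queueable.append(ti)
       return queueable, failed

   tis = [{"dag_id": "gone", "state": "scheduled"},
          {"dag_id": "ok", "state": "scheduled"}]
   queueable, failed = triage(tis, lambda dag_id: {"ok": object()}.get(dag_id))
   assert [t["dag_id"] for t in failed] == ["gone"]
   assert failed[0]["state"] == "failed"
   ```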








[airflow] branch constraints-main updated: Updating constraints. Build id:1690466917

2022-01-12 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch constraints-main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/constraints-main by this push:
 new 32bc5e2  Updating constraints. Build id:1690466917
32bc5e2 is described below

commit 32bc5e2d4a8f7b6747e4253577429664dc4e7162
Author: Automated GitHub Actions commit 
AuthorDate: Thu Jan 13 01:54:30 2022 +

Updating constraints. Build id:1690466917

This update in constraints is automatically committed by the CI 
'constraints-push' step based on
HEAD of 'refs/heads/main' in 'apache/airflow'
with commit sha 83b1e364eb3949288161035e38d8a96329579e52.

All tests passed in this build so we determined we can push the updated 
constraints.

See 
https://github.com/apache/airflow/blob/main/README.md#installing-from-pypi for 
details.
---
 constraints-3.7.txt  | 2 +-
 constraints-3.8.txt  | 2 +-
 constraints-3.9.txt  | 2 +-
 constraints-no-providers-3.7.txt | 2 +-
 constraints-no-providers-3.8.txt | 2 +-
 constraints-no-providers-3.9.txt | 2 +-
 constraints-source-providers-3.7.txt | 6 +++---
 constraints-source-providers-3.8.txt | 4 ++--
 constraints-source-providers-3.9.txt | 6 +++---
 9 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/constraints-3.7.txt b/constraints-3.7.txt
index 61ef132..523dcd3 100644
--- a/constraints-3.7.txt
+++ b/constraints-3.7.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2022-01-12T22:29:32Z
+# This constraints file was automatically generated on 2022-01-13T01:51:09Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install uses the HEAD of the branch version for 
'apache-airflow' but installs
 # the providers from PIP-released packages at the moment of the constraint 
generation.
diff --git a/constraints-3.8.txt b/constraints-3.8.txt
index 0f54717..343ac3a 100644
--- a/constraints-3.8.txt
+++ b/constraints-3.8.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2022-01-12T22:29:26Z
+# This constraints file was automatically generated on 2022-01-13T01:51:07Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install uses the HEAD of the branch version for 
'apache-airflow' but installs
 # the providers from PIP-released packages at the moment of the constraint 
generation.
diff --git a/constraints-3.9.txt b/constraints-3.9.txt
index ef99351..ca55fdd 100644
--- a/constraints-3.9.txt
+++ b/constraints-3.9.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2022-01-12T22:29:26Z
+# This constraints file was automatically generated on 2022-01-13T01:51:06Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install uses the HEAD of the branch version for 
'apache-airflow' but installs
 # the providers from PIP-released packages at the moment of the constraint 
generation.
diff --git a/constraints-no-providers-3.7.txt b/constraints-no-providers-3.7.txt
index db3daaa..9b28766 100644
--- a/constraints-no-providers-3.7.txt
+++ b/constraints-no-providers-3.7.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2022-01-12T22:33:53Z
+# This constraints file was automatically generated on 2022-01-13T01:54:24Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install just the 'bare' 'apache-airflow' package 
build from the HEAD of
 # the branch, without installing any of the providers.
diff --git a/constraints-no-providers-3.8.txt b/constraints-no-providers-3.8.txt
index ce0943e..1d4d314 100644
--- a/constraints-no-providers-3.8.txt
+++ b/constraints-no-providers-3.8.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2022-01-12T22:33:50Z
+# This constraints file was automatically generated on 2022-01-13T01:54:23Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install just the 'bare' 'apache-airflow' package 
build from the HEAD of
 # the branch, without installing any of the providers.
diff --git a/constraints-no-providers-3.9.txt b/constraints-no-providers-3.9.txt
index e4c012e..ce1e6e6 100644
--- a/constraints-no-providers-3.9.txt
+++ b/constraints-no-providers-3.9.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2022-01-12T22:33:58Z
+# This constraints file was automatically generated on 2022-01-13T01:54:25Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install just the 'bare' 'apache-airflow' package 
build from the HEAD of
 # the branch, without installing any of the providers.
diff --git 

[airflow] branch main updated (fe5aba2 -> 83b1e36)

2022-01-12 Thread dstandish
This is an automated email from the ASF dual-hosted git repository.

dstandish pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git.


from fe5aba2  Fix grammatical error in Variable.get docstring (#20837)
 add 83b1e36  Speedup liveness probe for scheduler and triggerer (#20833)

No new revisions were added by this update.

Summary of changes:
 chart/templates/scheduler/scheduler-deployment.yaml | 1 +
 chart/templates/triggerer/triggerer-deployment.yaml | 1 +
 2 files changed, 2 insertions(+)


[GitHub] [airflow] dstandish commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


dstandish commented on a change in pull request #20838:
URL: https://github.com/apache/airflow/pull/20838#discussion_r783602645



##
File path: airflow/utils/metastore_cleanup.py
##
@@ -0,0 +1,289 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+orm_model: the table
+recency_column: date column to filter by
+keep_last: whether the last record should be kept even if it's older than 
clean_before_timestamp
+keep_last_filters:
+keep_last_group_by: if keeping the last record, can keep the last record for 
each group
+"""
+
+
+from pendulum import DateTime
+from sqlalchemy.orm import Query
+
+from airflow import settings
+from airflow.configuration import conf
+from airflow.models import (
+DagModel,
+DagRun,
+ImportError,
+Log,
+RenderedTaskInstanceFields,
+SlaMiss,
+TaskFail,
+TaskInstance,
+XCom,
+)
+
+try:
+from airflow.jobs import BaseJob
+except Exception as e:
+from airflow.jobs.base_job import BaseJob

Review comment:
   thanks
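    As an aside, the per-table keys documented in the quoted module docstring 
(`recency_column`, `keep_last`, `keep_last_group_by`) can be modeled on plain 
dicts to show the intended semantics. This is a hypothetical sketch, not the 
PR's SQLAlchemy code.

    ```python
    # Hedged model of the cleanup rules described in the docstring above:
    # delete rows older than a cutoff, optionally sparing the newest row of
    # each group (keep_last / keep_last_group_by). Hypothetical names.
    from datetime import datetime

    def rows_to_delete(rows, recency_key, clean_before, keep_last=False, group_key=None):
        """Return rows older than clean_before, optionally sparing the
        newest row per group regardless of its age."""
        old = [r for r in rows if r[recency_key] < clean_before]
        if keep_last:
            spared = {}
            for r in rows:
                g = r[group_key]
                if g not in spared or r[recency_key] > spared[g][recency_key]:
                    spared[g] = r  # newest row of each group survives
            old = [r for r in old if spared[r[group_key]] is not r]
        return old

    cutoff = datetime(2022, 1, 1)
    runs = [
        {"dag_id": "a", "execution_date": datetime(2021, 6, 1)},
        {"dag_id": "a", "execution_date": datetime(2021, 7, 1)},
    ]
    # Both runs are old, but keep_last spares the newest run of dag "a".
    assert rows_to_delete(runs, "execution_date", cutoff,
                          keep_last=True, group_key="dag_id") == [runs[0]]
    ```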








[GitHub] [airflow] dstandish commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


dstandish commented on a change in pull request #20838:
URL: https://github.com/apache/airflow/pull/20838#discussion_r783602610



##
File path: airflow/cli/cli_parser.py
##
@@ -378,6 +383,27 @@ def _check(value):
 ARG_CONF = Arg(('-c', '--conf'), help="JSON string that gets pickled into the 
DagRun's conf attribute")
 ARG_EXEC_DATE = Arg(("-e", "--exec-date"), help="The execution date of the 
DAG", type=parsedate)
 
+# maintenance
+ARG_MAINTENANCE_TABLES = Arg(
+("-t", "--tables"),
+help="Table names to perform maintenance on (use comma-separated list)",

Review comment:
   good idea








[GitHub] [airflow] dstandish commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


dstandish commented on a change in pull request #20838:
URL: https://github.com/apache/airflow/pull/20838#discussion_r783602548



##
File path: airflow/utils/metastore_cleanup.py
##
@@ -0,0 +1,289 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+orm_model: the table
+recency_column: date column to filter by
+keep_last: whether the last record should be kept even if it's older than 
clean_before_timestamp
+keep_last_filters:
+keep_last_group_by: if keeping the last record, can keep the last record for 
each group
+"""
+
+
+from pendulum import DateTime
+from sqlalchemy.orm import Query
+
+from airflow import settings
+from airflow.configuration import conf
+from airflow.models import (
+DagModel,
+DagRun,
+ImportError,
+Log,
+RenderedTaskInstanceFields,
+SlaMiss,
+TaskFail,
+TaskInstance,
+XCom,
+)
+
+try:
+from airflow.jobs import BaseJob
+except Exception as e:
+from airflow.jobs.base_job import BaseJob
+
+import logging
+
+from sqlalchemy import and_, func
+
+from airflow.models import TaskReschedule
+from airflow.utils import timezone
+
+now = timezone.utcnow
+
+
+objects = {
+'job': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': BaseJob,
+'recency_column': BaseJob.latest_heartbeat,
+},
+'dag': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': DagModel,
+'recency_column': DagModel.last_parsed_time,
+},
+'dag_run': {
+'keep_last': True,
+"keep_last_filters": [DagRun.external_trigger.is_(False)],
+"keep_last_group_by": DagRun.dag_id,
+'orm_model': DagRun,
+'recency_column': DagRun.execution_date,
+},
+'import_error': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': ImportError,
+'recency_column': ImportError.timestamp,
+},
+'log': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': Log,
+'recency_column': Log.dttm,
+},
+'rendered_task_instance_fields': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': RenderedTaskInstanceFields,
+'recency_column': RenderedTaskInstanceFields.execution_date,
+},
+'sla_miss': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': SlaMiss,
+'recency_column': SlaMiss.execution_date,
+},
+'task_fail': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': TaskFail,
+'recency_column': TaskFail.execution_date,
+},
+'task_instance': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': TaskInstance,
+'recency_column': TaskInstance.execution_date,

Review comment:
   tell me about the unsureness?








+# This constraints file was automatically generated on 2022-01-13T01:54:23Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install just the 'bare' 'apache-airflow' package 
build from the HEAD of
 # the branch, without installing any of the providers.
diff --git a/constraints-no-providers-3.9.txt b/constraints-no-providers-3.9.txt
index e4c012e..ce1e6e6 100644
--- a/constraints-no-providers-3.9.txt
+++ b/constraints-no-providers-3.9.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2022-01-12T22:33:58Z
+# This constraints file was automatically generated on 2022-01-13T01:54:25Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install just the 'bare' 'apache-airflow' package 
build from the HEAD of
 # the branch, without installing any of the providers.
diff --git 
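For readers following the README link in the commit message above: the generated constraint files are consumed at install time via pip's `--constraint` flag. A minimal sketch (the Airflow and Python versions are illustrative, and the install command is only echoed here rather than executed):

```shell
AIRFLOW_VERSION=2.2.3
PYTHON_VERSION=3.8
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# Dry run: print the command instead of installing in this sketch
echo "pip install \"apache-airflow==${AIRFLOW_VERSION}\" --constraint \"${CONSTRAINT_URL}\""
```

Using the versioned `constraints-2.2.3` branch (rather than `constraints-main`, which tracks HEAD) gives reproducible installs for a released Airflow version.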

[airflow] branch main updated (fe5aba2 -> 83b1e36)

2022-01-12 Thread dstandish
This is an automated email from the ASF dual-hosted git repository.

dstandish pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git.


from fe5aba2  Fix grammatical error in Variable.get docstring (#20837)
 add 83b1e36  Speedup liveness probe for scheduler and triggerer (#20833)

No new revisions were added by this update.

Summary of changes:
 chart/templates/scheduler/scheduler-deployment.yaml | 1 +
 chart/templates/triggerer/triggerer-deployment.yaml | 1 +
 2 files changed, 2 insertions(+)
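For context, the two chart templates touched above configure Kubernetes liveness probes for the scheduler and triggerer. A generic sketch of such a probe stanza (all values and the probe command are illustrative, not the chart's actual defaults):

```yaml
livenessProbe:
  initialDelaySeconds: 10
  timeoutSeconds: 20
  periodSeconds: 60
  failureThreshold: 5
  exec:
    command:
      - sh
      - -c
      - airflow jobs check --job-type SchedulerJob --hostname "$(hostname)"
```

Lowering the probe's delay/period values makes Kubernetes detect and restart a wedged component sooner, which is the "speedup" the PR title refers to.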





[GitHub] [airflow] dwiajik commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

2022-01-12 Thread GitBox


dwiajik commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1011683020


   I am experiencing the same problem and have set 
`schedule_after_task_execution` to `False`. The issue still persists. Do you 
have any suggestions? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
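For reference, the option mentioned in the comment above lives in the `[scheduler]` section of `airflow.cfg` (equivalently, the `AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION` environment variable):

```ini
[scheduler]
schedule_after_task_execution = False
```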




[GitHub] [airflow] jedcunningham opened a new pull request #18575: Resync DAG during next parse if error in ``sync_to_db``

2022-01-12 Thread GitBox


jedcunningham opened a new pull request #18575:
URL: https://github.com/apache/airflow/pull/18575


   If there is an issue syncing DAG specific permissions, force syncing to
   happen again the next time the DAG is parsed so errors will be added to
   ``import_errors`` and shown in the UI.
   
   The easiest example of this is the use of a non-existent role in
   ``access_control``.
   
   Related #17166


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [airflow] NadimYounes edited a comment on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5

2022-01-12 Thread GitBox


NadimYounes edited a comment on pull request #20245:
URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011631448


   @mik-laj Not sure I am following. If I try installing airflow with the 
snowflake provider using the command below:
   
   ```
   pip install apache-airflow[snowflake]==2.2.3 -c 
https://raw.githubusercontent.com/apache/airflow/constraints-2.2.3/constraints-3.7.txt
   ```
   
   It still results in the known issues with SqlAlchemy imports. Am I missing 
something? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[GitHub] [airflow] sergxm2 edited a comment on issue #19222: none_failed_min_one_success trigger rule not working with BranchPythonOperator in certain cases.

2022-01-12 Thread GitBox


sergxm2 edited a comment on issue #19222:
URL: https://github.com/apache/airflow/issues/19222#issuecomment-1011610462


   Seeing the same issue with BranchPythonOperator / branching and the final 
task (i.e. task6) being incorrectly skipped instead of being called. This is 
observed in 2.2.x but not in 2.1.x or 2.0.x.
   To be specific, this is unrelated to returning an "empty" task ID, as we're 
seeing this happen even when the task ID is returned.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] boring-cyborg[bot] commented on issue #20839: Cannot edit custom fields on provider connections

2022-01-12 Thread GitBox


boring-cyborg[bot] commented on issue #20839:
URL: https://github.com/apache/airflow/issues/20839#issuecomment-1011613591


   Thanks for opening your first issue here! Be sure to follow the issue 
template!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] mike-mcdonald opened a new issue #20839: Cannot edit custom fields on provider connections

2022-01-12 Thread GitBox


mike-mcdonald opened a new issue #20839:
URL: https://github.com/apache/airflow/issues/20839


   ### Apache Airflow version
   
   2.2.3 (latest released)
   
   ### What happened
   
   Connections from providers are not saving edited values in any custom 
connection forms. You can work around the issue by changing connection type to 
something like HTTP, and modifying the extra field's JSON.
   
   ### What you expected to happen
   
   _No response_
   
   ### How to reproduce
   
   Using the official docker compose deployment, add a new connection of type 
'Azure Data Explorer' and fill out one of the custom connection fields (e.g., 
"Tenant ID"). Save the record. Edit the record and enter a new value for the 
same field or any other field that is defined in the associated Hook's 
`get_connection_form_widgets` function. Save the record. Edit again. The 
changes were not saved.
   
   ### Operating System
   
   Windows 10, using docker
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
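As background for the repro above: provider hooks declare custom connection fields via `get_connection_form_widgets()`, keyed by the `extra__<conn_type>__<field_name>` convention. A simplified, hypothetical sketch of that key-naming contract (real hooks return WTForms fields rather than plain strings, and these field names are illustrative):

```python
# Hypothetical stand-in for a provider hook's custom-field declaration.
# In Airflow 2.x the webserver matches form data back to the hook's
# widgets by these "extra__<conn_type>__<field_name>" keys.
def get_connection_form_widgets():
    return {
        "extra__azure_data_explorer__tenant": "Tenant ID",
        "extra__azure_data_explorer__certificate": "Certificate Path",
    }

widgets = get_connection_form_widgets()
```

If the saved form data and the hook's declared keys fall out of sync, edits to those fields are silently dropped, which matches the behavior described in the issue.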






[GitHub] [airflow] dstandish merged pull request #20833: Speedup liveness probe for scheduler and triggerer

2022-01-12 Thread GitBox


dstandish merged pull request #20833:
URL: https://github.com/apache/airflow/pull/20833


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] mik-laj commented on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5

2022-01-12 Thread GitBox


mik-laj commented on pull request #20245:
URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011599004


   @NadimYounes This release has been yanked, so it shouldn't be installed 
automatically anymore.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] danmactough commented on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

2022-01-12 Thread GitBox


danmactough commented on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-1011598836


   Airflow 2.0.2+e494306fb01f3a026e7e2832ca94902e96b526fa (MWAA on AWS)
   
   This happens to us a LOT: a DAG will be running, task instances will be 
marked as "queued", but nothing gets moved to "running".
   
   When this happened today (the first time today), I was able to track down 
the following error in the scheduler logs:
   
   ![2022-01-12 at 7 16 
PM](https://user-images.githubusercontent.com/357481/149243393-0f0b5b91-d1f7-4a51-8a43-3eab644a49e7.png)
   
   At some point after the scheduler had that exception, I tried to clear the 
state of the queued task instances to get them to run. That resulted in the 
following logs:
   
   ![2022-01-12 at 7 18 
PM](https://user-images.githubusercontent.com/357481/149243535-3ebfd0b1-31af-43aa-99e2-7ee5aa1dbaff.png)
   
   This corresponds to this [section of 
code](https://github.com/apache/airflow/blob/2.0.2/airflow/executors/base_executor.py#L73-L85):
   
   ![2022-01-12 at 10 38 
AM](https://user-images.githubusercontent.com/357481/149171972-e9824366-6e85-4c2e-a00c-5ee66d466de8.png)
   
   My conclusion is that when the scheduler experienced that error, it entered 
a pathological state: it was running but had bad state in memory. Specifically, 
the queued task instances were in the `queued_tasks` or `running` in-memory 
cache, and thus any attempts to re-queue those tasks would fail as long as that 
scheduler process was running because the tasks would appear to already have 
been queued and/or running.
   
   Both caches use the 
[`TaskInstanceKey`](https://github.com/apache/airflow/blob/10023fdd65fa78033e7125d3d8103b63c127056e/airflow/models/taskinstance.py#L224-L230),
 which is made up of `dag_id` (which we can't change), `task_id` (which we 
can't change), `execution_date` (nope, can't change), and `try_number` (we 
can change this!!).
   
   So to work around this, I created a utility DAG that will find all task 
instances in a "queued" or "None" state and increment the `try_number` field.
   
   The DAG runs as a single `PythonOperator` with the following callable:
   
   ```python
   from pprint import pprint

   from dateutil.parser import parse
   from sqlalchemy import or_

   from airflow.models import DagRun, TaskInstance
   from airflow.utils import timezone
   from airflow.utils.session import provide_session
   from airflow.utils.state import State


   @provide_session
   def unstick_dag_callable(dag_run, session, **kwargs):
       dag_id = dag_run.conf.get("dag_id")
       if not dag_id:
           raise AssertionError("dag_id was not provided")
       execution_date = dag_run.conf.get("execution_date")
       if not execution_date:
           raise AssertionError("execution_date was not provided")
       execution_date = parse(execution_date)

       filter = [
           or_(TaskInstance.state == State.QUEUED, TaskInstance.state == State.NONE),
           TaskInstance.dag_id == dag_id,
           TaskInstance.execution_date == execution_date,
       ]
       print(
           f"DAG id: {dag_id}, Execution Date: {execution_date}, State: "
           f"""{dag_run.conf.get("state", f"{State.QUEUED} or {State.NONE}")}, """
           f"Filter: {[str(f) for f in filter]}"
       )

       tis = session.query(TaskInstance).filter(*filter).all()
       dr = (
           session.query(DagRun)
           .filter(DagRun.dag_id == dag_id, DagRun.execution_date == execution_date)
           .first()
       )
       dagrun = (
           dict(
               id=dr.id,
               dag_id=dr.dag_id,
               execution_date=dr.execution_date,
               start_date=dr.start_date,
               end_date=dr.end_date,
               _state=dr._state,
               run_id=dr.run_id,
               creating_job_id=dr.creating_job_id,
               external_trigger=dr.external_trigger,
               run_type=dr.run_type,
               conf=dr.conf,
               last_scheduling_decision=dr.last_scheduling_decision,
               dag_hash=dr.dag_hash,
           )
           if dr
           else {}
       )

       print(f"Updating {len(tis)} task instances")
       print("Here are the task instances we're going to update")
       # Print no more than 100 tis so we don't lock up the session too long
       for ti in tis[:100]:
           pprint(
               dict(
                   task_id=ti.task_id,
                   job_id=ti.job_id,
                   key=ti.key,
                   dag_id=ti.dag_id,
                   execution_date=ti.execution_date,
                   state=ti.state,
                   dag_run={**dagrun},
               )
           )
       if len(tis) > 100:
           print("Output truncated after 100 task instances")

       for ti in tis:
           # Incrementing try_number changes the TaskInstanceKey, so the
           # executor's stale queued_tasks/running caches no longer match
           ti.try_number = ti.next_try_number
           ti.state = State.NONE
           session.merge(ti)

       if dag_run.conf.get("activate_dag_runs", True):
           dr.state = State.RUNNING
           dr.start_date = timezone.utcnow()

       print("Done")
   ```
   
   Moments after I shipped this DAG, another DAG got stuck, and I had a chance 
to see if 

[GitHub] [airflow] NadimYounes edited a comment on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5

2022-01-12 Thread GitBox


NadimYounes edited a comment on pull request #20245:
URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011580548


   @potiuk @mik-laj Can we also fix the constraints for airflow `2.2.3`? It 
looks like `snowflake-sqlalchemy` is set to `1.2.5` 
[here](https://github.com/apache/airflow/blob/62d490d4da17e35d4ddcd4ee38902a8a4e9bbfff/constraints-3.7.txt#L485).
 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[GitHub] [airflow] SamWheating commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


SamWheating commented on a change in pull request #20838:
URL: https://github.com/apache/airflow/pull/20838#discussion_r783539524



##
File path: airflow/utils/metastore_cleanup.py
##
@@ -0,0 +1,289 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+orm_model: the table
+recency_column: date column to filter by
+keep_last: whether the last record should be kept even if it's older than 
clean_before_timestamp
+keep_last_filters:
+keep_last_group_by: if keeping the last record, can keep the last record for 
each group
+"""
+
+
+from pendulum import DateTime
+from sqlalchemy.orm import Query
+
+from airflow import settings
+from airflow.configuration import conf
+from airflow.models import (
+DagModel,
+DagRun,
+ImportError,
+Log,
+RenderedTaskInstanceFields,
+SlaMiss,
+TaskFail,
+TaskInstance,
+XCom,
+)
+
+try:
+from airflow.jobs import BaseJob
+except Exception as e:
+from airflow.jobs.base_job import BaseJob
+
+import logging
+
+from sqlalchemy import and_, func
+
+from airflow.models import TaskReschedule
+from airflow.utils import timezone
+
+now = timezone.utcnow
+
+
+objects = {
+'job': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': BaseJob,
+'recency_column': BaseJob.latest_heartbeat,
+},
+'dag': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': DagModel,
+'recency_column': DagModel.last_parsed_time,
+},
+'dag_run': {
+'keep_last': True,
+"keep_last_filters": [DagRun.external_trigger.is_(False)],
+"keep_last_group_by": DagRun.dag_id,
+'orm_model': DagRun,
+'recency_column': DagRun.execution_date,
+},
+'import_error': {

Review comment:
   Is there any reason to expire old Import Errors?
   
   They get removed constantly by the File Processor as files are re-processed:
   
https://github.com/apache/airflow/blob/3ccb79423e8966305bb762200b53134dd2b349ec/airflow/dag_processing/processor.py#L539-L543
   
   And they get removed when the related file no longer exists:
   
https://github.com/apache/airflow/blob/3ccb79423e8966305bb762200b53134dd2b349ec/airflow/dag_processing/processor.py#L539-L543
   
   So I wouldn't expect there to be any dangling records here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] github-actions[bot] closed pull request #18575: Resync DAG during next parse if error in ``sync_to_db``

2022-01-12 Thread GitBox


github-actions[bot] closed pull request #18575:
URL: https://github.com/apache/airflow/pull/18575


   






[GitHub] [airflow] NadimYounes removed a comment on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5

2022-01-12 Thread GitBox


NadimYounes removed a comment on pull request #20245:
URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011561961


   https://github.com/apache/airflow/pull/20245#issuecomment-992387108
   
   Can we please fix the constraints for 2.2.3? 
   






[GitHub] [airflow] NadimYounes commented on pull request #20245: Exclude snowflake-sqlalchemy v1.2.5

2022-01-12 Thread GitBox


NadimYounes commented on pull request #20245:
URL: https://github.com/apache/airflow/pull/20245#issuecomment-1011561961


   https://github.com/apache/airflow/pull/20245#issuecomment-992387108
   
   Can we please fix the constraints for 2.2.3? 
   






[GitHub] [airflow] mik-laj commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


mik-laj commented on a change in pull request #20838:
URL: https://github.com/apache/airflow/pull/20838#discussion_r783528233



##
File path: airflow/utils/metastore_cleanup.py
##
@@ -0,0 +1,289 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+orm_model: the table
+recency_column: date column to filter by
+keep_last: whether the last record should be kept even if it's older than clean_before_timestamp
+keep_last_filters:
+keep_last_group_by: if keeping the last record, can keep the last record for each group
+"""
+
+
+from pendulum import DateTime
+from sqlalchemy.orm import Query
+
+from airflow import settings
+from airflow.configuration import conf
+from airflow.models import (
+DagModel,
+DagRun,
+ImportError,
+Log,
+RenderedTaskInstanceFields,
+SlaMiss,
+TaskFail,
+TaskInstance,
+XCom,
+)
+
+try:
+from airflow.jobs import BaseJob
+except Exception as e:
+from airflow.jobs.base_job import BaseJob
+
+import logging
+
+from sqlalchemy import and_, func
+
+from airflow.models import TaskReschedule
+from airflow.utils import timezone
+
+now = timezone.utcnow
+
+
+objects = {
+'job': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': BaseJob,
+'recency_column': BaseJob.latest_heartbeat,
+},
+'dag': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': DagModel,
+'recency_column': DagModel.last_parsed_time,
+},
+'dag_run': {
+'keep_last': True,
+"keep_last_filters": [DagRun.external_trigger.is_(False)],
+"keep_last_group_by": DagRun.dag_id,
+'orm_model': DagRun,
+'recency_column': DagRun.execution_date,
+},
+'import_error': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': ImportError,
+'recency_column': ImportError.timestamp,
+},
+'log': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': Log,
+'recency_column': Log.dttm,
+},
+'rendered_task_instance_fields': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': RenderedTaskInstanceFields,
+'recency_column': RenderedTaskInstanceFields.execution_date,
+},
+'sla_miss': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': SlaMiss,
+'recency_column': SlaMiss.execution_date,
+},
+'task_fail': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': TaskFail,
+'recency_column': TaskFail.execution_date,
+},
+'task_instance': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': TaskInstance,
+'recency_column': TaskInstance.execution_date,
+},
+'task_reschedule': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': TaskReschedule,
+'recency_column': TaskReschedule.execution_date,
+},
+'xcom': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': XCom,
+'recency_column': XCom.execution_date,
+},
+}
+
+
+airflow_executor = str(conf.get("core", "executor"))
+if airflow_executor == "CeleryExecutor":
+from celery.backends.database.models import Task, TaskSet
+
+print("Including Celery Modules")
+try:
+objects.update(
+**{
+'task': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': Task,
+'recency_column': Task.date_done,
+},
+'task_set': {
+'keep_last': False,
+'keep_last_filters': 

[GitHub] [airflow] mik-laj commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


mik-laj commented on a change in pull request #20838:
URL: https://github.com/apache/airflow/pull/20838#discussion_r783528077



##
File path: airflow/utils/metastore_cleanup.py
##
@@ -0,0 +1,289 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+orm_model: the table
+recency_column: date column to filter by
+keep_last: whether the last record should be kept even if it's older than clean_before_timestamp
+keep_last_filters:
+keep_last_group_by: if keeping the last record, can keep the last record for each group
+"""
+
+
+from pendulum import DateTime
+from sqlalchemy.orm import Query
+
+from airflow import settings
+from airflow.configuration import conf
+from airflow.models import (
+DagModel,
+DagRun,
+ImportError,
+Log,
+RenderedTaskInstanceFields,
+SlaMiss,
+TaskFail,
+TaskInstance,
+XCom,
+)
+
+try:
+from airflow.jobs import BaseJob
+except Exception as e:
+from airflow.jobs.base_job import BaseJob
+
+import logging
+
+from sqlalchemy import and_, func
+
+from airflow.models import TaskReschedule
+from airflow.utils import timezone
+
+now = timezone.utcnow
+
+
+objects = {
+'job': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': BaseJob,
+'recency_column': BaseJob.latest_heartbeat,
+},
+'dag': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': DagModel,
+'recency_column': DagModel.last_parsed_time,
+},
+'dag_run': {
+'keep_last': True,
+"keep_last_filters": [DagRun.external_trigger.is_(False)],
+"keep_last_group_by": DagRun.dag_id,
+'orm_model': DagRun,
+'recency_column': DagRun.execution_date,
+},
+'import_error': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': ImportError,
+'recency_column': ImportError.timestamp,
+},
+'log': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': Log,
+'recency_column': Log.dttm,
+},
+'rendered_task_instance_fields': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': RenderedTaskInstanceFields,
+'recency_column': RenderedTaskInstanceFields.execution_date,
+},
+'sla_miss': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': SlaMiss,
+'recency_column': SlaMiss.execution_date,
+},
+'task_fail': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': TaskFail,
+'recency_column': TaskFail.execution_date,
+},
+'task_instance': {
+'keep_last': False,
+'keep_last_filters': None,
+'keep_last_group_by': None,
+'orm_model': TaskInstance,
+'recency_column': TaskInstance.execution_date,

Review comment:
   I'm not sure about it. It is an association proxy
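   For readers following the quoted configuration above, the `keep_last` / `keep_last_group_by` semantics (delete records older than a cutoff, but spare the newest record per group even when it is older than the cutoff) can be sketched in plain Python. This is an illustrative stand-in with a made-up row shape and helper name, not the PR's SQLAlchemy implementation:

```python
from datetime import datetime

def rows_to_delete(rows, cutoff, keep_last=False, group_key=None):
    """Rows older than `cutoff`; with keep_last, spare the newest row per group."""
    old = [r for r in rows if r["ts"] < cutoff]
    if not keep_last:
        return old
    newest = {}  # group value -> newest row seen for that group
    for r in rows:
        k = r[group_key]
        if k not in newest or r["ts"] > newest[k]["ts"]:
            newest[k] = r
    spared = {id(r) for r in newest.values()}
    return [r for r in old if id(r) not in spared]

rows = [
    {"dag_id": "a", "ts": datetime(2021, 6, 1)},
    {"dag_id": "a", "ts": datetime(2021, 12, 1)},
    {"dag_id": "b", "ts": datetime(2021, 5, 1)},
]
doomed = rows_to_delete(rows, datetime(2022, 1, 1), keep_last=True, group_key="dag_id")
print(len(doomed))  # 1: only the older "a" run; the newest row per dag_id survives
```

   Without `keep_last`, all three rows precede the cutoff and would be deleted; the PR's real query builds the equivalent "newest per group" set with a grouped subquery.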








[GitHub] [airflow] mik-laj commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


mik-laj commented on a change in pull request #20838:
URL: https://github.com/apache/airflow/pull/20838#discussion_r783523781



##
File path: airflow/cli/cli_parser.py
##
@@ -1054,6 +1080,14 @@ class GroupCommand(NamedTuple):
 args=(ARG_CLEAR_ONLY,),
 ),
 )
+MAINTENANCE_COMMANDS = (
+ActionCommand(
+name='cleanup',

Review comment:
   In the future, we can add other commands that will clean other things, e.g. logs.








[GitHub] [airflow] mik-laj commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


mik-laj commented on a change in pull request #20838:
URL: https://github.com/apache/airflow/pull/20838#discussion_r783523446



##
File path: airflow/cli/cli_parser.py
##
@@ -1054,6 +1080,14 @@ class GroupCommand(NamedTuple):
 args=(ARG_CLEAR_ONLY,),
 ),
 )
+MAINTENANCE_COMMANDS = (
+ActionCommand(
+name='cleanup',

Review comment:
   ```suggestion
   name='cleanup-tables',
   ```








[GitHub] [airflow] mik-laj commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


mik-laj commented on a change in pull request #20838:
URL: https://github.com/apache/airflow/pull/20838#discussion_r783520216



##
File path: airflow/cli/cli_parser.py
##
@@ -378,6 +383,27 @@ def _check(value):
 ARG_CONF = Arg(('-c', '--conf'), help="JSON string that gets pickled into the DagRun's conf attribute")
 ARG_EXEC_DATE = Arg(("-e", "--exec-date"), help="The execution date of the DAG", type=parsedate)
 
+# maintenance
+ARG_MAINTENANCE_TABLES = Arg(
+("-t", "--tables"),
+help="Table names to perform maintenance on (use comma-separated list)",

Review comment:
   Can you add a list of accepted values?








[GitHub] [airflow] mik-laj commented on a change in pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


mik-laj commented on a change in pull request #20838:
URL: https://github.com/apache/airflow/pull/20838#discussion_r783519744



##
File path: airflow/utils/metastore_cleanup.py
##
@@ -0,0 +1,289 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+orm_model: the table
+recency_column: date column to filter by
+keep_last: whether the last record should be kept even if it's older than clean_before_timestamp
+keep_last_filters:
+keep_last_group_by: if keeping the last record, can keep the last record for each group
+"""
+
+
+from pendulum import DateTime
+from sqlalchemy.orm import Query
+
+from airflow import settings
+from airflow.configuration import conf
+from airflow.models import (
+DagModel,
+DagRun,
+ImportError,
+Log,
+RenderedTaskInstanceFields,
+SlaMiss,
+TaskFail,
+TaskInstance,
+XCom,
+)
+
+try:
+from airflow.jobs import BaseJob
+except Exception as e:
+from airflow.jobs.base_job import BaseJob

Review comment:
   It is not needed, because we don't maintain compatibility with Airflow 1.10.








[GitHub] [airflow] dstandish opened a new pull request #20838: Add `maintenance cleanup` CLI command for purging old data

2022-01-12 Thread GitBox


dstandish opened a new pull request #20838:
URL: https://github.com/apache/airflow/pull/20838


   Must supply "purge before date".
   Can optionally provide a table list.
   A dry run will only print the number of rows meeting the criteria.
   If not a dry run, it will require the user to confirm before deleting.
   
   *Note:* still a draft; tests still need to be added.
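   The described flow (dry run counts matching rows; otherwise ask for confirmation before deleting) can be sketched against a toy in-memory store. The `STORE` layout, table names, and `cleanup` helper below are hypothetical illustrations, not the PR's actual code:

```python
from datetime import datetime

# toy in-memory "metastore": table name -> list of row timestamps (hypothetical)
STORE = {
    "log": [datetime(2020, 3, 1), datetime(2021, 11, 5)],
    "xcom": [datetime(2020, 12, 25)],
}

def cleanup(tables, clean_before, dry_run=True, confirm=lambda msg: "y"):
    """Count rows older than `clean_before`; delete them only after confirmation."""
    counts = {t: sum(ts < clean_before for ts in STORE[t]) for t in tables}
    if dry_run:
        for t, n in counts.items():
            print(f"{t}: {n} rows would be deleted")
        return 0
    if confirm("Delete these rows? [y/N] ").lower() != "y":
        return 0
    deleted = 0
    for t in tables:
        kept = [ts for ts in STORE[t] if ts >= clean_before]
        deleted += len(STORE[t]) - len(kept)
        STORE[t] = kept
    return deleted

print(cleanup(["log", "xcom"], datetime(2021, 1, 1)))                 # dry run -> 0
print(cleanup(["log", "xcom"], datetime(2021, 1, 1), dry_run=False))  # -> 2
```

   Injecting `confirm` as a callable keeps the delete path testable without an interactive prompt, which is roughly the shape such a CLI command needs anyway.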






[GitHub] [airflow] SamWheating edited a comment on issue #20832: Unable to specify Python version for AwsGlueJobOperator

2022-01-12 Thread GitBox


SamWheating edited a comment on issue #20832:
URL: https://github.com/apache/airflow/issues/20832#issuecomment-1011511506


   Also, for what it's worth, I think that your `Command` block is invalid, as the `Command.Name` you're using (`abalone-preprocess`) must be one of `glueetl`, `pythonshell`, or `gluestreaming`.
   
   https://docs.aws.amazon.com/glue/latest/webapi/API_JobCommand.html
   






[GitHub] [airflow] SamWheating edited a comment on issue #20832: Unable to specify Python version for AwsGlueJobOperator

2022-01-12 Thread GitBox


SamWheating edited a comment on issue #20832:
URL: https://github.com/apache/airflow/issues/20832#issuecomment-1011502821


   I think that this is because the GlueHook is pretty opinionated and hardcodes the value of `Command` when running the `glue_client.create_job` command:
   
   https://github.com/apache/airflow/blob/2ab2ae8849bf6d80a700b1b74cef37eb187161ad/airflow/providers/amazon/aws/hooks/glue.py#L181-L225
   
   So when you provide `Command` in `create_job_kwargs`, it ends up being supplied twice to that function (although I suspect that this would be a `TypeError`, not a `KeyError`).
   
   Anyways, thoughts on just making the `Command.Name` and `Command.PythonVersion` arguments configurable in the `GlueJobOperator`?
   
   If y'all think that this is a satisfactory fix, feel free to assign this issue to me and I can put up a quick PR. 




