[GitHub] [airflow] github-actions[bot] commented on pull request #14521: Add Asana Provider

2021-04-16 Thread GitBox


github-actions[bot] commented on pull request #14521:
URL: https://github.com/apache/airflow/pull/14521#issuecomment-821757156


   [The Workflow run](https://github.com/apache/airflow/actions/runs/757540553) 
is cancelling this PR. Building images for the PR has failed. Follow the 
workflow link to check the reason.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] boring-cyborg[bot] commented on issue #15415: S3 remote logging not working for airflow server components

2021-04-16 Thread GitBox


boring-cyborg[bot] commented on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-821752982


   Thanks for opening your first issue here! Be sure to follow the issue 
template!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] kakarukeys opened a new issue #15415: S3 remote logging not working for airflow server components

2021-04-16 Thread GitBox


kakarukeys opened a new issue #15415:
URL: https://github.com/apache/airflow/issues/15415


   **Apache Airflow version**: 2.0.1
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: on my laptop
   - **OS** (e.g. from /etc/os-release): macOS Mojave 10.14.6
   - **Kernel** (e.g. `uname -a`): Darwin Wongs-MBP 18.7.0 Darwin Kernel 
Version 18.7.0: Tue Jan 12 22:04:47 PST 2021; 
root:xnu-4903.278.56~1/RELEASE_X86_64 x86_64
   
   **What happened**:
   
   Configured remote logging to an S3 bucket; only the logs of DAG runs appeared 
in the bucket.
   Logs of the Airflow server components (scheduler, web server, etc.) did not 
appear.
   
   **What you expected to happen**:
   
   all logs go to S3 bucket
   
   **How to reproduce it**:
   
   1. follow the quick start guide in 
https://airflow.apache.org/docs/apache-airflow/stable/start/local.html
   
   2. before starting the web server, set the following variables:
   
   ```sh
   export AIRFLOW__LOGGING__REMOTE_LOGGING=True
   export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://my-bucket/
   export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=my_remote_logging_conn_id
   ```
   
   3. start the web server and set your S3 connection settings in the web 
server "connections" section.
   
   ```
   Conn Id * my_remote_logging_conn_id
   Conn Type  S3
   Extra {"region_name": "nyc3",
"host": "https://nyc3.digitaloceanspaces.com";,
"aws_access_key_id": "xxx",
"aws_secret_access_key": "xxx"}
   ```
   
   4. Restart the web server
   5. Start the scheduler in another console window (setting the same env 
variables)
   6. Execute a DAG
   7. Head to your S3 bucket UI; you will see that only the logs of DAG runs appear (a quick config check follows below).
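
   A quick way to confirm the server processes actually picked up the settings from step 2 (a hedged check, assuming Airflow 2.x):

   ```python
   # Run in the same shell/environment as the scheduler or web server.
   from airflow.configuration import conf

   print(conf.getboolean("logging", "remote_logging"))   # expect: True
   print(conf.get("logging", "remote_base_log_folder"))  # expect: s3://my-bucket/
   print(conf.get("logging", "remote_log_conn_id"))      # expect: my_remote_logging_conn_id
   ```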
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[airflow] tag nightly-master updated (d7bc217 -> 49cae1f)

2021-04-16 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to tag nightly-master
in repository https://gitbox.apache.org/repos/asf/airflow.git.


*** WARNING: tag nightly-master was modified! ***

from d7bc217  (commit)
  to 49cae1f  (commit)
from d7bc217  Add documentation for the HTTP connection (#15379)
 add d115040  Bugfix: ``TypeError`` when Serializing & sorting iterables 
(#15395)
 add f94effe  Don't try to push the python build image when building on 
release branches (#15394)
 add e7c642b  Adds a test for the description field in variable  (#15400)
 add 54edbaa  Share app instance between Kerberos tests (#15141)
 add adbab36  Add changelog for what will become 2.0.2 (#15380)
 add cb1344b  Update azure connection documentation (#15352)
 add f878ec6  Persist tags params in pagination (#15411)
 add 49cae1f  Add documentation for Databricks connection (#15410)

No new revisions were added by this update.

Summary of changes:
 .github/workflows/ci.yml   | 13 +++
 CHANGELOG.txt  | 97 +-
 UPDATING.md|  3 +
 airflow/providers/databricks/hooks/databricks.py   |  2 +-
 .../providers/databricks/operators/databricks.py   |  4 +-
 airflow/providers/microsoft/azure/hooks/adx.py |  3 +-
 .../providers/microsoft/azure/hooks/azure_batch.py |  4 +
 .../azure/hooks/azure_container_instance.py|  6 +-
 .../azure/hooks/azure_container_registry.py|  5 +-
 .../azure/hooks/azure_container_volume.py  |  4 +-
 .../microsoft/azure/hooks/azure_cosmos.py  |  3 +-
 .../microsoft/azure/hooks/azure_data_factory.py|  4 +-
 .../microsoft/azure/hooks/azure_data_lake.py   |  2 +-
 .../microsoft/azure/hooks/azure_fileshare.py   |  2 +-
 .../providers/microsoft/azure/hooks/base_azure.py  |  5 +-
 airflow/providers/microsoft/azure/hooks/wasb.py|  2 +-
 .../microsoft/azure/operators/adls_delete.py   |  2 +
 .../microsoft/azure/operators/adls_list.py |  3 +-
 airflow/providers/microsoft/azure/operators/adx.py |  3 +-
 .../microsoft/azure/operators/azure_batch.py   |  2 +-
 .../azure/operators/azure_container_instances.py   |  3 +-
 .../microsoft/azure/operators/azure_cosmos.py  |  3 +-
 .../microsoft/azure/operators/wasb_delete_blob.py  |  2 +-
 .../microsoft/azure/sensors/azure_cosmos.py|  3 +-
 airflow/providers/microsoft/azure/sensors/wasb.py  |  2 +-
 airflow/serialization/serialized_objects.py|  9 +-
 airflow/www/utils.py   | 21 +++--
 airflow/www/views.py   |  1 +
 .../connections/databricks.rst | 73 
 docs/apache-airflow-providers-databricks/index.rst |  1 +
 .../connections/acr.rst| 62 ++
 .../connections/adf.rst| 72 
 .../connections/adl.rst| 70 
 .../connections/adx.rst| 96 +
 .../connections/azure.rst  |  4 +-
 .../connections/azure_batch.rst| 65 +++
 .../connections/azure_cosmos.rst   | 66 +++
 .../connections/index.rst  |  0
 .../connections/wasb.rst   | 84 +++
 .../index.rst  |  2 +-
 docs/spelling_wordlist.txt |  2 +
 scripts/ci/images/ci_prepare_prod_image_on_ci.sh   | 10 ++-
 scripts/ci/libraries/_build_images.sh  |  1 -
 scripts/ci/libraries/_push_pull_remove_images.sh   |  5 +-
 tests/api/auth/backend/test_kerberos_auth.py   | 38 +
 tests/serialization/test_dag_serialization.py  | 36 ++--
 tests/www/test_utils.py|  9 +-
 tests/www/test_views.py| 14 +++-
 48 files changed, 846 insertions(+), 77 deletions(-)
 create mode 100644 
docs/apache-airflow-providers-databricks/connections/databricks.rst
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/acr.rst
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/adf.rst
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/adl.rst
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/adx.rst
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/azure_batch.rst
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/azure_cosmos.rst
 copy docs/{apache-airflow-providers-google => 
apache-airflow-providers-microsoft-azure}/connections/index.rst (100%)
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/wasb.rst


[GitHub] [airflow-ci-infra] hmike96 opened a new pull request #15: Feature/packer runner ami

2021-04-16 Thread GitBox


hmike96 opened a new pull request #15:
URL: https://github.com/apache/airflow-ci-infra/pull/15


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] kaxil commented on a change in pull request #15414: Add Traceback in LogRecord in ``JSONFormatter``

2021-04-16 Thread GitBox


kaxil commented on a change in pull request #15414:
URL: https://github.com/apache/airflow/pull/15414#discussion_r615183685



##
File path: airflow/utils/log/json_formatter.py
##
@@ -43,5 +43,16 @@ def usesTime(self):
     def format(self, record):
         super().format(record)
         record_dict = {label: getattr(record, label, None) for label in self.json_fields}
+        if "message" in self.json_fields:
+            msg = record_dict["message"]
+            if record.exc_text:
+                if msg[-1:] != "\n":
+                    msg = msg + "\n"
+                msg = msg + record.exc_text
+            if record.stack_info:
+                if msg[-1:] != "\n":
+                    msg = msg + "\n"
+                msg = msg + self.formatStack(record.stack_info)

Review comment:
   This is from 
https://github.com/python/cpython/blob/adf24bd835ed8f76dcc51aa98c8c54275e86965b/Lib/logging/__init__.py#L687-L694
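
   For context, a self-contained sketch of the same pattern (a hedged simplification, not Airflow's actual ``JSONFormatter``), showing how appending ``record.exc_text`` carries the traceback into the JSON ``message`` field:

   ```python
   import json
   import logging

   class JsonTracebackFormatter(logging.Formatter):
       def __init__(self, json_fields):
           super().__init__()
           self.json_fields = json_fields

       def format(self, record):
           super().format(record)  # fills record.message and record.exc_text
           record_dict = {label: getattr(record, label, None) for label in self.json_fields}
           if "message" in self.json_fields and record.exc_text:
               msg = record_dict["message"]
               if msg[-1:] != "\n":
                   msg += "\n"
               record_dict["message"] = msg + record.exc_text
           return json.dumps(record_dict)

   logger = logging.getLogger("demo")
   handler = logging.StreamHandler()
   handler.setFormatter(JsonTracebackFormatter(["levelname", "message"]))
   logger.addHandler(handler)

   try:
       raise Exception("I am an exception from task logs")
   except Exception:
       logger.exception("Task failed with exception")  # JSON message now carries the traceback
   ```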




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] kaxil opened a new pull request #15414: Add Traceback in LogRecord in ``JSONFormatter``

2021-04-16 Thread GitBox


kaxil opened a new pull request #15414:
URL: https://github.com/apache/airflow/pull/15414


   Currently, the traceback is not included when ``JSONFormatter`` is used
   (`[logging] json_format = True`). However, the default handler
   includes the stack trace. To include the trace today we need to
   add `json_fields = asctime, filename, lineno, levelname, message, exc_text`.
   
   This is a bigger problem when using Elasticsearch Logging with:
   
   ```ini
   [elasticsearch]
   write_stdout = True
   json_format = True
   json_fields = asctime, filename, lineno, levelname, message, exc_text
   
   [logging]
   log_format = [%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - 
%(message)s - %(exc_text)s
   ```
   
   Running the following DAG with the above config won't show trace:
   
   ```python
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.utils.dates import days_ago
   
   with DAG(
   dag_id='example_error',
   schedule_interval=None,
   start_date=days_ago(2),
   ) as dag:
   
   def raise_error(**kwargs):
   raise Exception("I am an exception from task logs")
   
   task_1 = PythonOperator(
   task_id='task_1',
   python_callable=raise_error,
   )
   ```
   
   Before:
   
   ```
   [2021-04-17 00:11:00,152] {taskinstance.py:877} INFO - Dependencies all met 
for 
   ...
   ...
   [2021-04-17 00:11:00,298] {taskinstance.py:1482} ERROR - Task failed with 
exception
   [2021-04-17 00:11:00,300] {taskinstance.py:1532} INFO - Marking task as 
FAILED. dag_id=example_python_operator, task_id=print_the_context, 
execution_date=20210417T001057, start_date=20210417T001100, 
end_date=20210417T001100
   [2021-04-17 00:11:00,325] {local_task_job.py:146} INFO - Task exited with 
return code 1
   ```
   
   After:
   
   ```
   [2021-04-17 00:11:00,152] {taskinstance.py:877} INFO - Dependencies all met 
for 
   ...
   ...
   [2021-04-17 00:11:00,298] {taskinstance.py:1482} ERROR - Task failed with 
exception
   Traceback (most recent call last):
 File 
"/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 
1138, in _run_raw_task
   self._prepare_and_execute_task_with_callbacks(context, task)
 File 
"/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 
1311, in _prepare_and_execute_task_with_callbacks
   result = self._execute_task(context, task_copy)
 File 
"/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 
1341, in _execute_task
   result = task_copy.execute(context=context)
 File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", 
line 117, in execute
   return_value = self.execute_callable()
 File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", 
line 128, in execute_callable
   return self.python_callable(*self.op_args, **self.op_kwargs)
 File "/usr/local/airflow/dags/eg-2.py", line 25, in print_context
   raise Exception("I am an exception from task logs")
   Exception: I am an exception from task logs
   [2021-04-17 00:11:00,300] {taskinstance.py:1532} INFO - Marking task as 
FAILED. dag_id=example_python_operator, task_id=print_the_context, 
execution_date=20210417T001057, start_date=20210417T001100, 
end_date=20210417T001100
   [2021-04-17 00:11:00,325] {local_task_job.py:146} INFO - Task exited with 
return code 1
   ```
   
   
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] github-actions[bot] closed pull request #14391: flower 0.9.5 does not require rewrite on nginx

2021-04-16 Thread GitBox


github-actions[bot] closed pull request #14391:
URL: https://github.com/apache/airflow/pull/14391


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] github-actions[bot] closed pull request #13017: optional schema configuration of S3ToSnowflakeTransferOperator

2021-04-16 Thread GitBox


github-actions[bot] closed pull request #13017:
URL: https://github.com/apache/airflow/pull/13017


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] github-actions[bot] commented on pull request #12637: Remove unsupported arguments

2021-04-16 Thread GitBox


github-actions[bot] commented on pull request #12637:
URL: https://github.com/apache/airflow/pull/12637#issuecomment-821731490


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed in 5 days if no further activity occurs. 
Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[airflow] branch master updated (f878ec6 -> 49cae1f)

2021-04-16 Thread kaxilnaik
This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git.


from f878ec6  Persist tags params in pagination (#15411)
 add 49cae1f  Add documentation for Databricks connection (#15410)

No new revisions were added by this update.

Summary of changes:
 airflow/providers/databricks/hooks/databricks.py   |  2 +-
 .../providers/databricks/operators/databricks.py   |  4 +-
 .../connections/databricks.rst | 73 ++
 docs/apache-airflow-providers-databricks/index.rst |  1 +
 4 files changed, 77 insertions(+), 3 deletions(-)
 create mode 100644 
docs/apache-airflow-providers-databricks/connections/databricks.rst


[GitHub] [airflow] kaxil merged pull request #15410: Add documentation for Databricks connection

2021-04-16 Thread GitBox


kaxil merged pull request #15410:
URL: https://github.com/apache/airflow/pull/15410


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] sosso commented on issue #14205: Scheduler "deadlocks" itself when max_active_runs_per_dag is reached by up_for_retry tasks

2021-04-16 Thread GitBox


sosso commented on issue #14205:
URL: https://github.com/apache/airflow/issues/14205#issuecomment-821723114


   After a few hours spelunking through the codebase, we came across 
`max_dagruns_per_loop_to_schedule` in `scheduler_job.py`. We have several 
thousand DAGs, each with 1-3 tasks (80% are probably 1 task), and setting it to 
200 (10x the default value of 20) was a massive improvement for us.
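
   For anyone tuning the same knob, it can also be raised through Airflow's environment-variable convention; the ``[scheduler]`` section name here is an assumption based on where ``scheduler_job.py`` reads the option:

   ```python
   import os

   # Assumed mapping: [scheduler] max_dagruns_per_loop_to_schedule -> this env var.
   # Set it before the scheduler process starts so the config is picked up.
   os.environ["AIRFLOW__SCHEDULER__MAX_DAGRUNS_PER_LOOP_TO_SCHEDULE"] = "200"
   ```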


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] rcramblit edited a comment on issue #8421: Hide sensitive data in UI

2021-04-16 Thread GitBox


rcramblit edited a comment on issue #8421:
URL: https://github.com/apache/airflow/issues/8421#issuecomment-821550334


   I have a similar need that might not be covered by the solutions posed in 
this thread:
   I am using the DockerOperator in Airflow 1.10.15 and want to pass some 
secrets to the docker container via DockerOperator's `environment` argument, 
but this argument is rendered in the Airflow UI in the Task Instance Details 
view. Is there any way to control what arguments are rendered there?
   
   edit: I see there is a `private_environment` in the [DockerOperator plugin for 
2.x](https://airflow.apache.org/docs/apache-airflow-providers-docker/stable/_api/airflow/providers/docker/operators/docker/index.html),
 so I am copying that functionality by overriding the `_run_image` method in a 
subclass, in case anyone else happens along this issue.
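
   A hedged sketch of that 2.x workaround (image, command, and values are placeholders):

   ```python
   from airflow.providers.docker.operators.docker import DockerOperator

   run_job = DockerOperator(
       task_id="run_job",
       image="busybox:latest",                       # placeholder image
       command="env",
       environment={"LOG_LEVEL": "info"},            # rendered in Task Instance Details
       private_environment={"API_TOKEN": "s3cr3t"},  # passed to the container, kept out of the UI
   )
   ```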


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] rcramblit edited a comment on issue #8421: Hide sensitive data in UI

2021-04-16 Thread GitBox


rcramblit edited a comment on issue #8421:
URL: https://github.com/apache/airflow/issues/8421#issuecomment-821550334


   I have a similar need that might not be covered by the solutions posed in 
this thread:
   I am using the DockerOperator in Airflow 1.10.15 and want to pass some 
secrets to the docker container via DockerOperator's `environment` argument, 
but this argument is rendered in the Airflow UI in the Task Instance Details 
view. Is there any way to control what arguments are rendered there?
   
   edit: I see there is a `private_environment` in the DockerOperator plugin for 
2.x, so I am copying that functionality by overriding the `_run_image` method 
in a subclass, in case anyone else happens along this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] charles2588 opened a new issue #15413: Unable to Open Instance Details Page

2021-04-16 Thread GitBox


charles2588 opened a new issue #15413:
URL: https://github.com/apache/airflow/issues/15413


   **Apache Airflow version**: 2.0
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl 
version`): N/A
   
   **Environment**: 
   
   - **Cloud provider or hardware configuration**:
   - **OS** (e.g. from /etc/os-release): 
   ```
   NAME="Ubuntu"
   VERSION="20.04.1 LTS (Focal Fossa)"
   ID=ubuntu
   ID_LIKE=debian
   PRETTY_NAME="Ubuntu 20.04.1 LTS"
   VERSION_ID="20.04
   ```
   - **Kernel** (e.g. `uname -a`):  `Linux airflowvm 5.4.0-70-generic 
#78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux`
   - **Install tools**: 
   - **Others**:
   
   **What happened**:
   
  The instance details page throws an error:
   
http://localhost:8080/task?dag_id=xcom_dag&task_id=downloading_data&execution_date=2021-04-16T23%3A01%3A08.044070%2B00%3A00
   
   
![image](https://user-images.githubusercontent.com/5734421/115091805-71d1ef00-9ecd-11eb-9ab1-839d5e8a7e51.png)
   
   
   ```
   Something bad has happened.
   Please consider letting us know by creating a bug report using GitHub.

   Python version: 3.8.5
   Airflow version: 2.0.0
   Node: airflowvm
   ---
   Traceback (most recent call last):
     File "/home/airflow/sandbox/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
       response = self.full_dispatch_request()
     File "/home/airflow/sandbox/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
       rv = self.handle_user_exception(e)
     File "/home/airflow/sandbox/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
       reraise(exc_type, exc_value, tb)
     File "/home/airflow/sandbox/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
       raise value
     File "/home/airflow/sandbox/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
       rv = self.dispatch_request()
     File "/home/airflow/sandbox/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
       return self.view_functions[rule.endpoint](**req.view_args)
     File "/home/airflow/sandbox/lib/python3.8/site-packages/airflow/www/auth.py", line 34, in decorated
       return func(*args, **kwargs)
     File "/home/airflow/sandbox/lib/python3.8/site-packages/airflow/www/decorators.py", line 60, in wrapper
       return f(*args, **kwargs)
     File "/home/airflow/sandbox/lib/python3.8/site-packages/airflow/www/views.py", line 1193, in task
       attr = getattr(ti, attr_name)
     File "/home/airflow/sandbox/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 792, in previous_start_date_success
       return self.get_previous_start_date(state=State.SUCCESS)
     File "/home/airflow/sandbox/lib/python3.8/site-packages/airflow/utils/session.py", line 65, in wrapper
       return func(*args, session=session, **kwargs)
     File "/home/airflow/sandbox/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 776, in get_previous_start_date
       return prev_ti and pendulum.instance(prev_ti.start_date)
     File "/home/airflow/sandbox/lib/python3.8/site-packages/pendulum/__init__.py", line 174, in instance
       raise ValueError("instance() only accepts datetime objects.")
   ValueError: instance() only accepts datetime objects.
   ```
   
   **What you expected to happen**:
   
   
   
   **How to reproduce it**:
   
   1. Go to Airflow UI and open any  dag you may have
   2. Click on any task node in Graph view
   3. Click on Instance Details button
   
   
![image](https://user-images.githubusercontent.com/5734421/115092500-15bb9a80-9ece-11eb-8e31-3182c8300fe4.png)
   
   
![image](https://user-images.githubusercontent.com/5734421/115092769-279d3d80-9ece-11eb-90e8-bed3794826d5.png)
   
   
   
![image](https://user-images.githubusercontent.com/5734421/115093135-43084880-9ece-11eb-8461-0a8ede567eac.png)
   
   Note: the host operating system is Windows 10 with Chrome 85.
   
   
   **Anything else we need to know**:
   
   The issue is consistently reproducible.
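
   The failing call at the bottom of the traceback is easy to reproduce in isolation; a hedged sketch, assuming the previous task instance exists but its ``start_date`` is ``None``:

   ```python
   import pendulum

   start_date = None  # what prev_ti.start_date appears to be here
   try:
       pendulum.instance(start_date)
   except ValueError as err:
       print(err)  # instance() only accepts datetime objects.
   ```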


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] boring-cyborg[bot] commented on issue #15413: Unable to Open Instance Details Page

2021-04-16 Thread GitBox


boring-cyborg[bot] commented on issue #15413:
URL: https://github.com/apache/airflow/issues/15413#issuecomment-821717091


   Thanks for opening your first issue here! Be sure to follow the issue 
template!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[airflow] branch constraints-master updated: Updating constraints. Build id:756975687

2021-04-16 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch constraints-master
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/constraints-master by this 
push:
 new a87acdc  Updating constraints. Build id:756975687
a87acdc is described below

commit a87acdcb3f513c6015ecbb8e3eeb2bb11b6b1ee6
Author: Automated GitHub Actions commit 
AuthorDate: Fri Apr 16 22:43:10 2021 +

Updating constraints. Build id:756975687

This update in constraints is automatically committed by the CI 
'constraints-push' step based on
HEAD of 'refs/heads/master' in 'apache/airflow'
with commit sha f878ec6c599a089a6d7516b7a66eed693f0c9037.

All tests passed in this build so we determined we can push the updated 
constraints.

See 
https://github.com/apache/airflow/blob/master/README.md#installing-from-pypi 
for details.
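
For reference, a hedged example of consuming the published constraint files (the
URL pattern follows the README linked above; the Airflow and Python versions
here are assumptions):

```python
import subprocess
import sys

# Install Airflow pinned to a published constraints file.
subprocess.check_call([
    sys.executable, "-m", "pip", "install", "apache-airflow==2.0.1",
    "--constraint",
    "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.1/constraints-3.8.txt",
])
```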
---
 constraints-3.6.txt  | 4 ++--
 constraints-3.7.txt  | 4 ++--
 constraints-3.8.txt  | 4 ++--
 constraints-no-providers-3.7.txt | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/constraints-3.6.txt b/constraints-3.6.txt
index 5eba328..2550422 100644
--- a/constraints-3.6.txt
+++ b/constraints-3.6.txt
@@ -200,7 +200,7 @@ entrypoints==0.3
 eventlet==0.30.2
 execnet==1.8.0
 facebook-business==10.0.0
-fastavro==1.3.5
+fastavro==1.4.0
 fasteners==0.16
 filelock==3.0.12
 fissix==20.8.0
@@ -209,7 +209,7 @@ flake8==3.9.0
 flaky==3.7.0
 flower==0.9.7
 freezegun==1.1.0
-fsspec==0.9.0
+fsspec==2021.4.0
 future==0.18.2
 gcsfs==0.8.0
 gevent==21.1.2
diff --git a/constraints-3.7.txt b/constraints-3.7.txt
index e945b40..3a6bd65 100644
--- a/constraints-3.7.txt
+++ b/constraints-3.7.txt
@@ -198,7 +198,7 @@ entrypoints==0.3
 eventlet==0.30.2
 execnet==1.8.0
 facebook-business==10.0.0
-fastavro==1.3.5
+fastavro==1.4.0
 fasteners==0.16
 filelock==3.0.12
 fissix==20.8.0
@@ -207,7 +207,7 @@ flake8==3.9.0
 flaky==3.7.0
 flower==0.9.7
 freezegun==1.1.0
-fsspec==0.9.0
+fsspec==2021.4.0
 future==0.18.2
 gcsfs==0.8.0
 gevent==21.1.2
diff --git a/constraints-3.8.txt b/constraints-3.8.txt
index fb4d592..5d6575d 100644
--- a/constraints-3.8.txt
+++ b/constraints-3.8.txt
@@ -197,7 +197,7 @@ entrypoints==0.3
 eventlet==0.30.2
 execnet==1.8.0
 facebook-business==10.0.0
-fastavro==1.3.5
+fastavro==1.4.0
 fasteners==0.16
 filelock==3.0.12
 fissix==20.8.0
@@ -206,7 +206,7 @@ flake8==3.9.0
 flaky==3.7.0
 flower==0.9.7
 freezegun==1.1.0
-fsspec==0.9.0
+fsspec==2021.4.0
 future==0.18.2
 gcsfs==0.8.0
 gevent==21.1.2
diff --git a/constraints-no-providers-3.7.txt b/constraints-no-providers-3.7.txt
index b6f8f6f..202aac8 100644
--- a/constraints-no-providers-3.7.txt
+++ b/constraints-no-providers-3.7.txt
@@ -61,7 +61,7 @@ email-validator==1.1.2
 eventlet==0.30.2
 filelock==3.0.12
 flower==0.9.7
-fsspec==0.9.0
+fsspec==2021.4.0
 gevent==21.1.2
 google-auth==1.29.0
 graphviz==0.16


[airflow] branch master updated: Persist tags params in pagination (#15411)

2021-04-16 Thread ryanahamilton
This is an automated email from the ASF dual-hosted git repository.

ryanahamilton pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/master by this push:
 new f878ec6  Persist tags params in pagination (#15411)
f878ec6 is described below

commit f878ec6c599a089a6d7516b7a66eed693f0c9037
Author: Ryan Hamilton 
AuthorDate: Fri Apr 16 17:34:10 2021 -0400

Persist tags params in pagination (#15411)
---
 airflow/www/utils.py| 21 +
 airflow/www/views.py|  1 +
 tests/www/test_utils.py |  9 +++--
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/airflow/www/utils.py b/airflow/www/utils.py
index af964dc..af34536 100644
--- a/airflow/www/utils.py
+++ b/airflow/www/utils.py
@@ -69,10 +69,10 @@ def should_hide_value_for_key(key_name):
 
 def get_params(**kwargs):
     """Return URL-encoded params"""
-    return urlencode({d: v for d, v in kwargs.items() if v is not None})
+    return urlencode({d: v for d, v in kwargs.items() if v is not None}, True)
 
 
-def generate_pages(current_page, num_of_pages, search=None, status=None, window=7):
+def generate_pages(current_page, num_of_pages, search=None, status=None, tags=None, window=7):
     """
     Generates the HTML for a paging component using a similar logic to the paging
     auto-generated by Flask managed views. The paging component defines a number of
@@ -81,7 +81,7 @@ def generate_pages(current_page, num_of_pages, search=None, status=None, window=
     current one in the middle of the pager component. When in the last pages,
     the pages won't scroll and just keep moving until the last page. Pager also contains
      pages.
-    This component takes into account custom parameters such as search and status,
+    This component takes into account custom parameters such as search, status, and tags
     which could be added to the pages link in order to maintain the state between
     client and server. It also allows to make a bookmark on a specific paging state.
 
@@ -89,6 +89,7 @@ def generate_pages(current_page, num_of_pages, search=None, status=None, window=
     :param num_of_pages: the total number of pages
     :param search: the search query string, if any
     :param status: 'all', 'active', or 'paused'
+    :param tags: array of strings of the current filtered tags
     :param window: the number of pages to be shown in the paging component (7 default)
     :return: the HTML string of the paging component
     """
@@ -127,7 +128,9 @@ def generate_pages(current_page, num_of_pages, search=None, status=None, window=
 
     is_disabled = 'disabled' if current_page <= 0 else ''
 
-    first_node_link = void_link if is_disabled else f'?{get_params(page=0, search=search, status=status)}'
+    first_node_link = (
+        void_link if is_disabled else f'?{get_params(page=0, search=search, status=status, tags=tags)}'
+    )
     output.append(
         first_node.format(
             href_link=first_node_link,
@@ -137,7 +140,7 @@ def generate_pages(current_page, num_of_pages, search=None, status=None, window=
 
     page_link = void_link
     if current_page > 0:
-        page_link = f'?{get_params(page=current_page - 1, search=search, status=status)}'
+        page_link = f'?{get_params(page=current_page - 1, search=search, status=status, tags=tags)}'
 
     output.append(previous_node.format(href_link=page_link, disabled=is_disabled))  # noqa
 
@@ -159,7 +162,7 @@ def generate_pages(current_page, num_of_pages, search=None, status=None, window=
             'is_active': 'active' if is_current(current_page, page) else '',
             'href_link': void_link
             if is_current(current_page, page)
-            else f'?{get_params(page=page, search=search, status=status)}',
+            else f'?{get_params(page=page, search=search, status=status, tags=tags)}',
             'page_num': page + 1,
         }
         output.append(page_node.format(**vals))  # noqa
@@ -169,13 +172,15 @@ def generate_pages(current_page, num_of_pages, search=None, status=None, window=
     page_link = (
         void_link
         if current_page >= num_of_pages - 1
-        else f'?{get_params(page=current_page + 1, search=search, status=status)}'
+        else f'?{get_params(page=current_page + 1, search=search, status=status, tags=tags)}'
     )
 
     output.append(next_node.format(href_link=page_link, disabled=is_disabled))  # noqa
 
     last_node_link = (
-        void_link if is_disabled else f'?{get_params(page=last_page, search=search, status=status)}'
+        void_link
+        if is_disabled
+        else f'?{get_params(page=last_page, search=search, status=status, tags=tags)}'
     )
     output.append(
         last_node.format(
diff --git a/airflow/www/views.py b/airflow/www/views.py
index f5f3b78..5441434 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -653,6 +653,7 @@ class

[GitHub] [airflow] ryanahamilton closed issue #15384: Pagination doesn't work with tags filter

2021-04-16 Thread GitBox


ryanahamilton closed issue #15384:
URL: https://github.com/apache/airflow/issues/15384


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] ryanahamilton merged pull request #15411: Fix: Persist tags URL params in pagination

2021-04-16 Thread GitBox


ryanahamilton merged pull request #15411:
URL: https://github.com/apache/airflow/pull/15411


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] pelaprat commented on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

2021-04-16 Thread GitBox


pelaprat commented on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-821575562


   We are experiencing this bug in our Airflow implementation as well. This is 
my first time participating in reporting a bug on the Apache Airflow project. 
Out of curiosity, who is responsible on the Apache Airflow project for 
prioritizing these issues?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-4922) If a task crashes, host name is not committed to the database so logs aren't able to be seen in the UI

2021-04-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324086#comment-17324086
 ] 

ASF GitHub Bot commented on AIRFLOW-4922:
-

pelaprat commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-821573997


   Why is this PR closed? This bug still exists in Airflow 2.x.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> If a task crashes, host name is not committed to the database so logs aren't 
> able to be seen in the UI
> --
>
> Key: AIRFLOW-4922
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4922
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.3
>Reporter: Andrew Harmon
>Assignee: wanghong-T
>Priority: Major
>
> Sometimes when a task fails, the log show the following
> {code}
> *** Log file does not exist: 
> /usr/local/airflow/logs/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Fetching from: 
> http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log*** 
> Failed to fetch log file from worker. Invalid URL 
> 'http://:8793/log/my_dag/my_task/2019-07-07T09:00:00+00:00/1.log': No host 
> supplied
> {code}
> I believe this is due to the fact that the row is not committed to the 
> database until after the task finishes. 
> https://github.com/apache/airflow/blob/a1f9d9a03faecbb4ab52def2735e374b2e88b2b9/airflow/models/taskinstance.py#L857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] pelaprat commented on pull request #6722: [AIRFLOW-4922]Fix task get log by Web UI

2021-04-16 Thread GitBox


pelaprat commented on pull request #6722:
URL: https://github.com/apache/airflow/pull/6722#issuecomment-821573997


   Why is this PR closed? This bug still exists in Airflow 2.x.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] quoc-t-le removed a comment on issue #15407: ShortCircuitOperator to short circuit downstream of TaskGroup only

2021-04-16 Thread GitBox


quoc-t-le removed a comment on issue #15407:
URL: https://github.com/apache/airflow/issues/15407#issuecomment-821535724


   Yeah, at the moment I can't continue to use the subdag because 
prev_execution_date is not correctly passed down to the subdag, or something 
with Airflow 2: https://github.com/apache/airflow/issues/15396
   
   I tried BranchPythonOperator but it doesn't work as expected. In theory, this 
should branch to the skip and execute t2...
   
   ```python
   from airflow import DAG
   from datetime import datetime
   from airflow.operators.dummy_operator import DummyOperator
   from airflow.operators.python_operator import ShortCircuitOperator, PythonOperator, BranchPythonOperator
   from airflow.utils.task_group import TaskGroup

   default_args = {
       'owner': 'airflow',
       'retries': 3,
       'depends_on_past': False,
   }

   def branch_op(*arg, **kwargs):
       category = kwargs.get('category')
       if (category=='t1'):
           return 't1.skip'

       if (category=='t2'):
           return 't2.skip'

   with DAG("short-circuit",
            catchup=True,
            default_args=default_args,
            schedule_interval='@daily',
            description='Aggregates and pulls down data for API endpoints that use analytics',
            start_date=datetime.strptime('04/14/2021', '%m/%d/%Y'),
            max_active_runs=1
            ) as dag:
       t0 = DummyOperator(task_id='start')
       with TaskGroup('t1') as t1:
           s1 = BranchPythonOperator(
               task_id='short_circuit',
               python_callable=branch_op,
               provide_context=True,
               op_kwargs={"category": "t1"}
           )
           s2 = DummyOperator(task_id='t1s2')
           s3 = DummyOperator(task_id='t1s3')
           s4 = DummyOperator(task_id='t1s4')
           s5 = DummyOperator(task_id='t1s5')
           s6 = DummyOperator(task_id='t1s6')
           s7 = DummyOperator(task_id='t1s7')
           s8 = DummyOperator(task_id='t1s8')
           s9 = DummyOperator(task_id='t1s9')
           s10 = DummyOperator(task_id='skip')
           s1 >> s2 >> s3 >> s4
           s4 >> s5
           s5 >> s6 >> s7
           s7 >> s8 >> s9 >> s10

       with TaskGroup('t2') as t2:
           s1 = BranchPythonOperator(
               task_id='short_circuit',
               python_callable=branch_op,
               provide_context=True,
               op_kwargs={"category": "t2"}
           )
           s2 = DummyOperator(task_id='t1s2')
           s3 = DummyOperator(task_id='t1s3')
           s4 = DummyOperator(task_id='t1s4')
           s5 = DummyOperator(task_id='t1s5')
           s6 = DummyOperator(task_id='t1s6')
           s7 = DummyOperator(task_id='t1s7')
           s8 = DummyOperator(task_id='t1s8')
           s9 = DummyOperator(task_id='t1s9')
           s10 = DummyOperator(task_id='skip')
           s1 >> s2 >> s3 >> s4
           s4 >> s5
           s5 >> s6 >> s7
           s7 >> s8 >> s9 >> s10

       t0 >> t1 >> t2
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] github-actions[bot] commented on pull request #13199: Create dag dependencies view

2021-04-16 Thread GitBox


github-actions[bot] commented on pull request #13199:
URL: https://github.com/apache/airflow/pull/13199#issuecomment-821555359


   [The Workflow run](https://github.com/apache/airflow/actions/runs/756857228) 
is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static 
checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm 
tests$,^Test OpenAPI*.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] github-actions[bot] commented on pull request #15411: Fix: Persist tags URL params in pagination

2021-04-16 Thread GitBox


github-actions[bot] commented on pull request #15411:
URL: https://github.com/apache/airflow/pull/15411#issuecomment-821554058


   The PR is likely OK to be merged with just subset of tests for default 
Python and Database versions without running the full matrix of tests, because 
it does not modify the core of Airflow. If the committers decide that the full 
tests matrix is needed, they will add the label 'full tests needed'. Then you 
should rebase to the latest master or amend the last commit of the PR, and push 
it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] zhzhang opened a new pull request #15412: PostgresHook: deepcopy connection to avoid mutating connection obj

2021-04-16 Thread GitBox


zhzhang opened a new pull request #15412:
URL: https://github.com/apache/airflow/pull/15412


   
   PostgresHook: deepcopy connection object to avoid unexpected mutation
   ---
   
   For AWS IAM based connections, the connection object is mutated by setting 
the `login`, `password` and `port` properties after the call to 
`self.get_iam_token(...)`. The connection object's `login` field is set to 
`"IAM:original-login"` because the `login` value returned has `IAM:` prefixed.
   
   As a result, any subsequent calls to `get_conn()` will fail with the AWS 
`client.get_cluster_credentials(...)` call complaining that `:` cannot be part 
of the `DbUser` argument.
   
   In general, the values within any connection objects used by this hook 
should not be subject to mutation by `get_conn()` calls, and the addition of 
`deepcopy` is meant to enforce this.
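
   A minimal sketch of the idea (an assumed simplification, with ``get_iam_token`` as a stand-in for the real AWS call):

   ```python
   from copy import deepcopy
   from types import SimpleNamespace

   def get_iam_token(conn):
       # Stand-in for client.get_cluster_credentials(...); prefixes the login.
       return f"IAM:{conn.login}", "temporary-password", 5439

   def get_conn(cached_conn):
       conn = deepcopy(cached_conn)  # mutate only the copy, never the cached object
       conn.login, conn.password, conn.port = get_iam_token(conn)
       return conn

   cached = SimpleNamespace(login="app_user", password=None, port=None)
   get_conn(cached)
   assert cached.login == "app_user"  # without deepcopy this would now be "IAM:app_user"
   ```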


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] rcramblit commented on issue #8421: Hide sensitive data in UI

2021-04-16 Thread GitBox


rcramblit commented on issue #8421:
URL: https://github.com/apache/airflow/issues/8421#issuecomment-821550334


   I have a similar need that might not be covered by the solutions posed in 
this thread:
   I am using the DockerOperator in Airflow 1.10.15 and want to pass some 
secrets to the docker container via DockerOperator's `environment` argument, 
but this argument is rendered in the Airflow UI in the Task Instance Details 
view. Is there any way to control what arguments are rendered there?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] ryanahamilton opened a new pull request #15411: Fix: Persist tags URL params in pagination

2021-04-16 Thread GitBox


ryanahamilton opened a new pull request #15411:
URL: https://github.com/apache/airflow/pull/15411


   Closes #15384
   
   The `tags=tagname` URL params were not included in the pagination logic, so 
they were being lost when a user navigated beyond the first page.
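
   For reference, the underlying `urlencode` behavior the fix leans on (a small sketch; the merged commit changes `get_params` to pass `doseq=True`):

   ```python
   from urllib.parse import urlencode

   params = {"page": 1, "tags": ["example", "prod"]}
   print(urlencode(params))        # page=1&tags=%5B%27example%27%2C+%27prod%27%5D
   print(urlencode(params, True))  # page=1&tags=example&tags=prod
   ```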
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] quoc-t-le commented on issue #15407: ShortCircuitOperator to short circuit downstream of TaskGroup only

2021-04-16 Thread GitBox


quoc-t-le commented on issue #15407:
URL: https://github.com/apache/airflow/issues/15407#issuecomment-821535724


   Yeah, at the moment I can't continue to use the subdag because 
prev_execution_date is not correctly passed down to the subdag, or something 
with Airflow 2: https://github.com/apache/airflow/issues/15396
   
   I tried BranchPythonOperator but it doesn't work as expected (a sketch of why 
follows after the code below). In theory, this should branch to the skip and 
execute t2...
   
   ```python
   from airflow import DAG
   from datetime import datetime
   from airflow.operators.dummy_operator import DummyOperator
   from airflow.operators.python_operator import ShortCircuitOperator, PythonOperator, BranchPythonOperator
   from airflow.utils.task_group import TaskGroup

   default_args = {
       'owner': 'airflow',
       'retries': 3,
       'depends_on_past': False,
   }

   def branch_op(*arg, **kwargs):
       category = kwargs.get('category')
       if (category=='t1'):
           return 't1.skip'

       if (category=='t2'):
           return 't2.skip'

   with DAG("short-circuit",
            catchup=True,
            default_args=default_args,
            schedule_interval='@daily',
            description='Aggregates and pulls down data for API endpoints that use analytics',
            start_date=datetime.strptime('04/14/2021', '%m/%d/%Y'),
            max_active_runs=1
            ) as dag:
       t0 = DummyOperator(task_id='start')
       with TaskGroup('t1') as t1:
           s1 = BranchPythonOperator(
               task_id='short_circuit',
               python_callable=branch_op,
               provide_context=True,
               op_kwargs={"category": "t1"}
           )
           s2 = DummyOperator(task_id='t1s2')
           s3 = DummyOperator(task_id='t1s3')
           s4 = DummyOperator(task_id='t1s4')
           s5 = DummyOperator(task_id='t1s5')
           s6 = DummyOperator(task_id='t1s6')
           s7 = DummyOperator(task_id='t1s7')
           s8 = DummyOperator(task_id='t1s8')
           s9 = DummyOperator(task_id='t1s9')
           s10 = DummyOperator(task_id='skip')
           s1 >> s2 >> s3 >> s4
           s4 >> s5
           s5 >> s6 >> s7
           s7 >> s8 >> s9 >> s10

       with TaskGroup('t2') as t2:
           s1 = BranchPythonOperator(
               task_id='short_circuit',
               python_callable=branch_op,
               provide_context=True,
               op_kwargs={"category": "t2"}
           )
           s2 = DummyOperator(task_id='t1s2')
           s3 = DummyOperator(task_id='t1s3')
           s4 = DummyOperator(task_id='t1s4')
           s5 = DummyOperator(task_id='t1s5')
           s6 = DummyOperator(task_id='t1s6')
           s7 = DummyOperator(task_id='t1s7')
           s8 = DummyOperator(task_id='t1s8')
           s9 = DummyOperator(task_id='t1s9')
           s10 = DummyOperator(task_id='skip')
           s1 >> s2 >> s3 >> s4
           s4 >> s5
           s5 >> s6 >> s7
           s7 >> s8 >> s9 >> s10

       t0 >> t1 >> t2
   ```
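
   A hedged sketch (not from the thread) of why the branch above skips everything: `BranchPythonOperator` only selects among its direct downstream tasks, and the skip cascades to every task reachable only through skipped tasks, so a target at the end of the chain never runs. Wiring the skip target as a direct child of the branch avoids that:

   ```python
   from datetime import datetime

   from airflow import DAG
   from airflow.operators.dummy import DummyOperator
   from airflow.operators.python import BranchPythonOperator

   with DAG("branch_sketch", start_date=datetime(2021, 4, 14), schedule_interval=None) as dag:
       branch = BranchPythonOperator(task_id="branch", python_callable=lambda: "skip")
       work = DummyOperator(task_id="work")
       skip = DummyOperator(task_id="skip")
       # Runs after either path; trigger rule name as of Airflow 2.0.
       done = DummyOperator(task_id="done", trigger_rule="none_failed_or_skipped")
       branch >> [work, skip]
       work >> done
       skip >> done
   ```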
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] kentdanas opened a new pull request #15410: Add documentation for Databricks connection

2021-04-16 Thread GitBox


kentdanas opened a new pull request #15410:
URL: https://github.com/apache/airflow/pull/15410


   Adding documentation for how to set up a Databricks connection.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] blcksrx opened a new pull request #15409: Support regex pattern in SFTPHOOK

2021-04-16 Thread GitBox


blcksrx opened a new pull request #15409:
URL: https://github.com/apache/airflow/pull/15409


   
   In many cases, users are interested in only certain files or file types, not 
the whole directory; for example, retrieving only the CSV files in the `/tmp` 
directory. Unfortunately, `Paramiko` does not support wildcards, so this PR 
adds support for filtering by regex pattern in `SFTPHook`.
   
   
   related: #15397
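
   A hedged illustration of the intent (plain Python, not the provider's actual API): filter a directory listing down to the entries matching a regex, e.g. only CSV files:

   ```python
   import re

   def filter_by_pattern(filenames, pattern=r".*\.csv$"):
       regex = re.compile(pattern)
       return [name for name in filenames if regex.match(name)]

   print(filter_by_pattern(["a.csv", "b.txt", "report.csv"]))  # ['a.csv', 'report.csv']
   ```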
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[airflow-site] branch asf-site updated: Deploying to asf-site from @ 0062afad8b6c7efe2e4c517f60b157b655779b92 🚀

2021-04-16 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/airflow-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new fd1bcde  Deploying to asf-site from  @ 
0062afad8b6c7efe2e4c517f60b157b655779b92 🚀
fd1bcde is described below

commit fd1bcde2ff0f2f57e84e611394b0d68230ddd471
Author: kaxil 
AuthorDate: Fri Apr 16 18:46:14 2021 +

Deploying to asf-site from  @ 0062afad8b6c7efe2e4c517f60b157b655779b92 🚀
---
 blog/airflow-1.10.10/index.html|  4 +-
 blog/airflow-1.10.12/index.html|  4 +-
 blog/airflow-1.10.8-1.10.9/index.html  |  4 +-
 blog/airflow-survey-2020/index.html|  4 +-
 blog/airflow-survey/index.html |  4 +-
 blog/airflow-two-point-oh-is-here/index.html   |  4 +-
 blog/airflow_summit_2021/index.html|  4 +-
 blog/announcing-new-website/index.html |  4 +-
 blog/apache-airflow-for-newcomers/index.html   |  4 +-
 .../index.html |  4 +-
 .../index.html |  4 +-
 .../index.html |  4 +-
 .../index.html |  4 +-
 .../index.html |  4 +-
 .../index.html |  4 +-
 ecosystem/index.html   |  8 +-
 index.html | 32 
 search/index.html  |  4 +-
 sitemap.xml| 86 +++---
 use-cases/adobe/index.html |  4 +-
 use-cases/big-fish-games/index.html|  4 +-
 use-cases/dish/index.html  |  4 +-
 use-cases/experity/index.html  |  4 +-
 use-cases/onefootball/index.html   |  4 +-
 use-cases/plarium-krasnodar/index.html |  4 +-
 use-cases/sift/index.html  |  4 +-
 26 files changed, 110 insertions(+), 108 deletions(-)

(diff hunks omitted: the changed lines in the pages listed above were HTML tags whose contents were stripped when this message was archived, leaving only empty +/- markers)

[GitHub] [airflow-site] kaxil merged pull request #404: Adding ref to Astronomer Registry + alphabetizing tools

2021-04-16 Thread GitBox


kaxil merged pull request #404:
URL: https://github.com/apache/airflow-site/pull/404


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[airflow-site] branch master updated: Adding ref to Astronomer Registry + alphabetizing tools (#404)

2021-04-16 Thread kaxilnaik
This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/airflow-site.git


The following commit(s) were added to refs/heads/master by this push:
 new 0062afa  Adding ref to Astronomer Registry + alphabetizing tools (#404)
0062afa is described below

commit 0062afad8b6c7efe2e4c517f60b157b655779b92
Author: josh-fell <48934154+josh-f...@users.noreply.github.com>
AuthorDate: Fri Apr 16 14:40:52 2021 -0400

Adding ref to Astronomer Registry + alphabetizing tools (#404)
---
 landing-pages/site/content/en/ecosystem/_index.md | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/landing-pages/site/content/en/ecosystem/_index.md 
b/landing-pages/site/content/en/ecosystem/_index.md
index 6a5ff3c..65a6c8e 100644
--- a/landing-pages/site/content/en/ecosystem/_index.md
+++ b/landing-pages/site/content/en/ecosystem/_index.md
@@ -31,7 +31,7 @@ If you would you like to be included on this page, please 
reach out to the [Apac
 
  
 
-## Airflow-As-A-Service
+## Airflow as a Service
 
 [Astronomer](https://www.astronomer.io/) - Managed Apache Airflow in 
Astronomer Cloud, or self-hosted within your environment
 
@@ -65,6 +65,8 @@ If you would you like to be included on this page, please 
reach out to the [Apac
 
 [Apache-Liminal-Incubating](https://github.com/apache/incubator-liminal) -  
Liminal provides a domain-specific-language (DSL) to build ML/AI workflows on 
top of Apache Airflow. Its goal is to operationalise the machine learning 
process, allowing data scientists to quickly transition from a successful 
experiment to an automated pipeline of model training, validation, deployment 
and inference in production.
 
+[Astronomer Registry](https://registry.astronomer.io/) - The discovery and 
distribution hub for Apache Airflow integrations created to aggregate and 
curate the best bits of the ecosystem.
+
 [Chartis](https://github.com/trejas/chartis) - Python package to convert 
Common Workflow Language (CWL) into Airflow DAG.
 
 [CWL-Airflow](https://github.com/Barski-lab/cwl-airflow) - Python package to 
extend Apache-Airflow 1.10.11 functionality with CWL v1.2 support.
@@ -87,8 +89,8 @@ If you would you like to be included on this page, please 
reach out to the [Apac
 
 [Pylint-Airflow](https://github.com/BasPH/pylint-airflow) - A Pylint plugin 
for static code analysis on Airflow code.
 
-[whirl](https://github.com/godatadriven/whirl) - Fast iterative local 
development and testing of Apache Airflow workflows.
-
 [simple-dag-editor](https://github.com/ohadmata/simple-dag-editor) - Zero 
configuration Airflow plugin that let you manage your DAG files.
 
 [Viewflow](https://github.com/datacamp/viewflow) - An Airflow-based framework 
that allows data scientists to create data models without writing Airflow code.
+
+[whirl](https://github.com/godatadriven/whirl) - Fast iterative local 
development and testing of Apache Airflow workflows.


[GitHub] [airflow-site] josh-fell opened a new pull request #404: Adding ref to Astronomer Registry + alphabetizing tools

2021-04-16 Thread GitBox


josh-fell opened a new pull request #404:
URL: https://github.com/apache/airflow-site/pull/404


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] eladkal commented on issue #8770: Airflow 1.10.7 logs from S3 won't load, just hanging

2021-04-16 Thread GitBox


eladkal commented on issue #8770:
URL: https://github.com/apache/airflow/issues/8770#issuecomment-821427992


   @aarrtteemmuuss Since there are no reproduction steps to check the issue and 
there have been significant changes since you reported it, I'm closing this.
   
   If you are still experiencing problems after verifying against Airflow 2 + 
the Amazon provider, please let us know.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] eladkal closed issue #8770: Airflow 1.10.7 logs from S3 won't load, just hanging

2021-04-16 Thread GitBox


eladkal closed issue #8770:
URL: https://github.com/apache/airflow/issues/8770


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] xiskoct commented on issue #14205: Scheduler "deadlocks" itself when max_active_runs_per_dag is reached by up_for_retry tasks

2021-04-16 Thread GitBox


xiskoct commented on issue #14205:
URL: https://github.com/apache/airflow/issues/14205#issuecomment-821412328


   The same thing happens to me; it's a huge problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] sunkickr commented on a change in pull request #15408: Add Connection Documentation to more Providers

2021-04-16 Thread GitBox


sunkickr commented on a change in pull request #15408:
URL: https://github.com/apache/airflow/pull/15408#discussion_r615042397



##
File path: docs/apache-airflow-providers-snowflake/connections/snowflake.rst
##
@@ -0,0 +1,76 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+
+.. _howto/connection:snowflake:
+
+Snowflake Connection
+
+
+The Snowflake connection type enables integrations with Snowflake.
+
+Authenticating to Snowflake
+---
+
+Authenticate to Snowflake using the `Snowflake python connector default 
authentication
+`_.
+
+Default Connection IDs
+--
+
+Hooks, operators, and sensors related to Snowflake use ``snowflake_default`` 
by default.
+
+Configuring the Connection
+--
+
+Login
+Specify the snowflake username.
+
+Password
+Specify the snowflake password.
+
+Host (optional)
+Specify the snowflake hostname.
+
+Schema (optional)
+Specify the snowflake schema to be used.
+
+Extra (optional)
+Specify the extra parameters (as json dictionary) that can be used in the 
snowflake connection.
+The following parameters are all optional:
+
+* ``account``: Snowflake account name.
+* ``database``: Snowflake database name.
+* ``region``: Warehouse region.
+* ``warehouse``: Snowflake warehouse name.
+* ``role``: Snowflake role.
+* ``authenticator``: To connect using OAuth, set this parameter to ``oauth``.
+* ``private_key_file``: Specify the path to the private key file.
+* ``session_parameters``: Specify `session level parameters
+  
`_

Review comment:
   Yes, I'm not sure how to handle this, since users who rely on this 
documentation to build a URI would want to know that all these fields need to 
be specified as extras. I'm assuming users of the Airflow UI would see that 
some of these have their own fields and understand what's going on. Maybe it 
would be best to add a note about this somewhere in the documentation?
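
   For illustration, a fully specified URI might look like the minimal sketch 
below, with made-up values, assuming the extra fields are passed as query 
parameters:

   ```python
   import os

   # Hypothetical connection URI; every value below is made up.
   # Fields without a dedicated form field travel in the query string ("extras").
   os.environ["AIRFLOW_CONN_SNOWFLAKE_DEFAULT"] = (
       "snowflake://my_user:my_password@my_host/my_schema"
       "?account=my_account&database=my_db&region=us-east-1"
       "&warehouse=my_wh&role=my_role"
   )
   ```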




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] eladkal commented on issue #15244: Crash when looking at the individually triggered tasks

2021-04-16 Thread GitBox


eladkal commented on issue #15244:
URL: https://github.com/apache/airflow/issues/15244#issuecomment-821394318


   >Browse->DAG Runs-> Select a task that was triggered individually, not as 
one DAG.
   
   Can you please explain further? An actual example with DAG code and maybe a 
gif showing the issue might help here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] minnieshi edited a comment on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

2021-04-16 Thread GitBox


minnieshi edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-821258217


   **My situation**
   - kubernetes
   - airflow 1.10.14
   - celery executor
   - only 1 dag is 'on', the rest 20 dags are 'off'
   - dag is correct as it works in another environment. 
   - pool (default_pool): 32 slots, 0 used slots, 1 queued slot
   - tasks in the dag can be run manually (by clearing them), but the next 
task does not run automatically.
   - one situation: after restarting the scheduler manually (the restart 
configuration is set to never; the value schedulerNumRuns is set to -1), it 
decided to run 3 out of 4 tasks, and the last one just **stuck at the queued 
state**
   - after that, I tried to load the dag with a different name and a different 
id; the 1st task of the dag 
**stuck at the 'scheduled' state** after clearing the task.
   - when checking the log on the scheduler, it has an error like this
   `[2021-04-16 13:06:36,392] {celery_executor.py:282} ERROR - Error fetching 
Celery task state, ignoring it:AirflowTaskTimeout('Timeout, PID: 3497') Celery 
Task ID: ('_YYY_test', 'Task_blahblahblah', datetime.datetime(2021, 4, 
15, 3, 0, tzinfo=), 1)`
   
   - I reported it here: https://github.com/helm/charts/issues/19399 but found 
the issue was already closed.
   
   **I tried setting "min_file_process_interval" to 30, which did not help as 
I expected.**
   - Uploaded the dag with a new name/id, enabled it, and cleared the dag 
(otherwise the 1st task just stuck at the 'queued' state),
   and the 1st task is at the 'scheduled' state and stuck there.
   - check scheduler log:
   - `[2021-04-16 15:58:51,991] {celery_executor.py:282} ERROR - Error fetching 
Celery task state, ignoring it:AirflowTaskTimeout('Timeout, PID: 1851')
   Celery Task ID: ('XX_min_test_3', 'Load__to_', 
datetime.datetime(2021, 4, 15, 3, 0, tzinfo=), 1)
   Traceback (most recent call last):
 File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py",
 line 117, in fetch_celery_task_state
   res = (celery_task[0], celery_task[1].state)`
   
   
   P.S. The environment is newly set up, and a few tables listed below were 
migrated from the old environment. While debugging this stuck situation, the 
tables 'dag', 'task_*', 'celery_*' had been truncated.
   ```
   - celery_taskmeta
   - dag
   - dag_run
   - log
   - task_fail
   - task_instance
   - task_reschedule
   - connections
   ```
   
   The dag log itself is empty, since the task was not executed. 
   The worker has no errors. I will attach the log anyhow.
   
[log-workers.txt](https://github.com/apache/airflow/files/6327151/log-workers.txt)
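
   A minimal diagnostic sketch for situations like this, assuming Airflow 
1.10.x module paths and direct access to the metadata DB from the scheduler 
pod; it simply lists task instances stuck in 'scheduled'/'queued' along with 
their pool and queue:

   ```python
   from airflow.models import TaskInstance
   from airflow.utils.db import create_session
   from airflow.utils.state import State

   with create_session() as session:
       stuck = (
           session.query(TaskInstance)
           .filter(TaskInstance.state.in_([State.SCHEDULED, State.QUEUED]))
           .all()
       )
       for ti in stuck:
           # pool/queue help spot starved pools or a queue no worker consumes
           print(ti.dag_id, ti.task_id, ti.execution_date, ti.state, ti.pool, ti.queue)
   ```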


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] minnieshi edited a comment on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

2021-04-16 Thread GitBox


minnieshi edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-821356514


   @SalmonTimo I have access to the workers, as well as the dag logs folder 
(which is saved in the file share PVC and can be viewed via Azure Storage 
Explorer). 
   
   I updated my comment earlier.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] sosso edited a comment on issue #14205: Scheduler "deadlocks" itself when max_active_runs_per_dag is reached by up_for_retry tasks

2021-04-16 Thread GitBox


sosso edited a comment on issue #14205:
URL: https://github.com/apache/airflow/issues/14205#issuecomment-821326846


   +1 for us on this issue as well, I think?  Very strangely, we see the most 
recent run for a DAG have its *run* set to 'running', while the only task in 
the DAG is a clear success:
   
   
![image](https://user-images.githubusercontent.com/619968/115061689-57345180-9e9e-11eb-88a4-41de2abf94d2.png)
   
   This is a catchup=False DAG, where the only task runs in a pool, and there 
is *nothing* in the Scheduler log for this DAG for two hours (the DAG is 
supposed to run every 5 minutes) about why it can't schedule this DAG.  No "max 
active runs reached", no "no slots available in pool", nothing.  It's like the 
scheduler forgot this DAG existed until we rebooted it.
   
   *edit* This has happened again, here are the relevant log lines 
(cherry-picked via grep):
   
   ```
   [2021-04-16 17:44:01,418] {{base_executor.py:82}} INFO - Adding to queue: 
['airflow', 'tasks', 'run', 'mls_ivbor_smart_incremental_v1', 
'streaming-importer', '2021-04-16T17:30:00+00:00', '--local', '--pool', 
'ivbor', '--subdir', '/efs/airflow/dags/mls_incrementals_i.py']
   [2021-04-16 17:44:04,638] {{scheduler_job.py:1206}} INFO - Executor reports 
execution of mls_ivbor_smart_incremental_v1.streaming-importer 
execution_date=2021-04-16 17:30:00+00:00 exited with status queued for 
try_number 1
   [2021-04-16 17:44:04,661] {{scheduler_job.py:1226}} INFO - Setting 
external_id for  to 2c9ee22a-ad2b-4255-846a-85896fa517ed
   [2021-04-16 17:44:43,326] {{scheduler_job.py:1206}} INFO - Executor reports 
execution of mls_ivbor_smart_incremental_v1.streaming-importer 
execution_date=2021-04-16 17:30:00+00:00 exited with status success for 
try_number 1
   [2021-04-16 18:00:18,075] {{dagrun.py:445}} INFO - Marking run  successful
   ```
   
   So the task succeeded at 17:44, but the dagrun wasn't set to success until 
16 minutes later?
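
   A small sketch for spotting runs in this state, assuming Airflow 2.0.x 
module paths and direct ORM access to the metadata DB; it flags DAG runs still 
'running' whose task instances have all finished:

   ```python
   from airflow.models import DagRun
   from airflow.utils.session import create_session
   from airflow.utils.state import State

   FINISHED = {State.SUCCESS, State.FAILED, State.SKIPPED}

   with create_session() as session:
       for dr in session.query(DagRun).filter(DagRun.state == State.RUNNING):
           tis = dr.get_task_instances(session=session)
           if tis and all(ti.state in FINISHED for ti in tis):
               print(dr.dag_id, dr.execution_date, "tasks finished, run still 'running'")
   ```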


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] eladkal commented on a change in pull request #15408: Add Connection Documentation to more Providers

2021-04-16 Thread GitBox


eladkal commented on a change in pull request #15408:
URL: https://github.com/apache/airflow/pull/15408#discussion_r615024146



##
File path: docs/apache-airflow-providers-snowflake/connections/snowflake.rst
##
@@ -0,0 +1,76 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+
+.. _howto/connection:snowflake:
+
+Snowflake Connection
+
+
+The Snowflake connection type enables integrations with Snowflake.
+
+Authenticating to Snowflake
+---
+
+Authenticate to Snowflake using the `Snowflake python connector default 
authentication
+`_.
+
+Default Connection IDs
+--
+
+Hooks, operators, and sensors related to Snowflake use ``snowflake_default`` 
by default.
+
+Configuring the Connection
+--
+
+Login
+Specify the snowflake username.
+
+Password
+Specify the snowflake password.
+
+Host (optional)
+Specify the snowflake hostname.
+
+Schema (optional)
+Specify the snowflake schema to be used.
+
+Extra (optional)
+Specify the extra parameters (as json dictionary) that can be used in the 
snowflake connection.
+The following parameters are all optional:
+
+* ``account``: Snowflake account name.
+* ``database``: Snowflake database name.
+* ``region``: Warehouse region.
+* ``warehouse``: Snowflake warehouse name.
+* ``role``: Snowflake role.
+* ``authenticator``: To connect using OAuth, set this parameter to ``oauth``.
+* ``private_key_file``: Specify the path to the private key file.
+* ``session_parameters``: Specify `session level parameters
+  
`_

Review comment:
   Is this section correct? It suggests that you need to define these values 
in the Extra field, while some of them have a dedicated field: 
https://github.com/apache/airflow/pull/14724




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] eladkal edited a comment on issue #15407: ShortCircuitOperator to short circuit downstream of TaskGroup only

2021-04-16 Thread GitBox


eladkal edited a comment on issue #15407:
URL: https://github.com/apache/airflow/issues/15407#issuecomment-821334673


   `ShortCircuitOperator` cascades through everything by design 
(https://github.com/apache/airflow/issues/7858). 
   Since TaskGroup is more of a UI feature, this is expected.
   However, given your use case, I can see how this functionality is missing.
   
   I think this one requires some discussion to define what the actual change 
(if any) should be to address the problem.
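
   In the meantime, one possible workaround sketch (not an official API; the 
DAG id, task ids, and the condition() helper are hypothetical): branch inside 
the TaskGroup with BranchPythonOperator and join on a task with 
trigger_rule='none_failed_or_skipped', so the skip stays local to the group:

   ```python
   from datetime import datetime

   from airflow import DAG
   from airflow.operators.dummy_operator import DummyOperator
   from airflow.operators.python_operator import BranchPythonOperator
   from airflow.utils.task_group import TaskGroup

   def condition():
       return False  # placeholder for the real short-circuit check

   with DAG('short-circuit-branch', start_date=datetime(2021, 4, 14),
            schedule_interval='@daily') as dag:
       with TaskGroup('t1') as t1:
           branch = BranchPythonOperator(
               task_id='branch',
               # must return the fully-qualified (group-prefixed) task id
               python_callable=lambda: 't1.work' if condition() else 't1.join',
           )
           work = DummyOperator(task_id='work')
           join = DummyOperator(task_id='join',
                                trigger_rule='none_failed_or_skipped')
           branch >> work >> join
           branch >> join
       after_group = DummyOperator(task_id='after_group')
       t1 >> after_group
   ```

   Downstream of the group hangs off `join`, so only the tasks between 
`branch` and `join` get skipped.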


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] quoc-t-le commented on issue #15355: MysqlHook Utf8mb4

2021-04-16 Thread GitBox


quoc-t-le commented on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-821324755


   I was able to get Airflow 2 running... the problem is fixed there. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[airflow] branch master updated (adbab36 -> cb1344b)

2021-04-16 Thread kaxilnaik
This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git.


from adbab36  Add changelog for what will become 2.0.2 (#15380)
 add cb1344b  Update azure connection documentation (#15352)

No new revisions were added by this update.

Summary of changes:
 airflow/providers/microsoft/azure/hooks/adx.py |  3 +-
 .../providers/microsoft/azure/hooks/azure_batch.py |  4 +
 .../azure/hooks/azure_container_instance.py|  6 +-
 .../azure/hooks/azure_container_registry.py|  5 +-
 .../azure/hooks/azure_container_volume.py  |  4 +-
 .../microsoft/azure/hooks/azure_cosmos.py  |  3 +-
 .../microsoft/azure/hooks/azure_data_factory.py|  4 +-
 .../microsoft/azure/hooks/azure_data_lake.py   |  2 +-
 .../microsoft/azure/hooks/azure_fileshare.py   |  2 +-
 .../providers/microsoft/azure/hooks/base_azure.py  |  5 +-
 airflow/providers/microsoft/azure/hooks/wasb.py|  2 +-
 .../microsoft/azure/operators/adls_delete.py   |  2 +
 .../microsoft/azure/operators/adls_list.py |  3 +-
 airflow/providers/microsoft/azure/operators/adx.py |  3 +-
 .../microsoft/azure/operators/azure_batch.py   |  2 +-
 .../azure/operators/azure_container_instances.py   |  3 +-
 .../microsoft/azure/operators/azure_cosmos.py  |  3 +-
 .../microsoft/azure/operators/wasb_delete_blob.py  |  2 +-
 .../microsoft/azure/sensors/azure_cosmos.py|  3 +-
 airflow/providers/microsoft/azure/sensors/wasb.py  |  2 +-
 .../connections/acr.rst| 62 ++
 .../connections/adf.rst| 72 
 .../connections/adl.rst| 70 
 .../connections/adx.rst| 96 ++
 .../connections/azure.rst  |  4 +-
 .../connections/azure_batch.rst| 65 +++
 .../connections/azure_cosmos.rst   | 66 +++
 .../connections/index.rst  |  0
 .../connections/wasb.rst   | 84 +++
 .../index.rst  |  2 +-
 30 files changed, 556 insertions(+), 28 deletions(-)
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/acr.rst
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/adf.rst
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/adl.rst
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/adx.rst
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/azure_batch.rst
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/azure_cosmos.rst
 copy docs/{apache-airflow-providers-google => 
apache-airflow-providers-microsoft-azure}/connections/index.rst (100%)
 create mode 100644 
docs/apache-airflow-providers-microsoft-azure/connections/wasb.rst


[GitHub] [airflow] kaxil merged pull request #15352: Update azure connection documentation

2021-04-16 Thread GitBox


kaxil merged pull request #15352:
URL: https://github.com/apache/airflow/pull/15352


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] blcksrx commented on issue #15332: SftpSensor w/ possibility to use RegEx or fnmatch

2021-04-16 Thread GitBox


blcksrx commented on issue #15332:
URL: https://github.com/apache/airflow/issues/15332#issuecomment-821319755


   On a *nix OS that provides a shell, it's convenient to use wildcards 
like this:
   ```
   hook.get_conn().execute("ls PATH/*.csv")
   ```
   but that is too raw and not usable for every case. I'm going to prepare a 
PR for using regex instead.
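
   Until then, a rough sketch of the idea using fnmatch on top of the existing 
SFTPHook (the helper name here is illustrative, not an existing API):

   ```python
   import fnmatch

   from airflow.providers.sftp.hooks.sftp import SFTPHook

   def files_matching(path: str, pattern: str, conn_id: str = "sftp_default"):
       """Return entries in `path` whose names match a shell-style pattern."""
       hook = SFTPHook(ftp_conn_id=conn_id)
       return [name for name in hook.list_directory(path)
               if fnmatch.fnmatch(name, pattern)]

   # e.g. files_matching("/upload", "*.csv")
   ```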


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] houqp edited a comment on issue #14422: on_failure_callback does not seem to fire on pod deletion/eviction

2021-04-16 Thread GitBox


houqp edited a comment on issue #14422:
URL: https://github.com/apache/airflow/issues/14422#issuecomment-821296305


   Interesting, I was expecting the second SIGTERM to have resulted in the 
task subprocess setting its own state through `handle_failure`, because 
`self.on_kill` calls `self.task_runner.terminate()`, which is supposed to wait 
for the subprocess to exit:
   
   
https://github.com/apache/airflow/blob/e7c642ba2a79ea13d6ef84b78242f6c313cd3457/airflow/task/task_runner/standard_task_runner.py#L108-L117
   
   >  We should probably add self.task_instance.state=State.FAILED in 
handle_task_exit if exit_code != 1. WDYT @houqp @ephraimbuddy ?
   
   I think we should do this as an extra safeguard because, in rare cases, the 
task subprocess could crash any time after it receives SIGTERM and before it 
updates its own task state. However, I think the state update logic should be 
guarded with an extra condition:
   
   ```python
   if self.task_instance.state not in State.finished():
   self.task_instance.state = State.FAILED
   ```
   
   This is to handle the case where the task could have exited successfully 
right after the pod gets killed but before the local task job executes 
`handle_task_exit`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] sunkickr opened a new pull request #15408: Add Connection Documentation to more Providers

2021-04-16 Thread GitBox


sunkickr opened a new pull request #15408:
URL: https://github.com/apache/airflow/pull/15408


   This PR adds and updates documentation for connecting to some popular 
providers. It also adds links to this documentation in the doc strings of 
modules that use each connection. Documentation for the following connections 
is improved or updated:
   
   - ftp
   - sftp
   - ssh
   - snowflake
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[airflow] annotated tag 2.0.2rc1 updated (e494306 -> d28a705)

2021-04-16 Thread ash
This is an automated email from the ASF dual-hosted git repository.

ash pushed a change to annotated tag 2.0.2rc1
in repository https://gitbox.apache.org/repos/asf/airflow.git.


*** WARNING: tag 2.0.2rc1 was modified! ***

from e494306  (commit)
  to d28a705  (tag)
 tagging e494306fb01f3a026e7e2832ca94902e96b526fa (commit)
 replaces 2.0.1
  by Ash Berlin-Taylor
  on Fri Apr 16 17:20:44 2021 +0100

- Log -
Apache Airflow v2.0.2rc1

Bug Fixes
"

* Bugfix: ``TypeError`` when Serializing & sorting iterable properties of DAGs 
(#15395)
* Fix missing ``on_load`` trigger for folder-based plugins (#15208)
* ``kubernetes cleanup-pods`` subcommand will only clean up Airflow-created 
Pods (#15204)
* Fix password masking in CLI action_logging (#15143)
* Fix url generation for TriggerDagRunOperatorLink (#14990)
* Restore base lineage backend (#14146)
* Unable to trigger backfill or manual jobs with Kubernetes executor. (#14160)
* Bugfix: Task docs are not shown in the Task Instance Detail View (#15191)
* Bugfix: Fix overriding ``pod_template_file`` in KubernetesExecutor (#15197)
* Bugfix: resources in ``executor_config`` breaks Graph View in UI (#15199)
* Fix celery executor bug trying to call len on map (#14883)
* Fix bug in airflow.stats timing that broke dogstatsd mode (#15132)
* Avoid scheduler/parser manager deadlock by using non-blocking IO (#15112)
* Re-introduce ``dagrun.schedule_delay`` metric (#15105)
* Compare string values, not if strings are the same object in Kube 
executor(#14942)
* Pass queue to BaseExecutor.execute_async like in airflow 1.10 (#14861)
* Scheduler: Remove TIs from starved pools from the critical path. (#14476)
* Remove extra/needless deprecation warnings from airflow.contrib module 
(#15065)
* Fix support for long dag_id and task_id in KubernetesExecutor (#14703)
* Sort lists, sets and tuples in Serialized DAGs (#14909)
* Simplify cleaning string passed to origin param (#14738) (#14905)
* Fix error when running tasks with Sentry integration enabled. (#13929)
* Webserver: Sanitize string passed to origin param (#14738)
* Fix losing duration < 1 secs in tree (#13537)
* Pin SQLAlchemy to <1.4 due to breakage of sqlalchemy-utils (#14812)
* Fix KubernetesExecutor issue with deleted pending pods (#14810)
* Default to Celery Task model when backend model does not exist (#14612)
* Bugfix: Plugins endpoint was unauthenticated (#14570)
* BugFix: fix DAG doc display (especially for TaskFlow DAGs) (#14564)
* BugFix: TypeError in airflow.kubernetes.pod_launcher's monitor_pod (#14513)
* Bugfix: Fix wrong output of tags and owners in dag detail API endpoint 
(#14490)
* Fix logging error with task error when JSON logging is enabled (#14456)
* Fix statsd metrics not sending when using daemon mode (#14454)
* Gracefully handle missing start_date and end_date for DagRun (#14452)
* BugFix: Serialize max_retry_delay as a timedelta (#14436)
* Fix crash when user clicks on  "Task Instance Details" caused by start_date 
being None (#14416)
* BugFix: Fix TaskInstance API call fails if a task is removed from running DAG 
(#14381)
* Scheduler should not fail when invalid ``executor_config`` is passed (#14323)
* Fix bug allowing task instances to survive when dagrun_timeout is exceeded 
(#14321)
* Fix bug where DAG timezone was not always shown correctly in UI tooltips 
(#14204)
* Use ``Lax`` for ``cookie_samesite`` when empty string is passed (#14183)
* [AIRFLOW-6076] fix ``dag.cli()`` KeyError (#13647)
* Fix running child tasks in a subdag after clearing a successful subdag 
(#14776)

Improvements
""""""""""""

* Remove unused JS packages causing false security alerts (#15383)
* Change default of ``[kubernetes] enable_tcp_keepalive`` for new installs to 
``True`` (#15338)
* Fixed #14270: Add error message in OOM situations (#15207)
* Better compatibility/diagnostics for arbitrary UID in docker image (#15162)
* Updates 3.6 limits for latest versions of a few libraries (#15209)
* Adds Blinker dependency which is missing after recent changes (#15182)
* Remove 'conf' from search_columns in DagRun View (#15099)
* More proper default value for namespace in K8S cleanup-pods CLI (#15060)
* Faster default role syncing during webserver start (#15017)
* Speed up webserver start when there are many DAGs (#14993)
* Much easier to use and better documented Docker image (#14911)
* Use ``libyaml`` C library when available. (#14577)
* Don't create unittest.cfg when not running in unit test mode (#14420)
* Webserver: Allow Filtering TaskInstances by queued_dttm (#14708)
* Update Flask-AppBuilder dependency to allow 3.2 (and all 3.x series) (#14665)
* Remember expanded task groups in browser local storage (#14661)
* Add plain format output to cli tables (#14546)
* Make ``airflow dags show`` command display TaskGroups (#14269)
* Increase maximum size of ``extra`` connection field. (#12944)
* Speed up clear_task_instances by doing a single sql de

[GitHub] [airflow] quoc-t-le opened a new issue #15407: ShortCircuitOperator to short circuit downstream of TaskGroup only

2021-04-16 Thread GitBox


quoc-t-le opened a new issue #15407:
URL: https://github.com/apache/airflow/issues/15407


   **Description**
   
   I am trying to convert a SubDag that uses the ShortCircuitOperator to a 
TaskGroup.  Originally, the short circuit would trip and move on to the next 
SubDag.  I cannot get the same behavior with TaskGroup.
   
   **What do you want to happen?**
   
   If the ShortCircuitOperator is used within a TaskGroup, I would expect it 
to short circuit only the downstream tasks within the TaskGroup.
   
   Here is the code I am trying to do.  I would expect T1 to be short circuited 
and execute T2:
   
   from airflow import DAG
   from datetime import datetime
   from airflow.operators.dummy_operator import DummyOperator
   from airflow.operators.python_operator import ShortCircuitOperator
   from airflow.utils.task_group import TaskGroup

   default_args = {
       'owner': 'airflow',
       'retries': 3,
       'depends_on_past': False,
   }

   def short_circuit(*args, **kwargs):
       return False

   with DAG("short-circuit",
            catchup=True,
            default_args=default_args,
            schedule_interval='@daily',
            description='Aggregates and pulls down data for API endpoints that use analytics',
            start_date=datetime.strptime('04/14/2021', '%m/%d/%Y'),
            max_active_runs=1
            ) as dag:
       t0 = DummyOperator(task_id='start')

       with TaskGroup('t1') as t1:
           s1 = ShortCircuitOperator(
               task_id='short_circuit',
               python_callable=short_circuit
           )
           s2 = DummyOperator(task_id='t1s2')
           s3 = DummyOperator(task_id='t1s3')
           s4 = DummyOperator(task_id='t1s4')
           s5 = DummyOperator(task_id='t1s5')
           s6 = DummyOperator(task_id='t1s6')
           s7 = DummyOperator(task_id='t1s7')
           s8 = DummyOperator(task_id='t1s8')
           s9 = DummyOperator(task_id='t1s9')
           s1 >> s2 >> s3 >> s4
           s4 >> s5
           s5 >> s6 >> s7
           s7 >> s8 >> s9

       with TaskGroup('t2') as t2:
           s1 = ShortCircuitOperator(
               task_id='short_circuit',
               python_callable=short_circuit,
               op_kwargs={"category": "mobile"}
           )
           s2 = DummyOperator(task_id='t1s2')
           s3 = DummyOperator(task_id='t1s3')
           s4 = DummyOperator(task_id='t1s4')
           s5 = DummyOperator(task_id='t1s5')
           s6 = DummyOperator(task_id='t1s6')
           s7 = DummyOperator(task_id='t1s7')
           s8 = DummyOperator(task_id='t1s8')
           s9 = DummyOperator(task_id='t1s9')
           s1 >> s2 >> s3 >> s4
           s4 >> s5
           s5 >> s6 >> s7
           s7 >> s8 >> s9

       t0 >> t1 >> t2
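   
   Note: `ShortCircuitOperator` skips every transitively downstream task, not 
just the ones inside its group, so scoping the skip needs a custom operator. 
A minimal sketch of such a workaround (assuming Airflow 2.x import paths and 
the `<group_id>.` task_id prefix that TaskGroup applies; this is not part of 
Airflow itself):
   
   ```python
   from airflow.models.skipmixin import SkipMixin
   from airflow.operators.python import PythonOperator


   class GroupScopedShortCircuitOperator(PythonOperator, SkipMixin):
       """Sketch: short-circuit only the tasks inside this task's TaskGroup."""

       def execute(self, context):
           # PythonOperator.execute runs python_callable and returns its result.
           condition = super().execute(context)
           if condition:
               return condition
           # TaskGroup children share a "<group_id>." task_id prefix,
           # e.g. this task is "t1.short_circuit" -> prefix "t1.".
           prefix = self.task_id.rsplit('.', 1)[0] + '.'
           to_skip = [
               task
               for task in context['task'].get_flat_relatives(upstream=False)
               if task.task_id.startswith(prefix)
           ]
           self.log.info("Skipping %d tasks within this group", len(to_skip))
           if to_skip:
               self.skip(context['dag_run'], context['ti'].execution_date, to_skip)
   ```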
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] Dr-Denzy commented on issue #13918: KubernetesPodOperator with pod_template_file = No Metadata & Wrong Pod Name

2021-04-16 Thread GitBox


Dr-Denzy commented on issue #13918:
URL: https://github.com/apache/airflow/issues/13918#issuecomment-821305671


   Using airflow 2.0.0, the kubectl description of the `privileged-pod` 
did not show the task metadata (dag_id, task_id, etc).
   However, `airflow 2.0.1` with 
`apache-airflow-providers-cncf-kubernetes==1.0.2` yielded the right pod name 
but still did not show the task metadata.
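   
   A quick way to check which labels actually made it onto the pod (a sketch 
using the official `kubernetes` Python client; pod name and namespace are 
placeholders). KubernetesPodOperator is normally expected to inject labels 
such as `dag_id`, `task_id`, `execution_date` and `try_number`:
   
   ```python
   from kubernetes import client, config

   # Load credentials the same way kubectl does (~/.kube/config).
   config.load_kube_config()

   # Placeholders: substitute the real pod name and namespace.
   pod = client.CoreV1Api().read_namespaced_pod("privileged-pod", "default")

   # Expect Airflow-injected keys like 'dag_id' and 'task_id' here.
   print(pod.metadata.labels)
   ```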


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




svn commit: r47124 - /dev/airflow/2.0.2rc1/

2021-04-16 Thread ash
Author: ash
Date: Fri Apr 16 16:35:08 2021
New Revision: 47124

Log:
Add artefacts from Airflow 2.0.2rc1

Added:
dev/airflow/2.0.2rc1/
dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-bin.tar.gz   (with props)
dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-bin.tar.gz.asc   (with props)
dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-bin.tar.gz.sha512
dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-source.tar.gz   (with props)
dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-source.tar.gz.asc   (with 
props)
dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-source.tar.gz.sha512
dev/airflow/2.0.2rc1/apache_airflow-2.0.2rc1-py3-none-any.whl   (with props)
dev/airflow/2.0.2rc1/apache_airflow-2.0.2rc1-py3-none-any.whl.asc   (with 
props)
dev/airflow/2.0.2rc1/apache_airflow-2.0.2rc1-py3-none-any.whl.sha512

Added: dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-bin.tar.gz
==
Binary file - no diff available.

Propchange: dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-bin.tar.gz
--
svn:mime-type = application/gzip

Added: dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-bin.tar.gz.asc
==
Binary file - no diff available.

Propchange: dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-bin.tar.gz.asc
--
svn:mime-type = application/pgp-signature

Added: dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-bin.tar.gz.sha512
==
--- dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-bin.tar.gz.sha512 (added)
+++ dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-bin.tar.gz.sha512 Fri Apr 16 
16:35:08 2021
@@ -0,0 +1 @@
+4281b3ff5d5b483c74970f8128d7ad8ba699081086fd098e10b12f8b52a7d0f92a205d7ea334c29e813ac06af7a26de416294fd18c3a1a949388a4824955ce2e
  apache-airflow-2.0.2rc1-bin.tar.gz

Added: dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-source.tar.gz
==
Binary file - no diff available.

Propchange: dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-source.tar.gz
--
svn:mime-type = application/gzip

Added: dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-source.tar.gz.asc
==
Binary file - no diff available.

Propchange: dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-source.tar.gz.asc
--
svn:mime-type = application/pgp-signature

Added: dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-source.tar.gz.sha512
==
--- dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-source.tar.gz.sha512 (added)
+++ dev/airflow/2.0.2rc1/apache-airflow-2.0.2rc1-source.tar.gz.sha512 Fri Apr 
16 16:35:08 2021
@@ -0,0 +1 @@
+ca783369f9044796bc575bf18b986ac86998b007d01f8ff2a8c9635454d05f39fb09ce010d62249cf91badc83fd5b38c04f2b39e32830ccef70f601c5829dcb7
  apache-airflow-2.0.2rc1-source.tar.gz

Added: dev/airflow/2.0.2rc1/apache_airflow-2.0.2rc1-py3-none-any.whl
==
Binary file - no diff available.

Propchange: dev/airflow/2.0.2rc1/apache_airflow-2.0.2rc1-py3-none-any.whl
--
svn:mime-type = application/zip

Added: dev/airflow/2.0.2rc1/apache_airflow-2.0.2rc1-py3-none-any.whl.asc
==
Binary file - no diff available.

Propchange: dev/airflow/2.0.2rc1/apache_airflow-2.0.2rc1-py3-none-any.whl.asc
--
svn:mime-type = application/pgp-signature

Added: dev/airflow/2.0.2rc1/apache_airflow-2.0.2rc1-py3-none-any.whl.sha512
==
--- dev/airflow/2.0.2rc1/apache_airflow-2.0.2rc1-py3-none-any.whl.sha512 (added)
+++ dev/airflow/2.0.2rc1/apache_airflow-2.0.2rc1-py3-none-any.whl.sha512 Fri 
Apr 16 16:35:08 2021
@@ -0,0 +1 @@
+779563fd88256980ff8a994a9796d7fd18e579853c33d61e1603b084f4d150e83b3209bf1a9cd438c4dd08240b1ee48b139690ee208f80478b5b2465b7183e50
  apache_airflow-2.0.2rc1-py3-none-any.whl




[GitHub] [airflow] houqp commented on issue #14422: on_failure_callback does not seem to fire on pod deletion/eviction

2021-04-16 Thread GitBox


houqp commented on issue #14422:
URL: https://github.com/apache/airflow/issues/14422#issuecomment-821296305


   Interesting, I was expecting the second SIGTERM to result in the task 
subprocess setting its own state through `handle_failure`, because 
`self.on_kill` calls `self.task_runner.terminate()`, which is supposed to wait 
for the subprocess to exit:
   
   
https://github.com/apache/airflow/blob/e7c642ba2a79ea13d6ef84b78242f6c313cd3457/airflow/task/task_runner/standard_task_runner.py#L108-L117
   
   >  We should probably add self.task_instance.state=State.FAILED in 
handle_task_exit if exit_code != 1. WDYT @houqp @ephraimbuddy ?
   
   I think we should do this as an extra safeguard because in rare cases, the 
task subprocess could crash at any time before it updates its own task state. 
However, I think the state update logic should be guarded by an extra condition:
   
   ```python
   if self.task_instance.state not in State.finished():
   self.task_instance.state = State.FAILED
   ```
   
   Because the task could have exited successfully right after the pod was 
killed but before the local task job executed `handle_task_exit`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] houqp edited a comment on issue #14422: on_failure_callback does not seem to fire on pod deletion/eviction

2021-04-16 Thread GitBox


houqp edited a comment on issue #14422:
URL: https://github.com/apache/airflow/issues/14422#issuecomment-821296305


   Interesting, I was expecting the second SIGTERM to result in the task 
subprocess setting its own state through `handle_failure`, because 
`self.on_kill` calls `self.task_runner.terminate()`, which is supposed to wait 
for the subprocess to exit:
   
   
https://github.com/apache/airflow/blob/e7c642ba2a79ea13d6ef84b78242f6c313cd3457/airflow/task/task_runner/standard_task_runner.py#L108-L117
   
   >  We should probably add self.task_instance.state=State.FAILED in 
handle_task_exit if exit_code != 1. WDYT @houqp @ephraimbuddy ?
   
   I think we should do this as an extra safeguard because in rare cases, the 
task subprocess could crash at any time before it updates its own task state. 
However, I think the state update logic should be guarded by an extra condition:
   
   ```python
   if self.task_instance.state not in State.finished():
   self.task_instance.state = State.FAILED
   ```
   
   This is to handle the case where the task could have exited successfully 
right after the pod was killed but before the local task job executed 
`handle_task_exit`.
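   
   For context, a sketch of how that guard might sit in 
`LocalTaskJob.handle_task_exit` (the placement and surrounding lines are 
assumptions, not the actual change):
   
   ```python
   from airflow.utils.state import State

   def handle_task_exit(self, return_code):
       """Sketch: only force FAILED when the subprocess left no terminal state."""
       self.task_instance.refresh_from_db()
       # If the child already recorded SUCCESS/FAILED/SKIPPED through its own
       # handle_failure path, leave that state untouched.
       if return_code != 0 and self.task_instance.state not in State.finished():
           self.task_instance.state = State.FAILED
   ```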


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] jedcunningham commented on issue #15399: Not scheduling since there are (negative number) open slots in pool

2021-04-16 Thread GitBox


jedcunningham commented on issue #15399:
URL: https://github.com/apache/airflow/issues/15399#issuecomment-821294376


   Yes, `QUEUED` should be included; this might help:
   
   
https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#task-lifecycle
   
   Basically, `QUEUED` means the executor has passed the task along and is 
waiting for the worker to start or pick it up (depending on your executor).
   
   That said, negative open slots sure does seem like an issue.
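   
   The arithmetic behind that message, as a back-of-the-envelope sketch (the 
numbers are made up): open slots subtract both RUNNING and QUEUED task 
instances from the pool size, so over-queueing drives the figure negative.
   
   ```python
   total_slots = 32     # pool size
   running_slots = 20   # task instances currently RUNNING in the pool
   queued_slots = 15    # task instances sitting in QUEUED

   # Both RUNNING and QUEUED occupy slots, so the remainder can go negative.
   open_slots = total_slots - running_slots - queued_slots
   print(f"there are {open_slots} open slots in pool")  # -3
   ```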


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] SalmonTimo commented on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

2021-04-16 Thread GitBox


SalmonTimo commented on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-821292708


   @minnieshi Do you have access to logs from the worker tasks? If so, can you 
include any errors there?
   
   @lukas-at-harren Well done! The second part of your proposed solution "The 
worker must fail properly (with its task ending in a "failed" state) when he 
cannot find the DAG + task he was tasked with" would apply to my CeleryExecutor 
problem as well, given my workers were timing out attempting to find their DAGs.
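   
   In spirit, that second part amounts to something like the guard below on 
the worker side (purely illustrative: `fail_if_dag_missing` and its wiring are 
invented names, not Airflow internals):
   
   ```python
   from airflow.models import DagBag
   from airflow.utils.state import State

   def fail_if_dag_missing(dag_id, task_id, task_instance):
       """Illustrative: fail the TI instead of hanging when the worker
       cannot import the DAG/task it was handed."""
       dag = DagBag().get_dag(dag_id)
       if dag is None or not dag.has_task(task_id):
           task_instance.set_state(State.FAILED)
           raise RuntimeError(f"Worker could not find {dag_id}.{task_id}")
       return dag.get_task(task_id)
   ```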


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[airflow] annotated tag constraints-2.0.2rc1 updated (4ce3900 -> 7c73284)

2021-04-16 Thread ash
This is an automated email from the ASF dual-hosted git repository.

ash pushed a change to annotated tag constraints-2.0.2rc1
in repository https://gitbox.apache.org/repos/asf/airflow.git.


*** WARNING: tag constraints-2.0.2rc1 was modified! ***

from 4ce3900  (commit)
  to 7c73284  (tag)
 tagging 4ce3900c940741d04657c86e6fb0f43077f3ff00 (commit)
 replaces constraints-2.0.1rc1
  by Ash Berlin-Taylor
  on Fri Apr 16 17:25:39 2021 +0100

- Log -
Constraints for 2.0.2rc1
-----BEGIN PGP SIGNATURE-----

iQJDBAABCAAtFiEEXMrqx1jtZMoyPwU7gHxzGoyCoJUFAmB5uowPHGFzaEBhcGFj
aGUub3JnAAoJEIB8cxqMgqCVySYQAIzJtnRGRuk8z1tclvarZle3lBDLfLnxiSfE
z/F0c7aSl+/mPzaNB99E8RERWxu6j+/t8o7FQyfvwUcLzO9i5ZsI6bMdbGop6fsc
YhIVf4VLGZ25kRkAeaIaLgWXTyUHhVEId1WaQ3qI7cJVIOXPLKE6u3z4RQ4iFeJw
gMJrEqhMxAxVX30VZ1a/Sr9WtqDrZrXyKRKJOmaz9kbBPy1faJVZj0k3mSc2S6vm
YYhjjkP71XxorpngFNnT97Wk8G4F+7yIPwjFp0f3ppezk0M2diWoZpy5iiNFCr17
TTLKVpVIp1FHdSB9BfqvCXVekzEfsiX7XdAEkJRWPlQ1us4QxLT+jgQIqYWQIMPV
39ggGz4aHN4pwO1qYBC2iCPA2F81As9Aw7mrnohT/jCyrxGcK3Gh3gDtNlatjISI
2ZxKfvrRj8os7EiCWwNUqYcIeFOraD4BAh0x0Nrq/ulbEu54fqxi9+cuR9dXIXqD
jaF0uL3QjBaNBl+qM9f1lRDd2gg3yOcAZIqEoBTd+ajEUhI2DJhtKQtIchVbFQcA
WjsfWaN/Kd+fmibKWQ2LZhZXgI3HP9kH7IlM5C0/MplyZj19+t2IG78tN4/eIyyf
9keOJCKL1bUGEo5EPgQCF+PtUF/VIEpQd+4R0thLlPLvtzDt4GV7CPqTyRrL2EDQ
PQ/NK20H
=bWgK
-----END PGP SIGNATURE-----
---


No new revisions were added by this update.

Summary of changes:


[airflow] branch v2-0-stable updated (beb8af5 -> e494306)

2021-04-16 Thread ash
This is an automated email from the ASF dual-hosted git repository.

ash pushed a change to branch v2-0-stable
in repository https://gitbox.apache.org/repos/asf/airflow.git.


from beb8af5  Update Changelog for 2.0.1rc2
 add 417d7cb  Fix spellings
 add 94ae38f  Restores flexible installation version, fixes manual tag 
build process. (#14107)
 add e743e03  Add script to generate integrations.json (#13073)
 add 0df89d8  Sort integrations.json by lowercase integration name (#13105)
 add f373032   Simplify Kerberos network setup (#13999)
 add 3abb230  Fix Kerberos network creation on older docker-compose (#14070)
 add 3f793d3  Add statsd integration to breeze (#12708)
 add 24a16cc  Run KinD tests when cncf.kubernetes provider files are 
changed (#14122)
 add 07d8926  Run CI (Build Image and committer PRs) on self-hosted runner 
(#13730)
 add c2b361d  Fix typo in Build Images workflow from self-hosted switch 
(#14150)
 add 35c7a45  Disable progress bar for PIP installation (#14126)
 add 8958a3d  Fixes regexp in entrypoint to include password-less entries 
(#14221)
 add 55d4774  Disables self-hosted runs for non-apache-owned repositories 
(#14239)
 add 24af3d2  fixup! Disables self-hosted runs for non-apache-owned 
repositories (#14239) (#14242)
 add ea98acb  Attempts to stabilize and improve speed of static checks 
(#14332)
 add 803c5eb  Implements generation of separate constraints for core and 
providers (#14227)
 add dae6003  Fix some tests failures after pylint fixes (#14350)
 add 2c3a9f7  Fix caching of python images during builds (#14347)
 add 119d31b  Easy switching between GitHub Container Registries (#14120)
 add 4197110  Pre-commit cache is tied to a specific python version (#14430)
 add d6f29af  Upgrade to newer dependencies only set when setup changed for 
PR (#14437)
 add 5f3d913  Add PATH to basic_static_checks. (#14451)
 add be82329  Fix pylint pre-commit checks when only todo files are changed 
(#14453)
 add 8301e20  Fixes date command in breeze build-image to work on MacOS 
(#14458)
 add 1761992  Adds --dry-run-docker flag to just print the docker commands 
(#14468)
 add fb967c0  Allow your own Docker production image to be verified by bash 
script (#14224)
 add efabde1  Removes the step to upload artifact with documentation 
(#14510)
 add ec82967  Update hadolint from v1.18.0 to v1.22.1 (#14509)
 add 2c6ee74  Production image can be run as root (#14226)
 add ef87e84  Fix asset recompilation message (#14532)
 add fbc675f  Fix typo in docker.rst (#14389)
 add 76356e5  Updates docs to include docker resource requirements for 
quickstart (#14464)
 add 543f36b  Enable LDAP auth in docker-compose.yaml (#14516)
 add 18a1042  Disable health checks for ad-hoc containers (#14536)
 add 247af49  Log all breeze output to a file automatically (#14470)
 add 76c249e  Fix breeze redirect on macOS (#14506)
 add 07d924d  Implement provider versioning tools (#13767)
 add 9621dda  Use DAG context manager in examples (#13297)
 add fe6f64e  Update documents for using MySQL (#14174)
 add 890976b  Add better description and guidance in case of sqlite version 
mismatch (#14209)
 add 52b70b9  Correct PostgreSQL password in doc example code (#14256)
 add 2977d20  Fix misleading statement on sqlite (#14317)
 add a11b678  Add more tips about health checks (#14537)
 add 47aa991  Add Neo4j hook and operator (#13324)
 add 4b73684  Minor doc fixes (#14547)
 add cfa4c7f  Fix grammar in production-deployment.rst (#14386)
 add 87e747f  Add Apache Beam operators (#12814)
 add 27f5175  Upgrade slack_sdk to v3 (#13745)
 add ede845b  Add Google Cloud Workflows Operators (#13366)
 add cca3afa  Update compatibility with google-cloud-os-login>=2.0.0 
(#13126)
 add fae6b2e  Support google-cloud-datacatalog>=1.0.0 (#13097)
 add dd3474c  Update compatibility with google-cloud-kms>=2.0 (#13124)
 add 6337aa8  Support google-cloud-pubsub>=2.0.0 (#13127)
 add f2b5637  Support google-cloud-redis>=2.0.0 (#13117)
 add 0fa5141  Add timeout option to gcs hook methods. (#13156)
 add 8947278  Support google-cloud-bigquery-datatransfer>=3.0.0 (#13337)
 add bc88c5b  Salesforce provider requires tableau (#13593)
 add 63f2bc4  Support google-cloud-datacatalog>=3.0.0 (#13534)
 add 1aa4871  Support google-cloud-automl >=2.1.0 (#13505)
 add 77cf7eb  Support google-cloud-tasks>=2.0.0 (#13347)
 add 98a7e75  Refactor DataprocOperators to support google-cloud-dataproc 
2.0 (#13256)
 add a074670  Support google-cloud-monitoring>=2.0.0 (#13769)
 add c2cb07f  Support google-cloud-logging` >=2.0.0 (#13801)
 add d7f607a  Update to Pytest 6.0 (#14065)
 add 1b29db9  Remove reinstalling azure-storage steps from CI / Breeze 
(#14102)
 add a83b596  Limits Sphinx to <3.5.0 (#14238)
 add b1acacb  Remove te

[GitHub] [airflow] ashb merged pull request #15406: Prepare 2.0.2

2021-04-16 Thread GitBox


ashb merged pull request #15406:
URL: https://github.com/apache/airflow/pull/15406


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] github-actions[bot] commented on pull request #15406: Prepare 2.0.2

2021-04-16 Thread GitBox


github-actions[bot] commented on pull request #15406:
URL: https://github.com/apache/airflow/pull/15406#issuecomment-821282239


   The PR most likely needs to run full matrix of tests because it modifies 
parts of the core of Airflow. However, committers might decide to merge it 
quickly and take the risk. If they don't merge it quickly - please rebase it to 
the latest master at your convenience, or amend the last commit of the PR, and 
push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] minnieshi edited a comment on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

2021-04-16 Thread GitBox


minnieshi edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-821258217


   **My situation**
   - kubernetes
   - airflow 1.10.14
   - celery executor
   - only 1 dag is 'on', the rest 20 dags are 'off'
   - dag is correct as it works in another environment.
   - pool (default_pool): 32 slots, 0 used slots, 1 queued slot
   - tasks in the dag can be run manually (by clearing them), but it does not 
automatically run the next task.
   - one situation: after restarting the scheduler manually (the restart 
configuration is set to never, and schedulerNumRuns is set to -1), it decided 
to run 3 out of 4 tasks, and the last one just **got stuck in the queued state**
   - after that, I tried to load the dag with a different name and a different 
id; the 1st task of the dag 
**got stuck in the 'scheduled' state** after clearing the task.
   - when checking the scheduler log, it shows an error like this
   `[2021-04-16 13:06:36,392] {celery_executor.py:282} ERROR - Error fetching 
Celery task state, ignoring it:AirflowTaskTimeout('Timeout, PID: 3497') Celery 
Task ID: ('_YYY_test', 'Task_blahblahblah', datetime.datetime(2021, 4, 
15, 3, 0, tzinfo=), 1)`
   
   - I reported this here: https://github.com/helm/charts/issues/19399 but found 
that issue is already closed.
   
   **I tried an experiment, which did not help as I expected.**
   - Uploaded the dag with a new name/id, enabled it, and cleared the dag 
(otherwise the 1st task just stays stuck in the 'queued' state), and the 1st 
task is now at the 'scheduled' state and stuck there.
   - check scheduler log:
   - `[2021-04-16 15:58:51,991] {celery_executor.py:282} ERROR - Error fetching 
Celery task state, ignoring it:AirflowTaskTimeout('Timeout, PID: 1851')
   Celery Task ID: ('XX_min_test_3', 'Load__to_', 
datetime.datetime(2021, 4, 15, 3, 0, tzinfo=), 1)
   Traceback (most recent call last):
 File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py",
 line 117, in fetch_celery_task_state
   res = (celery_task[0], celery_task[1].state)`
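   
   What that failing call boils down to (a standalone sketch, not the Airflow 
code): each state fetch asks the Celery result backend for the task's state, 
and a slow or unreachable backend is what trips the `AirflowTaskTimeout`.
   
   ```python
   from celery.result import AsyncResult

   def fetch_state(celery_app, celery_task_id):
       # Round-trips to the Celery result backend; if the backend is slow
       # or unreachable, this call blocks and the timeout above fires.
       return AsyncResult(celery_task_id, app=celery_app).state

   # e.g. fetch_state(app, "uuid-of-task") -> 'PENDING', 'STARTED', 'SUCCESS', ...
   ```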
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] minnieshi edited a comment on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

2021-04-16 Thread GitBox


minnieshi edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-821258217


   **My situation**
   - kubernetes
   - airflow 1.10.14
   - celery executor
   - only 1 dag is 'on', the rest 20 dags are 'off'
   - dag is correct as it works in another environment.
   - pool (default_pool): 32 slots, 0 used slots, 1 queued slot
   - tasks in the dag can be run manually (by clearing them), but it does not 
automatically run the next task.
   - one situation: after restarting the scheduler manually (the restart 
configuration is set to never, and schedulerNumRuns is set to -1), it decided 
to run 3 out of 4 tasks, and the last one just **got stuck in the queued state**
   - after that, I tried to load the dag with a different name and a different 
id; the 1st task of the dag 
**got stuck in the 'scheduled' state** after clearing the task.
   - when checking the scheduler log, it shows an error like this
   `[2021-04-16 13:06:36,392] {celery_executor.py:282} ERROR - Error fetching 
Celery task state, ignoring it:AirflowTaskTimeout('Timeout, PID: 3497') Celery 
Task ID: ('_YYY_test', 'Task_blahblahblah', datetime.datetime(2021, 4, 
15, 3, 0, tzinfo=), 1)`
   
   - I reported this here: https://github.com/helm/charts/issues/19399 but found 
that issue is already closed.
   
   **I tried an experiment, which did not help as I expected.**
   - Uploaded the dag with a new name/id, enabled it, and cleared the dag 
(otherwise the 1st task just stays stuck in the 'queued' state), and the 1st 
task is now at the 'scheduled' state and stuck there.
   - check scheduler log:
   - `[2021-04-16 15:58:51,991] {celery_executor.py:282} ERROR - Error fetching 
Celery task state, ignoring it:AirflowTaskTimeout('Timeout, PID: 1851')
   Celery Task ID: ('XX_min_test_3', 'Load_product_to_source', 
datetime.datetime(2021, 4, 15, 3, 0, tzinfo=), 1)
   Traceback (most recent call last):
 File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py",
 line 117, in fetch_celery_task_state
   res = (celery_task[0], celery_task[1].state)`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] xinbinhuang commented on pull request #13796: Fix S3ToFTPOperator

2021-04-16 Thread GitBox


xinbinhuang commented on pull request #13796:
URL: https://github.com/apache/airflow/pull/13796#issuecomment-821279946


   @JavierLopezT can you fix the pylint error thanks :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[airflow] branch v2-0-test updated (62b5835 -> e494306)

2021-04-16 Thread ash
This is an automated email from the ASF dual-hosted git repository.

ash pushed a change to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git.


from 62b5835  Add changelog for what will become 2.0.2 (#15380)
 add e494306  Update version to 2.0.2

No new revisions were added by this update.

Summary of changes:
 setup.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[GitHub] [airflow] minnieshi edited a comment on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

2021-04-16 Thread GitBox


minnieshi edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-821258217


   **My situation**
   - kubernetes
   - airflow 1.10.14
   - celery executor
   - only 1 dag is 'on', the rest 20 dags are 'off'
   - dag is correct as it works in another environment.
   - pool (default_pool): 32 slots, 0 used slots, 1 queued slot
   - tasks in the dag can be run manually (by clearing them), but it does not 
automatically run the next task.
   - one situation: after restarting the scheduler manually (the restart 
configuration is set to never, and schedulerNumRuns is set to -1), it decided 
to run 3 out of 4 tasks, and the last one just **got stuck in the queued state**
   - after that, I tried to load the dag with a different name and a different 
id; the 1st task of the dag 
**got stuck in the 'scheduled' state** after clearing the task.
   - when checking the scheduler log, it shows an error like this
   `[2021-04-16 13:06:36,392] {celery_executor.py:282} ERROR - Error fetching 
Celery task state, ignoring it:AirflowTaskTimeout('Timeout, PID: 3497') Celery 
Task ID: ('_YYY_test', 'Task_blahblahblah', datetime.datetime(2021, 4, 
15, 3, 0, tzinfo=), 1)`
   
   - I reported this here: https://github.com/helm/charts/issues/19399 but found 
that issue is already closed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] thejens commented on a change in pull request #15367: Implement BigQuery Table Schema Patch Operator

2021-04-16 Thread GitBox


thejens commented on a change in pull request #15367:
URL: https://github.com/apache/airflow/pull/15367#discussion_r614945105



##
File path: airflow/providers/google/cloud/operators/bigquery.py
##
@@ -2039,6 +2039,155 @@ def execute(self, context) -> None:
 )
 
 
+class BigQueryPatchTableSchemaOperator(BaseOperator):
+"""
+Patch BigQuery Table Schema
+Updates fields on a table schema based on contents of the supplied schema
+parameter. The supplied schema does not need to be complete, if the field
+already exists in the schema you only need to supply a schema with the
+fields you want to patch and the "name" key set on the schema resource.
+
+.. seealso::
+For more information on how to use this operator, take a look at the 
guide:
+:ref:`howto/operator:BigQueryPatchTableSchemaOperator`
+
+:param dataset_id: A dotted
+``(.|:)`` that indicates which dataset
+will be updated. (templated)
+:type dataset_id: str
+:param schema_fields: a partial schema resource. see
+
https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableSchema
+
+**Example**: ::
+
+schema_fields=[
+{"name": "emp_name", "description": "Some New Description"},
+{"name": "salary", "description": "Some New Description"},
+{"name": "departments", "fields": [
+{"name": "name", "description": "Some New Description"},
+{"name": "type", "description": "Some New Description"}
+]},
+]
+
+:type schema_fields: dict
+:param project_id: The name of the project where we want to update the 
dataset.
+Don't need to provide, if projectId in dataset_reference.
+:type project_id: str
+:param gcp_conn_id: (Optional) The connection ID used to connect to Google 
Cloud.
+:type gcp_conn_id: str
+:param bigquery_conn_id: (Deprecated) The connection ID used to connect to 
Google Cloud.
+This parameter has been deprecated. You should pass the gcp_conn_id 
parameter instead.
+:type bigquery_conn_id: str
+:param delegate_to: The account to impersonate, if any.
+For this to work, the service account making the request must have 
domain-wide
+delegation enabled.
+:type delegate_to: str
+:param location: The location used for the operation.
+:type location: str
+:param impersonation_chain: Optional service account to impersonate using 
short-term
+credentials, or chained list of accounts required to get the 
access_token
+of the last account in the list, which will be impersonated in the 
request.
+If set as a string, the account must grant the originating account
+the Service Account Token Creator IAM role.
+If set as a sequence, the identities from the list must grant
+Service Account Token Creator IAM role to the directly preceding 
identity, with first
+account from the list granting this role to the originating account 
(templated).
+:type impersonation_chain: Union[str, Sequence[str]]
+"""
+
+template_fields = (
+'schema_fields',
+'dataset_id',
+'table_id',
+'project_id',
+'impersonation_chain',
+)
+template_fields_renderers = {"schema_fields": "json"}
+ui_color = BigQueryUIColors.TABLE.value
+
+@classmethod
+def _patch_schema(cls, old_schema: Dict, new_schema: Dict) -> None:
+"""
+Updates the content of "old_schema" with
+the fields from new_schema. Makes changes
+in place and hence has no return value.
+Works recursively for sub-records.
+Start by turning the schema list of fields into
+a dict keyed on the field name for both the old
+and the new schema.
+
+:param old_schema: Old schema which is updated in-place
+:type old_schema: dict
+:param new_schema: Partial schema definition used to patch old schema
+:type new_schema: dict
+"""
+new_fields = {field["name"]: field for field in new_schema["fields"] 
if "name" in field}
+old_fields = {field["name"]: field for field in old_schema["fields"]}
+
+# Iterate over all new fields and update the
+# old_schema dict
+for field_key in new_fields.keys():
+# Check if the field exists in the old_schema, if
+# so change it
+if field_key in old_fields:
+old_field = old_fields[field_key]
+new_field = new_fields[field_key]
+# Check if recursion is needed
+if "fields" in new_field:
+cls._patch_schema(old_field, new_field)
+del new_field["fields"]
+
+# Do the update
+old_field.update(new_field)
+
+# Field didn't exist, add it as a new field
+else:
+old_schema["fiel

[GitHub] [airflow] minnieshi edited a comment on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

2021-04-16 Thread GitBox


minnieshi edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-821258217


   **My situation**
   - kubernetes
   - celery executor
   - only 1 dag is 'on', the rest 20 dags are 'off'
   - dag is correct as it works in another environment.
   - pool (default_pool): 32 slots, 0 used slots, 1 queued slot
   - tasks in the dag can be run manually (by clearing them), but it does not 
automatically run the next task.
   - one situation: after restarting the scheduler manually (the restart 
configuration is set to never, and schedulerNumRuns is set to -1), it decided 
to run 3 out of 4 tasks, and the last one just **got stuck in the queued state**
   - after that, I tried to load the dag with a different name and a different 
id; the 1st task of the dag 
**got stuck in the 'scheduled' state** after clearing the task.
   - when checking the scheduler log, it shows an error like this
   `[2021-04-16 13:06:36,392] {celery_executor.py:282} ERROR - Error fetching 
Celery task state, ignoring it:AirflowTaskTimeout('Timeout, PID: 3497') Celery 
Task ID: ('_YYY_test', 'Task_blahblahblah', datetime.datetime(2021, 4, 
15, 3, 0, tzinfo=), 1)`
   
   - I reported this here: https://github.com/helm/charts/issues/19399 but found 
that issue is already closed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] github-actions[bot] commented on pull request #15367: Implement BigQuery Table Schema Patch Operator

2021-04-16 Thread GitBox


github-actions[bot] commented on pull request #15367:
URL: https://github.com/apache/airflow/pull/15367#issuecomment-821268825


   [The Workflow run](https://github.com/apache/airflow/actions/runs/756152896) 
is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static 
checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm 
tests$,^Test OpenAPI*.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] ashb opened a new pull request #15406: Prepare

2021-04-16 Thread GitBox


ashb opened a new pull request #15406:
URL: https://github.com/apache/airflow/pull/15406


   **DO NOT MERGE**
   
   For approval only.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[airflow] branch v2-0-test updated (a46e809 -> 62b5835)

2021-04-16 Thread ash
This is an automated email from the ASF dual-hosted git repository.

ash pushed a change to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git.


from a46e809  Fixes pushing constraints (#15243)
 add 62b5835  Add changelog for what will become 2.0.2 (#15380)

No new revisions were added by this update.

Summary of changes:
 CHANGELOG.txt  | 97 +-
 UPDATING.md| 11 +-
 docs/spelling_wordlist.txt |  2 +
 3 files changed, 107 insertions(+), 3 deletions(-)


[GitHub] [airflow] ashb merged pull request #15380: Add changelog for what will become 2.0.2

2021-04-16 Thread GitBox


ashb merged pull request #15380:
URL: https://github.com/apache/airflow/pull/15380


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] sunkickr commented on a change in pull request #15393: Add Connection Documentation for Popular Providers

2021-04-16 Thread GitBox


sunkickr commented on a change in pull request #15393:
URL: https://github.com/apache/airflow/pull/15393#discussion_r614938268



##
File path: airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py
##
@@ -34,8 +34,9 @@ class SparkKubernetesOperator(BaseOperator):
 :type application_file:  str
 :param namespace: kubernetes namespace to put sparkApplication
 :type namespace: str
-:param kubernetes_conn_id: the connection to Kubernetes cluster
-:type kubernetes_conn_id: str
+:param conn_id: The :ref:`kubernetes 
connection`

Review comment:
   Thank you for catching these!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[airflow] branch master updated (54edbaa -> adbab36)

2021-04-16 Thread ash
This is an automated email from the ASF dual-hosted git repository.

ash pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git.


from 54edbaa  Share app instance between Kerberos tests (#15141)
 add adbab36  Add changelog for what will become 2.0.2 (#15380)

No new revisions were added by this update.

Summary of changes:
 CHANGELOG.txt  | 97 +-
 UPDATING.md|  3 ++
 docs/spelling_wordlist.txt |  2 +
 3 files changed, 101 insertions(+), 1 deletion(-)


[GitHub] [airflow] minnieshi edited a comment on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

2021-04-16 Thread GitBox


minnieshi edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-821258217


   **My situation**
   - kubernetes
   - celery executor
   - only 1 dag is 'on', the rest 20 dags are 'off'
   - dag is correct as it works in another environment.
   - pool (default_pool): 32 slots, 0 used slots, 1 queued slot
   - tasks in the dag can be run manually (by clearing them), but it does not 
automatically run the next task.
   - one situation: after restarting the scheduler manually (the restart 
configuration is set to never, value -1), it decided to run 3 out of 4 tasks, 
and the last one just **got stuck in the queued state**
   - after that, I tried to load the dag with a different name and a different 
id; the 1st task of the dag 
**got stuck in the 'scheduled' state** after clearing the task.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] thejens commented on a change in pull request #15367: Implement BigQuery Table Schema Patch Operator

2021-04-16 Thread GitBox


thejens commented on a change in pull request #15367:
URL: https://github.com/apache/airflow/pull/15367#discussion_r614930112



##
File path: airflow/providers/google/cloud/operators/bigquery.py
##
@@ -2039,6 +2039,155 @@ def execute(self, context) -> None:
 )
 
 
+class BigQueryPatchTableSchemaOperator(BaseOperator):
+"""
+Patch BigQuery Table Schema
+Updates fields on a table schema based on contents of the supplied schema
+parameter. The supplied schema does not need to be complete, if the field
+already exists in the schema you only need to supply a schema with the
+fields you want to patch and the "name" key set on the schema resource.
+
+.. seealso::
+For more information on how to use this operator, take a look at the 
guide:
+:ref:`howto/operator:BigQueryPatchTableSchemaOperator`
+
+:param dataset_id: A dotted
+``(.|:)`` that indicates which dataset
+will be updated. (templated)
+:type dataset_id: str
+:param schema_fields: a partial schema resource. see
+
https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableSchema
+
+**Example**: ::
+
+schema_fields=[
+{"name": "emp_name", "description": "Some New Description"},
+{"name": "salary", "description": "Some New Description"},
+{"name": "departments", "fields": [
+{"name": "name", "description": "Some New Description"},
+{"name": "type", "description": "Some New Description"}
+]},
+]
+
+:type schema_fields: dict
+:param project_id: The name of the project where we want to update the 
dataset.
+Don't need to provide, if projectId in dataset_reference.
+:type project_id: str
+:param gcp_conn_id: (Optional) The connection ID used to connect to Google 
Cloud.
+:type gcp_conn_id: str
+:param bigquery_conn_id: (Deprecated) The connection ID used to connect to 
Google Cloud.
+This parameter has been deprecated. You should pass the gcp_conn_id 
parameter instead.
+:type bigquery_conn_id: str
+:param delegate_to: The account to impersonate, if any.
+For this to work, the service account making the request must have 
domain-wide
+delegation enabled.
+:type delegate_to: str
+:param location: The location used for the operation.
+:type location: str
+:param impersonation_chain: Optional service account to impersonate using 
short-term
+credentials, or chained list of accounts required to get the 
access_token
+of the last account in the list, which will be impersonated in the 
request.
+If set as a string, the account must grant the originating account
+the Service Account Token Creator IAM role.
+If set as a sequence, the identities from the list must grant
+Service Account Token Creator IAM role to the directly preceding 
identity, with first
+account from the list granting this role to the originating account 
(templated).
+:type impersonation_chain: Union[str, Sequence[str]]
+"""
+
+template_fields = (
+'schema_fields',
+'dataset_id',
+'table_id',
+'project_id',
+'impersonation_chain',
+)
+template_fields_renderers = {"schema_fields": "json"}
+ui_color = BigQueryUIColors.TABLE.value
+
+@classmethod
+def _patch_schema(cls, old_schema: Dict, new_schema: Dict) -> None:
+"""
+Updates the content of "old_schema" with
+the fields from new_schema. Makes changes
+in place and hence has no return value.
+Works recursively for sub-records.
+Start by turning the schema list of fields into
+a dict keyed on the field name for both the old
+and the new schema.
+
+:param old_schema: Old schema which is updated in-place
+:type old_schema: dict
+:param new_schema: Partial schema definition used to patch old schema
+:type new_schema: dict
+"""
+new_fields = {field["name"]: field for field in new_schema["fields"] 
if "name" in field}
+old_fields = {field["name"]: field for field in old_schema["fields"]}
+
+# Iterate over all new fields and update the
+# old_schema dict
+for field_key in new_fields.keys():
+# Check if the field exists in the old_schema, if
+# so change it
+if field_key in old_fields:
+old_field = old_fields[field_key]
+new_field = new_fields[field_key]
+# Check if recursion is needed
+if "fields" in new_field:
+cls._patch_schema(old_field, new_field)
+del new_field["fields"]
+
+# Do the update
+old_field.update(new_field)
+
+# Field didn't exist, add it as a new field
+else:
+old_schema["fiel

[airflow] branch constraints-2-0 updated: Updating constraints. Build id:754833502

2021-04-16 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch constraints-2-0
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/constraints-2-0 by this push:
 new 4ce3900  Updating constraints. Build id:754833502
4ce3900 is described below

commit 4ce3900c940741d04657c86e6fb0f43077f3ff00
Author: Automated GitHub Actions commit 
AuthorDate: Fri Apr 16 15:31:04 2021 +

Updating constraints. Build id:754833502

This update in constraints is automatically committed by the CI 
'constraints-push' step based on
HEAD of 'refs/heads/v2-0-test' in 'apache/airflow'
with commit sha a46e809731241caa7dba5e983e8677ca7e539e79.

All tests passed in this build so we determined we can push the updated 
constraints.

See 
https://github.com/apache/airflow/blob/master/README.md#installing-from-pypi 
for details.
---
 constraints-3.6.txt  | 79 +
 constraints-3.7.txt  | 85 ++-
 constraints-3.8.txt  | 86 +++-
 constraints-no-providers-3.6.txt | 12 ++---
 constraints-no-providers-3.7.txt | 18 
 constraints-no-providers-3.8.txt | 17 +++
 constraints-source-providers-3.6.txt |  1 +
 constraints-source-providers-3.7.txt |  1 +
 constraints-source-providers-3.8.txt |  2 +
 9 files changed, 158 insertions(+), 143 deletions(-)

diff --git a/constraints-3.6.txt b/constraints-3.6.txt
index 9dd7c86..4b6cfac 100644
--- a/constraints-3.6.txt
+++ b/constraints-3.6.txt
@@ -2,7 +2,7 @@
 APScheduler==3.6.3
 Authlib==0.15.3
 Babel==2.9.0
-Flask-AppBuilder==3.2.1
+Flask-AppBuilder==3.2.2
 Flask-Babel==1.0.0
 Flask-Bcrypt==0.7.1
 Flask-Caching==1.10.1
@@ -28,7 +28,7 @@ PySmbClient==0.1.5
 PyYAML==5.4.1
 Pygments==2.8.1
 SQLAlchemy-JSONField==1.0.0
-SQLAlchemy-Utils==0.36.8
+SQLAlchemy-Utils==0.37.0
 SQLAlchemy==1.3.24
 Sphinx==3.4.3
 Unidecode==1.2.0
@@ -41,48 +41,49 @@ alembic==1.5.8
 amqp==2.6.1
 analytics-python==1.2.9
 ansiwrap==0.8.4
-apache-airflow-providers-amazon==1.2.0
+apache-airflow-providers-airbyte==1.0.0
+apache-airflow-providers-amazon==1.3.0
 apache-airflow-providers-apache-beam==1.0.1
 apache-airflow-providers-apache-cassandra==1.0.1
 apache-airflow-providers-apache-druid==1.1.0
 apache-airflow-providers-apache-hdfs==1.0.1
-apache-airflow-providers-apache-hive==1.0.2
+apache-airflow-providers-apache-hive==1.0.3
 apache-airflow-providers-apache-kylin==1.0.1
-apache-airflow-providers-apache-livy==1.0.1
+apache-airflow-providers-apache-livy==1.1.0
 apache-airflow-providers-apache-pig==1.0.1
 apache-airflow-providers-apache-pinot==1.0.1
 apache-airflow-providers-apache-spark==1.0.2
 apache-airflow-providers-apache-sqoop==1.0.1
 apache-airflow-providers-celery==1.0.1
 apache-airflow-providers-cloudant==1.0.1
-apache-airflow-providers-cncf-kubernetes==1.0.2
+apache-airflow-providers-cncf-kubernetes==1.1.0
 apache-airflow-providers-databricks==1.0.1
 apache-airflow-providers-datadog==1.0.1
 apache-airflow-providers-dingding==1.0.2
 apache-airflow-providers-discord==1.0.1
-apache-airflow-providers-docker==1.0.2
+apache-airflow-providers-docker==1.1.0
 apache-airflow-providers-elasticsearch==1.0.3
 apache-airflow-providers-exasol==1.1.1
-apache-airflow-providers-facebook==1.0.1
+apache-airflow-providers-facebook==1.1.0
 apache-airflow-providers-ftp==1.0.1
-apache-airflow-providers-google==2.1.0
-apache-airflow-providers-grpc==1.0.1
-apache-airflow-providers-hashicorp==1.0.1
+apache-airflow-providers-google==2.2.0
+apache-airflow-providers-grpc==1.1.0
+apache-airflow-providers-hashicorp==1.0.2
 apache-airflow-providers-http==1.1.1
 apache-airflow-providers-imap==1.0.1
 apache-airflow-providers-jdbc==1.0.1
 apache-airflow-providers-jenkins==1.1.0
 apache-airflow-providers-jira==1.0.1
-apache-airflow-providers-microsoft-azure==1.2.0
+apache-airflow-providers-microsoft-azure==1.3.0
 apache-airflow-providers-microsoft-mssql==1.0.1
-apache-airflow-providers-microsoft-winrm==1.0.1
+apache-airflow-providers-microsoft-winrm==1.1.0
 apache-airflow-providers-mongo==1.0.1
-apache-airflow-providers-mysql==1.0.2
+apache-airflow-providers-mysql==1.1.0
 apache-airflow-providers-neo4j==1.0.1
 apache-airflow-providers-odbc==1.0.1
 apache-airflow-providers-openfaas==1.1.1
-apache-airflow-providers-opsgenie==1.0.1
-apache-airflow-providers-oracle==1.0.1
+apache-airflow-providers-opsgenie==1.0.2
+apache-airflow-providers-oracle==1.1.0
 apache-airflow-providers-pagerduty==1.0.1
 apache-airflow-providers-papermill==1.0.2
 apache-airflow-providers-plexus==1.0.1
@@ -90,18 +91,19 @@ apache-airflow-providers-postgres==1.0.1
 apache-airflow-providers-presto==1.0.2
 apache-airflow-providers-qubole==1.0.2
 apache-airflow-providers-redis==1.0.1
-apache-airflow-providers-salesforce==1.0.1
+apache-airflow-providers-salesforce==2.0.0
 apache-airflow-providers-s

[GitHub] [airflow] thejens commented on a change in pull request #15367: Implement BigQuery Table Schema Patch Operator

2021-04-16 Thread GitBox


thejens commented on a change in pull request #15367:
URL: https://github.com/apache/airflow/pull/15367#discussion_r614929411



##
File path: airflow/providers/google/cloud/operators/bigquery.py
##
@@ -2039,6 +2039,155 @@ def execute(self, context) -> None:
 )
 
 
+class BigQueryPatchTableSchemaOperator(BaseOperator):
+"""
+Patch BigQuery Table Schema
+Updates fields on a table schema based on contents of the supplied schema
+parameter. The supplied schema does not need to be complete, if the field
+already exists in the schema you only need to supply a schema with the
+fields you want to patch and the "name" key set on the schema resource.
+
+.. seealso::
+For more information on how to use this operator, take a look at the 
guide:
+:ref:`howto/operator:BigQueryPatchTableSchemaOperator`
+
+:param dataset_id: A dotted
+``(.|:)`` that indicates which dataset
+will be updated. (templated)
+:type dataset_id: str
+:param schema_fields: a partial schema resource. see
+
https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableSchema
+
+**Example**: ::
+
+schema_fields=[
+{"name": "emp_name", "description": "Some New Description"},
+{"name": "salary", "description": "Some New Description"},
+{"name": "departments", "fields": [
+{"name": "name", "description": "Some New Description"},
+{"name": "type", "description": "Some New Description"}
+]},
+]
+
+:type schema_fields: dict
+:param project_id: The name of the project where we want to update the 
dataset.
+Don't need to provide, if projectId in dataset_reference.
+:type project_id: str
+:param gcp_conn_id: (Optional) The connection ID used to connect to Google 
Cloud.
+:type gcp_conn_id: str
+:param bigquery_conn_id: (Deprecated) The connection ID used to connect to 
Google Cloud.
+This parameter has been deprecated. You should pass the gcp_conn_id 
parameter instead.
+:type bigquery_conn_id: str
+:param delegate_to: The account to impersonate, if any.
+For this to work, the service account making the request must have 
domain-wide
+delegation enabled.
+:type delegate_to: str
+:param location: The location used for the operation.
+:type location: str
+:param impersonation_chain: Optional service account to impersonate using 
short-term
+credentials, or chained list of accounts required to get the 
access_token
+of the last account in the list, which will be impersonated in the 
request.
+If set as a string, the account must grant the originating account
+the Service Account Token Creator IAM role.
+If set as a sequence, the identities from the list must grant
+Service Account Token Creator IAM role to the directly preceding 
identity, with first
+account from the list granting this role to the originating account 
(templated).
+:type impersonation_chain: Union[str, Sequence[str]]
+"""
+
+template_fields = (
+'schema_fields',
+'dataset_id',
+'table_id',
+'project_id',
+'impersonation_chain',
+)
+template_fields_renderers = {"schema_fields": "json"}
+ui_color = BigQueryUIColors.TABLE.value
+
+@classmethod
+def _patch_schema(cls, old_schema: Dict, new_schema: Dict) -> None:
+"""
+Updates the content of "old_schema" with
+the fields from new_schema. Makes changes
+in place and hence has no return value.
+Works recursively for sub-records.
+Start by turning the schema list of fields into
+a dict keyed on the field name for both the old
+and the new schema.
+
+:param old_schema: Old schema which is updated in-place
+:type old_schema: dict
+:param new_schema: Partial schema definition used to patch old schema
+:type new_schema: dict
+"""
+new_fields = {field["name"]: field for field in new_schema["fields"] 
if "name" in field}
+old_fields = {field["name"]: field for field in old_schema["fields"]}
+
+# Iterate over all new fields and update the
+# old_schema dict
+for field_key in new_fields.keys():
+# Check if the field exists in the old_schema, if
+# so change it
+if field_key in old_fields:
+old_field = old_fields[field_key]
+new_field = new_fields[field_key]
+# Check if recursion is needed
+if "fields" in new_field:
+cls._patch_schema(old_field, new_field)
+del new_field["fields"]
+
+# Do the update
+old_field.update(new_field)
+
+# Field didn't exist, add it as a new field
+else:
+old_schema["fiel

[GitHub] [airflow] thejens commented on a change in pull request #15367: Implement BigQuery Table Schema Patch Operator

2021-04-16 Thread GitBox


thejens commented on a change in pull request #15367:
URL: https://github.com/apache/airflow/pull/15367#discussion_r614928939



##
File path: airflow/providers/google/cloud/operators/bigquery.py
##
@@ -2039,6 +2039,149 @@ def execute(self, context) -> None:
 )
 
 
+class BigQueryPatchTableSchemaOperator(BaseOperator):
+"""
+Patch BigQuery Table Schema
+Updates fields on a table schema based on contents of the supplied schema
+parameter. The supplied schema does not need to be complete, if the field
+already exists in the schema you only need to supply a schema with the
+fields you want to patch and the "name" key set on the schema resource.
+
+.. seealso::
+For more information on how to use this operator, take a look at the 
guide:
+:ref:`howto/operator:BigQueryPatchTableSchemaOperator`
+
+:param dataset_id: A dotted
+``(.|:)`` that indicates which dataset
+will be updated. (templated)
+:type dataset_id: str
+:param schema_fields: a partial schema resource. see
+
https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableSchema
+
+**Example**: ::
+
+schema_fields=[
+{"name": "emp_name", "description": "Some New Description"},
+{"name": "salary", "description": "Some New Description"},
+{"name": "departments", "fields": [
+{"name": "name", "description": "Some New Description"},
+{"name": "type", "description": "Some New Description"}
+]},
+]
+
+:type schema_fields: dict
+:param project_id: The name of the project where we want to update the 
dataset.
+Don't need to provide, if projectId in dataset_reference.
+:type project_id: str
+:param gcp_conn_id: (Optional) The connection ID used to connect to Google 
Cloud.
+:type gcp_conn_id: str
+:param bigquery_conn_id: (Deprecated) The connection ID used to connect to 
Google Cloud.
+This parameter has been deprecated. You should pass the gcp_conn_id 
parameter instead.
+:type bigquery_conn_id: str
+:param delegate_to: The account to impersonate, if any.
+For this to work, the service account making the request must have 
domain-wide
+delegation enabled.
+:type delegate_to: str
+:param location: The location used for the operation.
+:type location: str
+:param impersonation_chain: Optional service account to impersonate using 
short-term
+credentials, or chained list of accounts required to get the 
access_token
+of the last account in the list, which will be impersonated in the 
request.
+If set as a string, the account must grant the originating account
+the Service Account Token Creator IAM role.
+If set as a sequence, the identities from the list must grant
+Service Account Token Creator IAM role to the directly preceding 
identity, with first
+account from the list granting this role to the originating account 
(templated).
+:type impersonation_chain: Union[str, Sequence[str]]
+"""
+
+template_fields = (
+'schema_fields',
+'dataset_id',
+'table_id',
+'project_id',
+'impersonation_chain',
+)
+template_fields_renderers = {"schema_fields": "json"}
+ui_color = BigQueryUIColors.TABLE.value
+
+@classmethod
+def _patch_schema(cls, old_schema, new_schema):
+# Updates the content of "old_schema" with
+# the fields from new_schema. Makes changes
+# in place and hence has no return value.
+# Works recursively for sub-records
+
+# Start by turning the schema list of fields into
+# a dict keyed on the field name for both the old
+# and the new schema
+new_fields = {field["name"]: field for field in new_schema["fields"] 
if "name" in field}
+old_fields = {field["name"]: field for field in old_schema["fields"]}
+
+# Iterate over all new fields and update the
+# old_schema dict
+for field_key in new_fields.keys():
+# Check if the field exists in the old_schema, if
+# so change it
+if field_key in old_fields:
+old_field = old_fields[field_key]
+new_field = new_fields[field_key]
+# Check if recursion is needed
+if "fields" in new_field:
+cls._patch_schema(old_field, new_field)
+del new_field["fields"]
+
+# Do the update
+old_field.update(new_field)
+
+# Field didn't exist, add it as a new field
+else:
+old_schema["fields"].append(new_fields[field_key])
+
+@apply_defaults
+def __init__(
+self,
+*,
+dataset_id: Optional[str] = None,
+schema_fields: List[Dict[str, Any]],
+table_id: Optional[str] = None,
+   

[GitHub] [airflow] minnieshi commented on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

2021-04-16 Thread GitBox


minnieshi commented on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-821258217


   **My situation**
   - kubernetes
   - celery executor
   - dag is correct as it works in another environment.
   - pool (default_pool): 32 slots, 0 used slots, 1 queued slot
   - tasks in the dag can be run manually (by clearing them), but it does not 
automatically run the next task.
   - one situation: after restarting the scheduler manually (the restart 
configuration is set to never, value -1), it decided to run 3 out of 4 tasks, 
and the last one just got stuck in the queued state
   - after that, I tried to load the dag with a different name and a different 
id; the 1st task of the dag 
got stuck in the 'scheduled' state after clearing the task.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




  1   2   >