[jira] [Commented] (AIRFLOW-6556) Improving unclear and incomplete documentation
[ https://issues.apache.org/jira/browse/AIRFLOW-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016746#comment-17016746 ]

Jacob Ward commented on AIRFLOW-6556:
-------------------------------------

Absolutely!

> Improving unclear and incomplete documentation
> ----------------------------------------------
>
>                 Key: AIRFLOW-6556
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6556
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: master
>            Reporter: Jacob Ward
>            Assignee: Jarek Potiuk
>            Priority: Trivial
>
> To help improve documentation it was discussed in the mailing list that users
> of Airflow should have somewhere to report missing, incomplete or unclear
> documentation. Any users who find this should comment on this ticket.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (AIRFLOW-6556) Improving unclear and incomplete documentation
[ https://issues.apache.org/jira/browse/AIRFLOW-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016155#comment-17016155 ]

Jacob Ward edited comment on AIRFLOW-6556 at 1/15/20 5:09 PM:
--------------------------------------------------------------

Here's a list of issues my team and I have noticed. I'll add more as I notice them.
* [Kubernetes|https://airflow.apache.org/docs/stable/kubernetes.html]:
** An explanation of how the KubernetesExecutor (KE) and KubernetesPodOperator (KPO) work.
** An explanation of the difference between the KE and the KPO (I see this question and confusion arise a lot in Slack).
** An explanation of exactly what the [kubernetes] config options are used for with the KE (something I have only just started to understand is that these config options mostly apply to the workers, since the scheduler does not _have_ to run inside the cluster with KE).
** (Minor) The KPO hyperlink on the Kubernetes doc page sends you to the KE page. Also, the links on this page send you to readthedocs.io.
* Some information [(possibly here)|https://airflow.apache.org/docs/stable/concepts.html?highlight=trigger%20rule#dag-assignment] about how assigning operators to DAGs after creation means the default_args aren't applied.
* In general I think the Concepts/Core Ideas section has a lot of interesting and useful information, but because it is one long page it is hard to find the information you need. I think it would be better split into several pages, with more information and examples for each concept.
* Some information on best practices, how to avoid common pitfalls, etc.
* [LocalExecutor is missing|https://airflow.apache.org/docs/stable/executor/index.html], and some general information on the architecture and how these Executors are supposed to work would be useful. I often see new users confused about what the Executor really is, and they sometimes mix it up with operators.
* [Security|https://airflow.apache.org/docs/stable/security.html#viewer] - What are all the permissions, and what does each one actually mean? E.g. 'can_show' is not descriptive enough to know exactly what it means.
* It may be covered by Kaxil's updates to the config documentation, but a page on how new DAGs are picked up, parsed and stored, and on what the DagBag is and how it interacts with (or is interacted with by) the scheduler/webserver/db.

was (Author: jward):
Here's a list of issues my team and I have noticed. I'll add more as I notice them.
* [Kubernetes|https://airflow.apache.org/docs/stable/kubernetes.html]:
** An explanation of how the KubernetesExecutor (KE) and KubernetesPodOperator (KPO) work.
** An explanation of the difference between the KE and the KPO (I see this question and confusion arise a lot in Slack).
** An explanation of exactly what the [kubernetes] config options are used for with the KE (something I have only just started to understand is that these config options mostly apply to the workers, since the scheduler does not _have_ to run inside the cluster with KE).
** (Minor) The KPO hyperlink on the Kubernetes doc page sends you to the KE page. Also, the links on this page send you to readthedocs.io.
* Some information [(possibly here)|https://airflow.apache.org/docs/stable/concepts.html?highlight=trigger%20rule#dag-assignment] about how assigning operators to DAGs after creation means the default_args aren't applied.
* In general I think the Concepts/Core Ideas section has a lot of interesting and useful information, but because it is one long page it is hard to find the information you need. I think it would be better split into several pages, with more information and examples for each concept.
* Some information on best practices, how to avoid common pitfalls, etc.
* [LocalExecutor is missing|https://airflow.apache.org/docs/stable/executor/index.html], and some general information on the architecture and how these Executors are supposed to work would be useful. I often see new users confused about what the Executor really is, and they sometimes mix it up with operators.
* [Security|https://airflow.apache.org/docs/stable/security.html#viewer] - What are all the permissions, and what does each one actually mean? E.g. 'can_show' is not descriptive enough to know exactly what it means.

> Improving unclear and incomplete documentation
> ----------------------------------------------
>
>                 Key: AIRFLOW-6556
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6556
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: master
>            Reporter: Jacob Ward
>            Assignee: Jarek Potiuk
>            Priority: Trivial
>
> To help improve documentation it was discussed in the mailing list that users
> of Airflow should have somewhere to report missing, incomplete or
[jira] [Commented] (AIRFLOW-6556) Improving unclear and incomplete documentation
[ https://issues.apache.org/jira/browse/AIRFLOW-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016155#comment-17016155 ]

Jacob Ward commented on AIRFLOW-6556:
-------------------------------------

Here's a list of issues my team and I have noticed. I'll add more as I notice them.
* [Kubernetes|https://airflow.apache.org/docs/stable/kubernetes.html]:
** An explanation of how the KubernetesExecutor (KE) and KubernetesPodOperator (KPO) work.
** An explanation of the difference between the KE and the KPO (I see this question and confusion arise a lot in Slack).
** An explanation of exactly what the [kubernetes] config options are used for with the KE (something I have only just started to understand is that these config options mostly apply to the workers, since the scheduler does not _have_ to run inside the cluster with KE).
** (Minor) The KPO hyperlink on the Kubernetes doc page sends you to the KE page. Also, the links on this page send you to readthedocs.io.
* Some information [(possibly here)|https://airflow.apache.org/docs/stable/concepts.html?highlight=trigger%20rule#dag-assignment] about how assigning operators to DAGs after creation means the default_args aren't applied.
* In general I think the Concepts/Core Ideas section has a lot of interesting and useful information, but because it is one long page it is hard to find the information you need. I think it would be better split into several pages, with more information and examples for each concept.
* Some information on best practices, how to avoid common pitfalls, etc.
* [LocalExecutor is missing|https://airflow.apache.org/docs/stable/executor/index.html], and some general information on the architecture and how these Executors are supposed to work would be useful. I often see new users confused about what the Executor really is, and they sometimes mix it up with operators.
* [Security|https://airflow.apache.org/docs/stable/security.html#viewer] - What are all the permissions, and what does each one actually mean? E.g. 'can_show' is not descriptive enough to know exactly what it means.

> Improving unclear and incomplete documentation
> ----------------------------------------------
>
>                 Key: AIRFLOW-6556
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6556
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: master
>            Reporter: Jacob Ward
>            Assignee: Jarek Potiuk
>            Priority: Trivial
>
> To help improve documentation it was discussed in the mailing list that users
> of Airflow should have somewhere to report missing, incomplete or unclear
> documentation. Any users who find this should comment on this ticket.
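The comment above notes that default_args are not applied when an operator is assigned to a DAG after creation. A toy model of why that could happen is sketched here, assuming defaults are resolved once, when the operator is constructed. That assumption is not confirmed anywhere in this thread, and `ToyDag` and `ToyOperator` are hypothetical stand-ins, not Airflow classes.

```python
"""Toy model: defaults resolved at construction time are not re-applied later."""


class ToyDag:
    """Hypothetical stand-in for a DAG that carries default arguments."""

    def __init__(self, dag_id, default_args=None):
        self.dag_id = dag_id
        self.default_args = dict(default_args or {})


class ToyOperator:
    """Hypothetical stand-in for an operator with one configurable argument."""

    def __init__(self, task_id, retries=None, dag=None):
        self.task_id = task_id
        self.dag = dag
        # Assumption: defaults are looked up exactly once, here, from the DAG
        # that is known at construction time.
        defaults = dag.default_args if dag is not None else {}
        self.retries = retries if retries is not None else defaults.get("retries", 0)


dag = ToyDag("example", default_args={"retries": 3})

at_creation = ToyOperator("at_creation", dag=dag)  # DAG known up front
assigned_later = ToyOperator("assigned_later")     # no DAG yet
assigned_later.dag = dag                           # attached afterwards; defaults are not revisited

print(at_creation.retries)     # 3
print(assigned_later.retries)  # 0
```

In this model the second operator keeps its built-in default because nothing re-reads `default_args` when `dag` is set later, which is the kind of behaviour the requested documentation would need to spell out.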
[jira] [Created] (AIRFLOW-6556) Improving unclear and incomplete documentation
Jacob Ward created AIRFLOW-6556:
-----------------------------------

             Summary: Improving unclear and incomplete documentation
                 Key: AIRFLOW-6556
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6556
             Project: Apache Airflow
          Issue Type: Improvement
          Components: documentation
    Affects Versions: master
            Reporter: Jacob Ward
            Assignee: Jarek Potiuk

To help improve documentation it was discussed in the mailing list that users of Airflow should have somewhere to report missing, incomplete or unclear documentation. Any users who find this should comment on this ticket.
[jira] [Closed] (AIRFLOW-5859) Tasks locking and heartbeat warnings
[ https://issues.apache.org/jira/browse/AIRFLOW-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacob Ward closed AIRFLOW-5859.
-------------------------------
    Resolution: Invalid

> Tasks locking and heartbeat warnings
> ------------------------------------
>
>                 Key: AIRFLOW-5859
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5859
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun
>    Affects Versions: 1.10.6
>         Environment: Airflow using LocalExecutor and Postgres
>            Reporter: Jacob Ward
>            Priority: Major
>
> Having two potentially related issues.
>
> Issue 1:
>
> Some of my tasks (only PythonOperators so far) start and then do nothing. I had a task that usually executes in 10-30 minutes running for over 24 hours without any error messages in the logs (other than the heartbeat warnings shown below) and without failing.
>
> So far this has only happened to tasks inside sub-dags; I'm not sure whether that is related.
>
> The logs for the sub-dag show:
> {code:}
> [2019-11-05 16:41:34,364] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,364] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
> [2019-11-05 16:41:34,831] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,830] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986693 s
> [2019-11-05 16:41:39,376] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,376] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
> [2019-11-05 16:41:39,859] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,859] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986141 s
> {code}
> These lines repeated throughout that 24-hour period.
>
> Issue 2:
>
> In all of the logs this warning message is printed every 5 seconds:
> {code:}
> [2019-11-06 14:25:29,466] {logging_mixin.py:112} INFO - [2019-11-06 14:25:29,465] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.987354 s
> {code}
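One way to read the heartbeat warning quoted above is to pull the three numbers out of the message and check how they relate. The sketch below does that for the log line from Issue 2. The function name `parse_heartbeat_warning` is made up for illustration, and the relation "sleep = heartrate - elapsed" is an inference from the logged values, not something stated in the ticket.

```python
import re

# Pattern for the warning text exactly as it appears in the logs above.
HEARTBEAT_RE = re.compile(
    r"Time since last heartbeat\((?P<elapsed>[\d.]+) s\) < "
    r"heartrate\((?P<heartrate>[\d.]+) s\), sleeping for (?P<sleep>[\d.]+) s"
)


def parse_heartbeat_warning(line):
    """Return (elapsed, heartrate, sleep) as floats, or None if the line does not match."""
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    return tuple(float(match.group(name)) for name in ("elapsed", "heartrate", "sleep"))


line = (
    "[2019-11-06 14:25:29,466] {logging_mixin.py:112} INFO - [2019-11-06 14:25:29,465] "
    "{local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < "
    "heartrate(5.0 s), sleeping for 4.987354 s"
)

elapsed, heartrate, sleep = parse_heartbeat_warning(line)

# Assumption: sleep = heartrate - elapsed, so the unrounded elapsed time is:
implied_elapsed = heartrate - sleep
print(round(implied_elapsed, 6))  # 0.012646
```

The implied elapsed time rounds to the 0.01 s shown in the message, so the numbers are at least consistent with the loop sleeping for the remainder of each 5-second interval. If that reading holds, the message would be expected to appear roughly every 5 seconds, as reported.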
[jira] [Commented] (AIRFLOW-5859) Tasks locking and heartbeat warnings
[ https://issues.apache.org/jira/browse/AIRFLOW-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974088#comment-16974088 ]

Jacob Ward commented on AIRFLOW-5859:
-------------------------------------

I'm still having this issue; oddly, it seems to happen only with specific tasks. I investigated one, and it appears the Python code executed successfully but Airflow still has the task in the `Running` state.

> Tasks locking and heartbeat warnings
> ------------------------------------
>
>                 Key: AIRFLOW-5859
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5859
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun
>    Affects Versions: 1.10.6
>         Environment: Airflow using LocalExecutor and Postgres
>            Reporter: Jacob Ward
>            Priority: Major
>
> Having two potentially related issues.
>
> Issue 1:
>
> Some of my tasks (only PythonOperators so far) start and then do nothing. I had a task that usually executes in 10-30 minutes running for over 24 hours without any error messages in the logs (other than the heartbeat warnings shown below) and without failing.
>
> So far this has only happened to tasks inside sub-dags; I'm not sure whether that is related.
>
> The logs for the sub-dag show:
> {code:}
> [2019-11-05 16:41:34,364] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,364] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
> [2019-11-05 16:41:34,831] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,830] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986693 s
> [2019-11-05 16:41:39,376] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,376] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
> [2019-11-05 16:41:39,859] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,859] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986141 s
> {code}
> These lines repeated throughout that 24-hour period.
>
> Issue 2:
>
> In all of the logs this warning message is printed every 5 seconds:
> {code:}
> [2019-11-06 14:25:29,466] {logging_mixin.py:112} INFO - [2019-11-06 14:25:29,465] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986693 s
> {code}
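The "[backfill progress]" lines quoted above are the main evidence that the sub-dag stopped advancing. A small sketch of how one might compare two such snapshots follows. The helper name `parse_backfill_progress` is hypothetical, and the parsing relies only on the "| name: count" layout visible in the quoted logs, so it may not hold for other Airflow versions.

```python
import re


def parse_backfill_progress(line):
    """Extract the '| name: count' fields of a '[backfill progress]' line into a dict."""
    marker = "[backfill progress]"
    if marker not in line:
        return None
    tail = line.partition(marker)[2]
    # Fields without a colon (such as 'finished run 0 of 1') are skipped.
    pairs = re.findall(r"\|\s*([a-z ]+):\s*(\d+)", tail)
    return {name.strip(): int(count) for name, count in pairs}


first = parse_backfill_progress(
    "[2019-11-05 16:41:34,364] {backfill_job.py:363} INFO - [backfill progress] | "
    "finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | "
    "skipped: 1 | deadlocked: 0 | not ready: 7"
)
second = parse_backfill_progress(
    "[2019-11-05 16:41:39,376] {backfill_job.py:363} INFO - [backfill progress] | "
    "finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | "
    "skipped: 1 | deadlocked: 0 | not ready: 7"
)

# Identical counters in consecutive snapshots mean no task changed state in between.
stalled = first == second
print(first["running"], stalled)
```

Applied across the full 24-hour log, a check like this would show how long the counters stayed frozen with one task reported as running.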
[jira] [Updated] (AIRFLOW-5859) Tasks locking and heartbeat warnings
[ https://issues.apache.org/jira/browse/AIRFLOW-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacob Ward updated AIRFLOW-5859:
--------------------------------
    Description: 
Having two potentially related issues.

Issue 1:

Some of my tasks (only PythonOperators so far) start and then do nothing. I had a task that usually executes in 10-30 minutes running for over 24 hours without any error messages in the logs (other than the heartbeat warnings shown below) and without failing.

So far this has only happened to tasks inside sub-dags; I'm not sure whether that is related.

The logs for the sub-dag show:
{code:}
[2019-11-05 16:41:34,364] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,364] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
[2019-11-05 16:41:34,831] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,830] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986693 s
[2019-11-05 16:41:39,376] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,376] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
[2019-11-05 16:41:39,859] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,859] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986141 s
{code}
These lines repeated throughout that 24-hour period.

Issue 2:

In all of the logs this warning message is printed every 5 seconds:
{code:}
[2019-11-06 14:25:29,466] {logging_mixin.py:112} INFO - [2019-11-06 14:25:29,465] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.987354 s
{code}

  was:
Having two potentially related issues.

Issue 1:

Some of my tasks (only PythonOperators so far) start and then do nothing. I had a task that usually executes in 10-30 minutes running for over 24 hours without any error messages in the logs (other than the heartbeat warnings shown below) and without failing.

So far this has only happened to tasks inside sub-dags; I'm not sure whether that is related.

The logs for the sub-dag show:
{code:java}
[2019-11-05 16:41:34,364] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,364] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
[2019-11-05 16:41:34,831] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,830] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986693 s
[2019-11-05 16:41:39,376] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,376] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
[2019-11-05 16:41:39,859] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,859] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986141 s
{code}
These lines repeated throughout that 24-hour period.

Issue 2:

In all of the logs this warning message is printed every 5 seconds:
{{}}
{code:java}
[2019-11-06 14:25:29,466] {logging_mixin.py:112} INFO - [2019-11-06 14:25:29,465] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.987354 s
{code}
{{}}

> Tasks locking and heartbeat warnings
> ------------------------------------
>
>                 Key: AIRFLOW-5859
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5859
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun
>    Affects Versions: 1.10.6
>         Environment: Airflow using LocalExecutor and Postgres
>            Reporter: Jacob Ward
>            Priority: Major
>
> Having two potentially related issues.
>
> Issue 1:
>
> Some of my tasks (only PythonOperators so far) start and then do nothing. I had a task that usually executes in 10-30 minutes running for over 24 hours without any error messages in the logs (other than the heartbeat warnings shown below) and without failing.
>
> So far this has only happened to tasks inside sub-dags; I'm not sure whether that is related.
>
> The logs for the sub-dag show:
> {code:}
> [2019-11-05 16:41:34,364] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,364] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
> [2019-11-05 16:41:34,831] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,830] {local_task_job.py:124} WARNING - Time since last
[jira] [Created] (AIRFLOW-5859) Tasks locking and heartbeat warnings
Jacob Ward created AIRFLOW-5859:
-----------------------------------

             Summary: Tasks locking and heartbeat warnings
                 Key: AIRFLOW-5859
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5859
             Project: Apache Airflow
          Issue Type: Bug
          Components: DagRun
    Affects Versions: 1.10.6
         Environment: Airflow using LocalExecutor and Postgres
            Reporter: Jacob Ward

Having two potentially related issues.

Issue 1:

Some of my tasks (only PythonOperators so far) start and then do nothing. I had a task that usually executes in 10-30 minutes running for over 24 hours without any error messages in the logs (other than the heartbeat warnings shown below) and without failing.

So far this has only happened to tasks inside sub-dags; I'm not sure whether that is related.

The logs for the sub-dag show:
{code:java}
[2019-11-05 16:41:34,364] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,364] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
[2019-11-05 16:41:34,831] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,830] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986693 s
[2019-11-05 16:41:39,376] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,376] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
[2019-11-05 16:41:39,859] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,859] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986141 s
{code}
These lines repeated throughout that 24-hour period.

Issue 2:

In all of the logs this warning message is printed every 5 seconds:
{{}}
{code:java}
[2019-11-06 14:25:29,466] {logging_mixin.py:112} INFO - [2019-11-06 14:25:29,465] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.987354 s
{code}
{{}}