[jira] [Updated] (AIRFLOW-515) Airflow system is stuck without signs of life
[ https://issues.apache.org/jira/browse/AIRFLOW-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kozikowski updated AIRFLOW-515:
--
Description:

h1. Symptoms

Triggering a dag on the web node via trigger_dag does not start it. Triggering the same dag worked a moment before, and I did not apply any config changes in the meantime, so it is unlikely to be an issue with the dag setup. Very likely a deadlock on the scheduler node blocked the whole system.

Triggering the dag (airflow trigger_dag on the web node) only adds it to the postgres table dag_run. The run does not appear in flower or the main web GUI. The system was stuck for 50 minutes without any signs of life, with no logs coming from the scheduler, worker, or webserver. In the end, the system got unblocked after force-restarting the scheduler and worker. Is there any way I should health-check my scheduler to avoid a similar situation in production?

h1. Recent logs

Logs were collected at 19:10; the system had been stuck for almost an hour.

h2. scheduler

{noformat}
/usr/lib/python3.5/site-packages/sqlalchemy/sql/default_comparator.py:153: SAWarning: The IN-predicate on "task_instance.execution_date" was invoked with an empty sequence. This results in a contradiction, which nonetheless can be expensive to evaluate. Consider alternative strategies for improved performance.
  'strategies for improved performance.' % expr)
[2016-09-19 18:15:41,171] {jobs.py:741} INFO - Done queuing tasks, calling the executor's heartbeat
[2016-09-19 18:15:41,172] {jobs.py:744} INFO - Loop took: 0.37502 seconds
[2016-09-19 18:15:41,189] {models.py:305} INFO - Finding 'running' jobs without a recent heartbeat
[2016-09-19 18:15:41,189] {models.py:311} INFO - Failing jobs without heartbeat after 2016-09-19 18:13:26.189579
[2016-09-19 18:15:41,212] {celery_executor.py:64} INFO - [celery] queuing ('repos_to_process_dag', 'update_repos_to_process', datetime.datetime(2016, 9, 19, 18, 15, 35, 200082)) through celery, queue=default
[2016-09-19 18:15:45,817] {jobs.py:574} INFO - Prioritizing 0 queued jobs
[2016-09-19 18:15:45,831] {models.py:154} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2016-09-19 18:15:45,881] {jobs.py:726} INFO - Starting 2 scheduler jobs
{noformat}

h2. web

{noformat}
[2016-09-19 18:10:12,213] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,597] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,663] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,790] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:13,210] {models.py:154} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2016-09-19 18:10:13,742] {models.py:154} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2016-09-19 18:10:13,815] {models.py:154} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2016-09-19 18:10:13,912] {models.py:154} INFO - Filling up the DagBag from /usr/local/airflow/dags
{noformat}

h2. worker

{noformat}
Logging into: /usr/local/airflow/logs/repos_to_process_dag/update_repos_to_process/2016-09-19T18:15:35.200082
{noformat}
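On the health-check question above: the scheduler records a latest_heartbeat in its jobs table (the scheduler log itself shows it failing jobs whose heartbeat is older than about two minutes), so one option is a probe that flags the scheduler as dead when its own heartbeat goes stale. This is a minimal sketch, not from the original report; the threshold, function name, and the exact SQL in the docstring are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical staleness threshold; a few minutes mirrors the ~2-minute
# cutoff the scheduler log above applies to its own jobs.
HEARTBEAT_THRESHOLD = timedelta(minutes=3)

def scheduler_is_alive(latest_heartbeat, now, threshold=HEARTBEAT_THRESHOLD):
    """Return True if the scheduler's recorded heartbeat is recent.

    `latest_heartbeat` would come from the metadata database, e.g.
    (column/table names may differ by Airflow version):
      SELECT latest_heartbeat FROM job
      WHERE job_type = 'SchedulerJob' AND state = 'running'
      ORDER BY latest_heartbeat DESC LIMIT 1;
    """
    if latest_heartbeat is None:
        # No running scheduler job recorded at all.
        return False
    return (now - latest_heartbeat) <= threshold
```

A liveness probe (e.g. a kubernetes exec probe wrapping this check) could exit non-zero when it returns False, restarting the scheduler automatically instead of leaving the system stuck for an hour.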
[jira] [Updated] (AIRFLOW-515) Airflow system is stuck without signs of life
[ https://issues.apache.org/jira/browse/AIRFLOW-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kozikowski updated AIRFLOW-515:
--
Environment:

airflow==1.7.1.3
Dockerfile: https://gist.github.com/kozikow/9290c44f03fba8d0f1299f355ef96b51
kubernetes config (running within minikube): https://gist.github.com/kozikow/49c812f11d1e4fd43e465e8090619274

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
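Since trigger_dag only writes a row to the dag_run table, a complementary check is to watch that table for runs that sit in the 'running' state without progress, which is exactly the symptom reported here. This is a sketch with hypothetical names and cutoff, not part of the report; `dag_runs` stands in for rows read from the metadata database.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff for how long a triggered run may sit in the
# 'running' state before we suspect the scheduler is stuck.
STUCK_AFTER = timedelta(minutes=15)

def find_stuck_runs(dag_runs, now, cutoff=STUCK_AFTER):
    """Return run_ids of dag_run rows that look stuck.

    `dag_runs` is a list of dicts as might be read from the dag_run
    table, e.g. {"run_id": ..., "state": ..., "start_date": ...}.
    A run counts as stuck if it is still 'running' past the cutoff.
    """
    stuck = []
    for run in dag_runs:
        if run["state"] != "running":
            continue  # finished or queued runs are not our concern here
        if now - run["start_date"] > cutoff:
            stuck.append(run["run_id"])
    return stuck
```

An alerting job could run this query periodically; a run stuck past the cutoff with no scheduler or worker log output would have caught the hour-long hang described above much earlier.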
[jira] [Created] (AIRFLOW-515) Airflow system is stuck without signs of life
Robert Kozikowski created AIRFLOW-515:
--
Summary: Airflow system is stuck without signs of life
Key: AIRFLOW-515
URL: https://issues.apache.org/jira/browse/AIRFLOW-515
Project: Apache Airflow
Issue Type: Bug
Environment: airflow==1.7.1.3
My airflow Dockerfile: https://gist.github.com/kozikow/9290c44f03fba8d0f1299f355ef96b51
Airflow running locally in kubernetes minikube, with config https://gist.github.com/kozikow/49c812f11d1e4fd43e465e8090619274 .
Reporter: Robert Kozikowski