[jira] [Updated] (AIRFLOW-515) Airflow system is stuck without signs of life

2016-09-19 Thread Robert Kozikowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kozikowski updated AIRFLOW-515:
--
Description: 
h1. Symptoms
Triggering a dag on the web node via trigger_dag does not start it. Triggering 
the same dag worked a moment before and I did not apply any config changes in 
the meantime, so it is unlikely to be an issue with the dag setup. Most likely 
some deadlock on the scheduler node brought the whole system to a halt.

Triggering the dag (airflow trigger_dag on the web node) only adds a row to the 
postgres dag_run table. The run does not appear in Flower or the main web GUI. 
The system was stuck for 50 minutes without any signs of life, and no logs came 
from the scheduler, worker, or webserver.
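
To confirm that the trigger really only writes the dag_run row, the metadata 
database can be inspected directly. Below is a rough sketch of such a check; the 
connection string, the dag_id, and the dag_run/task_instance column names are 
assumptions based on a stock 1.7.1.3 install, not something verified against this 
exact setup:
{noformat}
# check_stuck_run.py -- hypothetical helper, not part of Airflow itself.
# Lists the most recent dag_run rows for a dag and counts the task
# instances created for each of them. A triggered run that sits in
# 'running' with zero task instances matches the symptom described above.
from sqlalchemy import create_engine, text

# Assumed connection string; replace with the sql_alchemy_conn from airflow.cfg.
engine = create_engine("postgresql+psycopg2://airflow:airflow@postgres/airflow")

DAG_ID = "repos_to_process_dag"  # example dag taken from the logs below

with engine.connect() as conn:
    runs = conn.execute(
        text("SELECT execution_date, state FROM dag_run "
             "WHERE dag_id = :dag_id ORDER BY execution_date DESC LIMIT 5"),
        {"dag_id": DAG_ID}).fetchall()
    for execution_date, state in runs:
        ti_count = conn.execute(
            text("SELECT count(*) FROM task_instance "
                 "WHERE dag_id = :dag_id AND execution_date = :ed"),
            {"dag_id": DAG_ID, "ed": execution_date}).scalar()
        print(execution_date, state, "task_instances:", ti_count)
{noformat}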

In the end, the system got unblocked after force-restarting the scheduler and 
worker. Is there any way I should health-check my scheduler to avoid a similar 
situation in production?
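
One option would be to watch the scheduler's heartbeat in the metadata database 
and restart it when the heartbeat goes stale. A rough sketch of that idea follows; 
the job table layout (job_type, state, latest_heartbeat) and the 3-minute 
threshold are assumptions from a 1.7.x install, not an official Airflow 
health-check API:
{noformat}
# scheduler_healthcheck.py -- hypothetical liveness check, not an Airflow API.
# Exits non-zero when the newest running SchedulerJob heartbeat in the
# metadata DB is older than MAX_STALENESS, so a supervisor (cron alert,
# or a Kubernetes exec livenessProbe) can restart the scheduler container.
import sys
from datetime import datetime, timedelta

from sqlalchemy import create_engine, text

# Assumed connection string; replace with the sql_alchemy_conn from airflow.cfg.
engine = create_engine("postgresql+psycopg2://airflow:airflow@postgres/airflow")
MAX_STALENESS = timedelta(minutes=3)

with engine.connect() as conn:
    latest = conn.execute(text(
        "SELECT max(latest_heartbeat) FROM job "
        "WHERE job_type = 'SchedulerJob' AND state = 'running'")).scalar()

# Assumes heartbeats are stored as naive UTC timestamps; adjust to
# datetime.now() if your install stores local time.
if latest is None or datetime.utcnow() - latest > MAX_STALENESS:
    print("scheduler heartbeat stale or missing:", latest)
    sys.exit(1)

print("scheduler heartbeat OK:", latest)
{noformat}
Wired into the kubernetes deployment as an exec livenessProbe, a check like this 
could restart the scheduler pod automatically instead of leaving the system stuck 
for an hour.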

h1. Recent logs
Logs were collected at 19:10; the system had been stuck for almost an hour.
h2. scheduler
{noformat}
/usr/lib/python3.5/site-packages/sqlalchemy/sql/default_comparator.py:153: 
SAWarning: The IN-predicate on "task_instance.execution_date" was invoked with 
an empty sequence. This results in a contradiction, which nonetheless can be 
expensive to evaluate.  Consider alternative strategies for improved 
performance.
  'strategies for improved performance.' % expr)
[2016-09-19 18:15:41,171] {jobs.py:741} INFO - Done queuing tasks, calling the 
executor's heartbeat
[2016-09-19 18:15:41,172] {jobs.py:744} INFO - Loop took: 0.37502 seconds
[2016-09-19 18:15:41,189] {models.py:305} INFO - Finding 'running' jobs without 
a recent heartbeat
[2016-09-19 18:15:41,189] {models.py:311} INFO - Failing jobs without heartbeat 
after 2016-09-19 18:13:26.189579
[2016-09-19 18:15:41,212] {celery_executor.py:64} INFO - [celery] queuing 
('repos_to_process_dag', 'update_repos_to_process', datetime.datetime(2016, 9, 
19, 18, 15, 35, 200082)) through celery, queue=default
[2016-09-19 18:15:45,817] {jobs.py:574} INFO - Prioritizing 0 queued jobs
[2016-09-19 18:15:45,831] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:15:45,881] {jobs.py:726} INFO - Starting 2 scheduler jobs
{noformat}
h2. web
{noformat}
[2016-09-19 18:10:12,213] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,597] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,663] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,790] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:13,210] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:10:13,742] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:10:13,815] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:10:13,912] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
{noformat}
h2. worker
{noformat}
Logging into: 
/usr/local/airflow/logs/repos_to_process_dag/update_repos_to_process/2016-09-19T18:15:35.200082
{noformat}


  was:
h1. Symptoms
Triggering a dag on the web node via trigger_dag does not start it. Triggering 
the same dag worked a moment before and I did not apply any config changes in 
the meantime.

Triggering the dag (airflow trigger_dag on the web node) only adds a row to the 
postgres dag_run table. The run does not appear in Flower or the main web GUI. 
The system was stuck for 50 minutes without any signs of life, and no logs came 
from the scheduler, worker, or webserver.

In the end, the system got unblocked after force-restarting the scheduler and 
worker. Is there any way I should health-check my scheduler to avoid a similar 
situation in production?

h1. Recent logs
Logs were collected at 19:10; the system had been stuck for almost an hour.
h2. scheduler
{noformat}
/usr/lib/python3.5/site-packages/sqlalchemy/sql/default_comparator.py:153: 
SAWarning: The IN-predicate on "task_instance.execution_date" was invoked with 
an empty sequence. This results in a contradiction, which nonetheless can be 
expensive to evaluate.  Consider alternative strategies for improved 
performance.
  'strategies for improved performance.' % expr)
[2016-09-19 18:15:41,171] {jobs.py:741} INFO - Done queuing tasks, calling the 
executor's heartbeat
[2016-09-19 18:15:41,172] {jobs.py:744} INFO - Loop took: 0.37502 seconds
[2016-09-19 18:15:41,189] {models.py:305} INFO - Finding 'running' jobs without 
a recent heartbeat
[2016-09-19 18:15:41,189] {models.py:311} INFO - Failing jobs without heartbeat 
after 2016-09-19 18:13:26.189579
[2016-09-19 18:15:41,212] {celery_executor.py:64} INFO - [celery] queuing 
('repos_to_process_dag', 'update_repos_to_process', datetime.datetime(2016, 9, 
19, 18, 15, 35, 200082)) through celery, queue=default
[2016-09-19 18:15:45,817] 

[jira] [Updated] (AIRFLOW-515) Airflow system is stuck without signs of life

2016-09-19 Thread Robert Kozikowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kozikowski updated AIRFLOW-515:
--
Description: 
h1. Symptoms
Triggering a dag on the web node via trigger_dag does not start it. Triggering 
the same dag worked a moment before and I did not apply any config changes in 
the meantime.

Triggering the dag only adds a row to the postgres dag_run table. The run does 
not appear in Flower or the main web GUI. The system was stuck for 50 minutes 
without any signs of life, and no logs came from the scheduler, worker, or 
webserver.

In the end, the system got unblocked after force-restarting the scheduler and 
worker. Is there any way I should health-check my scheduler to avoid a similar 
situation in production?

h1. Recent logs
Logs were collected at 19:10; the system had been stuck for almost an hour.
h2. scheduler
{noformat}
/usr/lib/python3.5/site-packages/sqlalchemy/sql/default_comparator.py:153: 
SAWarning: The IN-predicate on "task_instance.execution_date" was invoked with 
an empty sequence. This results in a contradiction, which nonetheless can be 
expensive to evaluate.  Consider alternative strategies for improved 
performance.
  'strategies for improved performance.' % expr)
[2016-09-19 18:15:41,171] {jobs.py:741} INFO - Done queuing tasks, calling the 
executor's heartbeat
[2016-09-19 18:15:41,172] {jobs.py:744} INFO - Loop took: 0.37502 seconds
[2016-09-19 18:15:41,189] {models.py:305} INFO - Finding 'running' jobs without 
a recent heartbeat
[2016-09-19 18:15:41,189] {models.py:311} INFO - Failing jobs without heartbeat 
after 2016-09-19 18:13:26.189579
[2016-09-19 18:15:41,212] {celery_executor.py:64} INFO - [celery] queuing 
('repos_to_process_dag', 'update_repos_to_process', datetime.datetime(2016, 9, 
19, 18, 15, 35, 200082)) through celery, queue=default
[2016-09-19 18:15:45,817] {jobs.py:574} INFO - Prioritizing 0 queued jobs
[2016-09-19 18:15:45,831] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:15:45,881] {jobs.py:726} INFO - Starting 2 scheduler jobs
{noformat}
h2. web
{noformat}
[2016-09-19 18:10:12,213] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,597] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,663] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,790] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:13,210] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:10:13,742] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:10:13,815] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:10:13,912] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
{noformat}
h2. worker
{noformat}
Logging into: 
/usr/local/airflow/logs/repos_to_process_dag/update_repos_to_process/2016-09-19T18:15:35.200082
{noformat}


  was:
h1. Symptoms
Triggering a dag on the web node via trigger_dag does not start it. The same 
dag worked previously.

Triggering the dag only adds a row to the postgres dag_run table. The run does 
not appear in Flower or the main web GUI. The system was stuck for 50 minutes 
without any signs of life, and no logs came from the scheduler, worker, or 
webserver.

In the end, the system got unblocked after force-restarting the scheduler and 
worker. Is there any way I should health-check my scheduler to avoid a similar 
situation in production?

h1. Recent logs
Logs were collected at 19:10; the system had been stuck for almost an hour.
h2. scheduler
{noformat}
/usr/lib/python3.5/site-packages/sqlalchemy/sql/default_comparator.py:153: 
SAWarning: The IN-predicate on "task_instance.execution_date" was invoked with 
an empty sequence. This results in a contradiction, which nonetheless can be 
expensive to evaluate.  Consider alternative strategies for improved 
performance.
  'strategies for improved performance.' % expr)
[2016-09-19 18:15:41,171] {jobs.py:741} INFO - Done queuing tasks, calling the 
executor's heartbeat
[2016-09-19 18:15:41,172] {jobs.py:744} INFO - Loop took: 0.37502 seconds
[2016-09-19 18:15:41,189] {models.py:305} INFO - Finding 'running' jobs without 
a recent heartbeat
[2016-09-19 18:15:41,189] {models.py:311} INFO - Failing jobs without heartbeat 
after 2016-09-19 18:13:26.189579
[2016-09-19 18:15:41,212] {celery_executor.py:64} INFO - [celery] queuing 
('repos_to_process_dag', 'update_repos_to_process', datetime.datetime(2016, 9, 
19, 18, 15, 35, 200082)) through celery, queue=default
[2016-09-19 18:15:45,817] {jobs.py:574} INFO - Prioritizing 0 queued jobs
[2016-09-19 18:15:45,831] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:15:45,881] {jobs.py:726} INFO - Starting 2 scheduler jobs
{noformat}
h2. web
{noformat}
[2016-09-19 18:10:12,213] 

[jira] [Updated] (AIRFLOW-515) Airflow system is stuck without signs of life

2016-09-19 Thread Robert Kozikowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kozikowski updated AIRFLOW-515:
--
Description: 
h1. Symptoms
Triggering a dag on the web node via trigger_dag does not start it. The same 
dag worked previously.

Triggering the dag only adds a row to the postgres dag_run table. The run does 
not appear in Flower or the main web GUI. The system was stuck for 50 minutes 
without any signs of life, and no logs came from the scheduler, worker, or 
webserver.

In the end, the system got unblocked after force-restarting the scheduler and 
worker. Is there any way I should health-check my scheduler to avoid a similar 
situation in production?

h1. Recent logs
Logs were collected at 19:10; the system had been stuck for almost an hour.
h2. scheduler
{noformat}
/usr/lib/python3.5/site-packages/sqlalchemy/sql/default_comparator.py:153: 
SAWarning: The IN-predicate on "task_instance.execution_date" was invoked with 
an empty sequence. This results in a contradiction, which nonetheless can be 
expensive to evaluate.  Consider alternative strategies for improved 
performance.
  'strategies for improved performance.' % expr)
[2016-09-19 18:15:41,171] {jobs.py:741} INFO - Done queuing tasks, calling the 
executor's heartbeat
[2016-09-19 18:15:41,172] {jobs.py:744} INFO - Loop took: 0.37502 seconds
[2016-09-19 18:15:41,189] {models.py:305} INFO - Finding 'running' jobs without 
a recent heartbeat
[2016-09-19 18:15:41,189] {models.py:311} INFO - Failing jobs without heartbeat 
after 2016-09-19 18:13:26.189579
[2016-09-19 18:15:41,212] {celery_executor.py:64} INFO - [celery] queuing 
('repos_to_process_dag', 'update_repos_to_process', datetime.datetime(2016, 9, 
19, 18, 15, 35, 200082)) through celery, queue=default
[2016-09-19 18:15:45,817] {jobs.py:574} INFO - Prioritizing 0 queued jobs
[2016-09-19 18:15:45,831] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:15:45,881] {jobs.py:726} INFO - Starting 2 scheduler jobs
{noformat}
h2. web
{noformat}
[2016-09-19 18:10:12,213] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,597] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,663] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,790] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:13,210] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:10:13,742] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:10:13,815] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:10:13,912] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
{noformat}
h2. worker
{noformat}
Logging into: 
/usr/local/airflow/logs/repos_to_process_dag/update_repos_to_process/2016-09-19T18:15:35.200082
{noformat}


  was:
h1. Symptoms
Triggering a dag on the web node via trigger_dag does not start it. The same 
dag worked previously and it is still enabled.

Triggering the dag only adds a row to the postgres dag_run table, but has no 
other effect. The run does not appear in Flower.

Only restarting the scheduler and worker helped.
h1. Recent logs
Logs were collected at 19:10; the system had been stuck for almost an hour.
h2. scheduler
{noformat}
/usr/lib/python3.5/site-packages/sqlalchemy/sql/default_comparator.py:153: 
SAWarning: The IN-predicate on "task_instance.execution_date" was invoked with 
an empty sequence. This results in a contradiction, which nonetheless can be 
expensive to evaluate.  Consider alternative strategies for improved 
performance.
  'strategies for improved performance.' % expr)
[2016-09-19 18:15:41,171] {jobs.py:741} INFO - Done queuing tasks, calling the 
executor's heartbeat
[2016-09-19 18:15:41,172] {jobs.py:744} INFO - Loop took: 0.37502 seconds
[2016-09-19 18:15:41,189] {models.py:305} INFO - Finding 'running' jobs without 
a recent heartbeat
[2016-09-19 18:15:41,189] {models.py:311} INFO - Failing jobs without heartbeat 
after 2016-09-19 18:13:26.189579
[2016-09-19 18:15:41,212] {celery_executor.py:64} INFO - [celery] queuing 
('repos_to_process_dag', 'update_repos_to_process', datetime.datetime(2016, 9, 
19, 18, 15, 35, 200082)) through celery, queue=default
[2016-09-19 18:15:45,817] {jobs.py:574} INFO - Prioritizing 0 queued jobs
[2016-09-19 18:15:45,831] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:15:45,881] {jobs.py:726} INFO - Starting 2 scheduler jobs
{noformat}
h2. web
{noformat}
[2016-09-19 18:10:12,213] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,597] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,663] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,790] {__init__.py:36} INFO - Using executor CeleryExecutor

[jira] [Updated] (AIRFLOW-515) Airflow system is stuck without signs of life

2016-09-19 Thread Robert Kozikowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kozikowski updated AIRFLOW-515:
--
Environment: 
airflow==1.7.1.3 Dockerfile: 
https://gist.github.com/kozikow/9290c44f03fba8d0f1299f355ef96b51
kubernetes config (running within minikube): 
https://gist.github.com/kozikow/49c812f11d1e4fd43e465e8090619274

  was:
airflow==1.7.1.3 Dockerfile: 
https://gist.github.com/kozikow/9290c44f03fba8d0f1299f355ef96b51
Local config https://gist.github.com/kozikow/49c812f11d1e4fd43e465e8090619274


> Airflow system is stuck without signs of life
> -
>
> Key: AIRFLOW-515
> URL: https://issues.apache.org/jira/browse/AIRFLOW-515
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: airflow==1.7.1.3 Dockerfile: 
> https://gist.github.com/kozikow/9290c44f03fba8d0f1299f355ef96b51
> kubernetes config (running within minikube): 
> https://gist.github.com/kozikow/49c812f11d1e4fd43e465e8090619274
>Reporter: Robert Kozikowski
>
> h1. Symptoms
> The Airflow system is stuck and unresponsive. I trigger a dag on the web node via 
> trigger_dag. The same dag worked previously.
> Triggering the dag only adds a row to the postgres dag_run table, but has no 
> other effect. The run does not appear in Flower.
> Only restarting the scheduler and worker helped.
> h1. Recent logs
> Logs were collected at 19:10; the system had been stuck for almost an hour.
> h2. scheduler
> {noformat}
> /usr/lib/python3.5/site-packages/sqlalchemy/sql/default_comparator.py:153: 
> SAWarning: The IN-predicate on "task_instance.execution_date" was invoked 
> with an empty sequence. This results in a contradiction, which nonetheless 
> can be expensive to evaluate.  Consider alternative strategies for improved 
> performance.
>   'strategies for improved performance.' % expr)
> [2016-09-19 18:15:41,171] {jobs.py:741} INFO - Done queuing tasks, calling 
> the executor's heartbeat
> [2016-09-19 18:15:41,172] {jobs.py:744} INFO - Loop took: 0.37502 seconds
> [2016-09-19 18:15:41,189] {models.py:305} INFO - Finding 'running' jobs 
> without a recent heartbeat
> [2016-09-19 18:15:41,189] {models.py:311} INFO - Failing jobs without 
> heartbeat after 2016-09-19 18:13:26.189579
> [2016-09-19 18:15:41,212] {celery_executor.py:64} INFO - [celery] queuing 
> ('repos_to_process_dag', 'update_repos_to_process', datetime.datetime(2016, 
> 9, 19, 18, 15, 35, 200082)) through celery, queue=default
> [2016-09-19 18:15:45,817] {jobs.py:574} INFO - Prioritizing 0 queued jobs
> [2016-09-19 18:15:45,831] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> [2016-09-19 18:15:45,881] {jobs.py:726} INFO - Starting 2 scheduler jobs
> {noformat}
> h2. web
> {noformat}
> [2016-09-19 18:10:12,213] {__init__.py:36} INFO - Using executor 
> CeleryExecutor
> [2016-09-19 18:10:12,597] {__init__.py:36} INFO - Using executor 
> CeleryExecutor
> [2016-09-19 18:10:12,663] {__init__.py:36} INFO - Using executor 
> CeleryExecutor
> [2016-09-19 18:10:12,790] {__init__.py:36} INFO - Using executor 
> CeleryExecutor
> [2016-09-19 18:10:13,210] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> [2016-09-19 18:10:13,742] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> [2016-09-19 18:10:13,815] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> [2016-09-19 18:10:13,912] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> {noformat}
> h2. worker
> {noformat}
> Logging into: 
> /usr/local/airflow/logs/repos_to_process_dag/update_repos_to_process/2016-09-19T18:15:35.200082
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-515) Airflow system is stuck without signs of life

2016-09-19 Thread Robert Kozikowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kozikowski updated AIRFLOW-515:
--
Environment: 
airflow==1.7.1.3 Dockerfile: 
https://gist.github.com/kozikow/9290c44f03fba8d0f1299f355ef96b51
Local config https://gist.github.com/kozikow/49c812f11d1e4fd43e465e8090619274

  was:
airflow==1.7.1.3

My airflow Dockerfile: 
https://gist.github.com/kozikow/9290c44f03fba8d0f1299f355ef96b51

Airflow running locally in kubernetes minikube, with config 
https://gist.github.com/kozikow/49c812f11d1e4fd43e465e8090619274 .


> Airflow system is stuck without signs of life
> -
>
> Key: AIRFLOW-515
> URL: https://issues.apache.org/jira/browse/AIRFLOW-515
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: airflow==1.7.1.3 Dockerfile: 
> https://gist.github.com/kozikow/9290c44f03fba8d0f1299f355ef96b51
> Local config https://gist.github.com/kozikow/49c812f11d1e4fd43e465e8090619274
>Reporter: Robert Kozikowski
>
> h1. Symptoms
> The Airflow system is stuck and unresponsive. I trigger a dag on the web node via 
> trigger_dag. The same dag worked previously.
> Triggering the dag only adds a row to the postgres dag_run table, but has no 
> other effect. The run does not appear in Flower.
> Only restarting the scheduler and worker helped.
> h1. Recent logs
> Logs were collected at 19:10; the system had been stuck for almost an hour.
> h2. scheduler
> {noformat}
> /usr/lib/python3.5/site-packages/sqlalchemy/sql/default_comparator.py:153: 
> SAWarning: The IN-predicate on "task_instance.execution_date" was invoked 
> with an empty sequence. This results in a contradiction, which nonetheless 
> can be expensive to evaluate.  Consider alternative strategies for improved 
> performance.
>   'strategies for improved performance.' % expr)
> [2016-09-19 18:15:41,171] {jobs.py:741} INFO - Done queuing tasks, calling 
> the executor's heartbeat
> [2016-09-19 18:15:41,172] {jobs.py:744} INFO - Loop took: 0.37502 seconds
> [2016-09-19 18:15:41,189] {models.py:305} INFO - Finding 'running' jobs 
> without a recent heartbeat
> [2016-09-19 18:15:41,189] {models.py:311} INFO - Failing jobs without 
> heartbeat after 2016-09-19 18:13:26.189579
> [2016-09-19 18:15:41,212] {celery_executor.py:64} INFO - [celery] queuing 
> ('repos_to_process_dag', 'update_repos_to_process', datetime.datetime(2016, 
> 9, 19, 18, 15, 35, 200082)) through celery, queue=default
> [2016-09-19 18:15:45,817] {jobs.py:574} INFO - Prioritizing 0 queued jobs
> [2016-09-19 18:15:45,831] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> [2016-09-19 18:15:45,881] {jobs.py:726} INFO - Starting 2 scheduler jobs
> {noformat}
> h2. web
> {noformat}
> [2016-09-19 18:10:12,213] {__init__.py:36} INFO - Using executor 
> CeleryExecutor
> [2016-09-19 18:10:12,597] {__init__.py:36} INFO - Using executor 
> CeleryExecutor
> [2016-09-19 18:10:12,663] {__init__.py:36} INFO - Using executor 
> CeleryExecutor
> [2016-09-19 18:10:12,790] {__init__.py:36} INFO - Using executor 
> CeleryExecutor
> [2016-09-19 18:10:13,210] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> [2016-09-19 18:10:13,742] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> [2016-09-19 18:10:13,815] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> [2016-09-19 18:10:13,912] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> {noformat}
> h2. worker
> {noformat}
> Logging into: 
> /usr/local/airflow/logs/repos_to_process_dag/update_repos_to_process/2016-09-19T18:15:35.200082
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-515) Airflow system is stuck without signs of life

2016-09-19 Thread Robert Kozikowski (JIRA)
Robert Kozikowski created AIRFLOW-515:
-

 Summary: Airflow system is stuck without signs of life
 Key: AIRFLOW-515
 URL: https://issues.apache.org/jira/browse/AIRFLOW-515
 Project: Apache Airflow
  Issue Type: Bug
 Environment: airflow==1.7.1.3
My airflow Dockerfile: 
https://gist.github.com/kozikow/9290c44f03fba8d0f1299f355ef96b51
Airflow running locally in kubernetes minikube, with config 
https://gist.github.com/kozikow/49c812f11d1e4fd43e465e8090619274 .
Reporter: Robert Kozikowski


h1. Symptoms
The Airflow system is stuck and unresponsive. I trigger a dag on the web node via 
trigger_dag. The same dag worked previously.

Triggering the dag only adds a row to the postgres dag_run table, but has no 
other effect. The run does not appear in Flower.

Only restarting the scheduler and worker helped.
h1. Recent logs
Logs were collected at 19:10; the system had been stuck for almost an hour.
h2. scheduler
{noformat}
/usr/lib/python3.5/site-packages/sqlalchemy/sql/default_comparator.py:153: 
SAWarning: The IN-predicate on "task_instance.execution_date" was invoked with 
an empty sequence. This results in a contradiction, which nonetheless can be 
expensive to evaluate.  Consider alternative strategies for improved 
performance.
  'strategies for improved performance.' % expr)
[2016-09-19 18:15:41,171] {jobs.py:741} INFO - Done queuing tasks, calling the 
executor's heartbeat
[2016-09-19 18:15:41,172] {jobs.py:744} INFO - Loop took: 0.37502 seconds
[2016-09-19 18:15:41,189] {models.py:305} INFO - Finding 'running' jobs without 
a recent heartbeat
[2016-09-19 18:15:41,189] {models.py:311} INFO - Failing jobs without heartbeat 
after 2016-09-19 18:13:26.189579
[2016-09-19 18:15:41,212] {celery_executor.py:64} INFO - [celery] queuing 
('repos_to_process_dag', 'update_repos_to_process', datetime.datetime(2016, 9, 
19, 18, 15, 35, 200082)) through celery, queue=default
[2016-09-19 18:15:45,817] {jobs.py:574} INFO - Prioritizing 0 queued jobs
[2016-09-19 18:15:45,831] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:15:45,881] {jobs.py:726} INFO - Starting 2 scheduler jobs
{noformat}
h2. web
{noformat}
[2016-09-19 18:10:12,213] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,597] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,663] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:12,790] {__init__.py:36} INFO - Using executor CeleryExecutor
[2016-09-19 18:10:13,210] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:10:13,742] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:10:13,815] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
[2016-09-19 18:10:13,912] {models.py:154} INFO - Filling up the DagBag from 
/usr/local/airflow/dags
{noformat}
h2. worker
{noformat}
Logging into: 
/usr/local/airflow/logs/repos_to_process_dag/update_repos_to_process/2016-09-19T18:15:35.200082
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-515) Airflow system is stuck without signs of life

2016-09-19 Thread Robert Kozikowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kozikowski updated AIRFLOW-515:
--
Environment: 
airflow==1.7.1.3

My airflow Dockerfile: 
https://gist.github.com/kozikow/9290c44f03fba8d0f1299f355ef96b51

Airflow running locally in kubernetes minikube, with config 
https://gist.github.com/kozikow/49c812f11d1e4fd43e465e8090619274 .

  was:
airflow==1.7.1.3
My airflow Dockerfile: 
https://gist.github.com/kozikow/9290c44f03fba8d0f1299f355ef96b51
Airflow running locally in kubernetes minikube, with config 
https://gist.github.com/kozikow/49c812f11d1e4fd43e465e8090619274 .


> Airflow system is stuck without signs of life
> -
>
> Key: AIRFLOW-515
> URL: https://issues.apache.org/jira/browse/AIRFLOW-515
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: airflow==1.7.1.3
> My airflow Dockerfile: 
> https://gist.github.com/kozikow/9290c44f03fba8d0f1299f355ef96b51
> Airflow running locally in kubernetes minikube, with config 
> https://gist.github.com/kozikow/49c812f11d1e4fd43e465e8090619274 .
>Reporter: Robert Kozikowski
>
> h1. Symptoms
> The Airflow system is stuck and unresponsive. I trigger a dag on the web node via 
> trigger_dag. The same dag worked previously.
> Triggering the dag only adds a row to the postgres dag_run table, but has no 
> other effect. The run does not appear in Flower.
> Only restarting the scheduler and worker helped.
> h1. Recent logs
> Logs were collected at 19:10; the system had been stuck for almost an hour.
> h2. scheduler
> {noformat}
> /usr/lib/python3.5/site-packages/sqlalchemy/sql/default_comparator.py:153: 
> SAWarning: The IN-predicate on "task_instance.execution_date" was invoked 
> with an empty sequence. This results in a contradiction, which nonetheless 
> can be expensive to evaluate.  Consider alternative strategies for improved 
> performance.
>   'strategies for improved performance.' % expr)
> [2016-09-19 18:15:41,171] {jobs.py:741} INFO - Done queuing tasks, calling 
> the executor's heartbeat
> [2016-09-19 18:15:41,172] {jobs.py:744} INFO - Loop took: 0.37502 seconds
> [2016-09-19 18:15:41,189] {models.py:305} INFO - Finding 'running' jobs 
> without a recent heartbeat
> [2016-09-19 18:15:41,189] {models.py:311} INFO - Failing jobs without 
> heartbeat after 2016-09-19 18:13:26.189579
> [2016-09-19 18:15:41,212] {celery_executor.py:64} INFO - [celery] queuing 
> ('repos_to_process_dag', 'update_repos_to_process', datetime.datetime(2016, 
> 9, 19, 18, 15, 35, 200082)) through celery, queue=default
> [2016-09-19 18:15:45,817] {jobs.py:574} INFO - Prioritizing 0 queued jobs
> [2016-09-19 18:15:45,831] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> [2016-09-19 18:15:45,881] {jobs.py:726} INFO - Starting 2 scheduler jobs
> {noformat}
> h2. web
> {noformat}
> [2016-09-19 18:10:12,213] {__init__.py:36} INFO - Using executor 
> CeleryExecutor
> [2016-09-19 18:10:12,597] {__init__.py:36} INFO - Using executor 
> CeleryExecutor
> [2016-09-19 18:10:12,663] {__init__.py:36} INFO - Using executor 
> CeleryExecutor
> [2016-09-19 18:10:12,790] {__init__.py:36} INFO - Using executor 
> CeleryExecutor
> [2016-09-19 18:10:13,210] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> [2016-09-19 18:10:13,742] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> [2016-09-19 18:10:13,815] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> [2016-09-19 18:10:13,912] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags
> {noformat}
> h2. worker
> {noformat}
> Logging into: 
> /usr/local/airflow/logs/repos_to_process_dag/update_repos_to_process/2016-09-19T18:15:35.200082
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)